Computer Programming in Perl: Internet and Text Processing
Instructor: Dmitriy Genzel
June 28 - July 16

1 Course Description

This course will teach you how to program in Perl, the programming language behind much of the web functionality you use daily. The beginning of the course will be spent learning the language, and the rest applying it to various tasks both offline and online, with a special focus on web tasks and text processing. Possible projects include developing web forums, search engines, programs that can execute other programs, programs that appear to be intelligent (e.g., chatbots), and network and graphical applications (perhaps games). You will choose some of the projects yourself.

The purpose of this class is to show you what programming is all about (namely, having fun), what can be done in Perl, how to do it, and how to learn more about Perl on your own. The course also provides an introduction to computer science.

Previous programming experience is recommended. An ability to think logically, solve problems, and learn fast is fundamental. This course is for those seriously interested in learning programming and Perl; this is not a gentle introduction to computers. Enrollment is limited to 20 students.

2 Goals and Objectives

The goals of the course are:
1. To serve as an introduction to programming, Perl, and Computer Science
2. To explore in depth the topics of interest to you
3. To make you interested in exploring programming on your own

By the end of the course, you should (in order of decreasing importance):
1. Be able to program: Given a task, know how to perform it
2. Know some Perl: Be able to read Perl code
3. Know about various Perl resources (primarily Perl modules at CPAN): Be able to use them on your own
4. Have a general notion of Computer Science and its subfields: Be able to name at least five fields and name important problems they deal with
5. Know a little about software engineering: Be able to write a 1000-line program you can read six months from now
6. Know about other programming languages and how they compare to Perl: Know when to use or not to use Perl
7. Acquire some non-programming skills relevant to learning CS: Be able to write and present in a comprehensible way

3 Learning Format

There will be two kinds of sessions: lectures and supervised labs. You will also attend labs on your own to do your assignments. The lectures will be used to present course materials and for your presentations; labs will be used to help you with your home assignments and projects and to give you some programming experience under the instructor's eye. In addition, I will be in the lab during my office hours while you do your homework.

Some of the lectures are designated as "Special topics", which means that during these lectures I want to cover topics of interest to you. Please let me know what you want me to cover as soon as possible. If you offer no suggestions, I will cover the topics given in parentheses after the lecture name. See the schedule for details.

I will hold office hours for three hours after the afternoon section (2:30-5:30pm).

4 Readings

One does not learn how to program (or even learn a particular programming language) by reading about it. However, it is a good idea to read about something before you try it, even if your inclination is to jump right in. Also, you may prefer to use the book to look something up, even though real hackers use online documentation for this. Most books are (also) available online for those on the Brown campus. See the course webpage or the Brown library for information.

We will use the following readings:

Required: Learning Perl by Randal L. Schwartz and Tom Phoenix, 3rd edition, known as the Llama book. Buy at the bookstore or use online (not recommended).

Required: Perl manpages. Available on your system. Type perldoc perl at the command prompt.

Recommended: Programming Perl by Larry Wall, Tom Christiansen, and Jon Orwant, 3rd edition, known as the Camel book. Buy at the bookstore if you plan to use Perl outside of class, or use online while at Brown.

Recommended: CPAN: Comprehensive Perl Archive Network. Online, at http://www.cpan.org/

Useful: Perl Cookbook by Tom Christiansen and Nathan Torkington, known as the Cookbook. Find on reserve in the library or use online.

Useful: Mastering Perl/Tk by Stephen Lidie and Nancy Walsh, not really known as anything other than the title. Use online.

You are expected to read Learning Perl as we go along, but it is not required if you are very confident that you don't need it. The material in the book is often complementary to the lectures and you will find it very useful. Do not be scared if you are asked to read three chapters in a day. Simply skim them; I will not cover all the material there.

Programming Perl is the Perl Bible and should be used as a primary reference if you prefer a dead-tree form rather than the digital one. For digital documentation use man/perldoc pages.

Perl Cookbook is a collection of recipes for common tasks. If you have a task that seems common to you, like sorting a list, opening a socket, or listing a directory, you will find a code snippet to do it there. Make sure you understand how it works before you use it, though!

Mastering Perl/Tk is a book about Tk, a GUI library for Perl. We will use it occasionally in class, and it will be very useful if you decide to do a GUI final project.

CPAN should be used for module documentation (although you should first check the man pages and HTML documentation on your machine).

I will also make the lectures available online immediately following the class, so you can consult them. This does not mean, however, that you can skip the classes; there's more to the class than just the slides. Many of the classes won't be lectures anyway. I will check attendance.

5 Assignments: General

The only way to learn to program and to learn a new programming language is to actually write programs. Therefore, the primary kind of assignments will be programming projects.
They will gradually increase in difficulty, culminating in a final project (chosen by you) which will be of significant complexity. Please see the appendix for some suggested projects. One of the major points of this course is for you to have fun while programming, and to create a major piece of software which is worth being proud of.

There will also be a few non-programming assignments. The purpose of these is to provide some breadth to your experience. Whether you like it or not (I don't), computer scientists and programmers need to be able to write and present clearly. You will write one short paper and give at least one presentation and a final project demo. The paper will involve some research.

There will be no group projects (except possibly the final one, if you convince me). This means that the work you submit should be your own. You are welcome to talk to other students and ask their advice, but please don't copy their code.

There will also be discussion questions due the next day. I will ask one or two of you to discuss with me or another student the question I assigned. I will ask for volunteers first, but everyone will go through it, so it is in your interest to volunteer if you have something to say on the topic.

All assignments will be due at 10am on the due date.

There will be no tests. The evaluation will be based on your assignments. I will provide a numeric grade based on the following (for programming assignments):
1. The program produces no syntactic or other Perl error for any user input
2. The program solves the problem
3. The solution is the most efficient possible
4. The program is written in a good style and is easy to read
5. Significant effort was made or an improvement in quality (compared to previous assignments) was accomplished

The final evaluation (there is no grade) will be based on (in order of decreasing importance):
1. Homework (programming)
2. Final project
3. Attendance
4. Non-programming assignments

More weight will be given to the later homeworks.

6 Doing Homework

Normally lectures won't take the whole class.
This is intentional, since you learn by doing and lecturing takes time away from that. So when the lecture part is finished we will automatically switch to lab mode and you will start doing your homework. This is why there will be a lot of homework. I hope that even the brightest among you will not be able to finish all of it before running out of time. I expect you to do as much homework as you are able, and I expect your abilities to increase very fast. You will mostly be doing homework during the office hour period, since this is when the lab is open and I am around to answer questions.

7 Assignment Listing

All homework will involve book exercises for the chapter(s) we covered that day. In addition, the following problems will be included (many are optional):

Short programming assignments (due next day):
S1. Taxes, ASCII graphics
S2. Instant run-off voting
S3. HTML, text processing
S4. CGI comments form
S5. Regular expressions
S6. References: trees
S7. Simple calculator

Medium programming assignments (due in two days):
For the following assignments you are expected to submit a status report (how far along you are, etc.) or the actual things you got to work the day after the assignment is distributed. The assignment will provide details.
M1. A chat bot
M2. Paint program
M3. Choose one of the following:
    GUI: A calendar application
    Web: A web forum
    Net: Mirroring software

Final project-related assignments:
F1. Preliminary proposal. Short description of the proposed project. May be revised until F2 is submitted.
F2. Detailed proposal. Includes a list of features to be implemented (basic and optional). Needs to be approved before the work is started.
F3. Early status check. Short report on what's done so far. Request for change in functionality.
F4. Mid-project deadline. Submit code. 3/4 of basic features should be implemented. This is followed by a meeting with the customer to discuss.
F5. Final status update. Short report on what's working and what's not. 95% of basic features should be implemented.
F6. Final demo. 15-minute presentation in front of the class.

Non-programming assignments:
N1. Write a short research paper (3-4 pages) on a general topic related to CS. The topic will be assigned to you, but you will have some latitude to change it if you really hate it. I may ask you to present this (instead of N2) if it is especially good or bad.
N2. Prepare a presentation on some technical problem you faced and how you solved it. For example, describe how to use some CPAN module to accomplish a particular task. This may be waived for some people on the basis of N1 (if they presented).

All presentations should be no more than 15 minutes (including time for questions).

8 Schedule [tentative]

In the Date column (m) means morning section, (a) means afternoon section. Items in the Out and Due columns refer to assignment numbers. Items in the Read column refer to chapters in Learning Perl which are covered by that class.

Date         Description                                   Read        Out      Due

Week 1
June 28 (m)  Course goals, syllabus, introduction to Perl  -           -        -
        (a)  Scalar data                                   Ch. 2       S1       -
June 29 (m)  Lists and arrays                              Ch. 3       S2       S1
        (a)  Subroutines                                   Ch. 4       F1       -
June 30 (m)  Hashtables, Input/Output                      Ch. 5-6     S3       S2
        (a)  HTML                                          -           N1       -
July 1  (m)  Files and directories                         Ch. 11-13   S4       S3
        (a)  Modules, basic CGI                            -           -        -
July 2  (m)  Regular expressions                           Ch. 7       S5       S4
        (a)  Regular expressions (cont.)                   Ch. 8-9     M1       -
July 3  (m)  References; Final project showcase            -           S6       S5, N1-src
        (a)  no class                                      -           -        -

Week 2
July 6  (m)  More control structures, strings, sorting     Ch. 10, 15  S7       M1, S6
        (a)  Perl/Tk                                       -           F2       F1
July 7  (m)  Perl/Tk (cont.)                               -           M2       S7
        (a)  Lab                                           -           -        N1
July 8  (m)  Simple databases; DBI, DBM                    Ch. 16      -        F2
        (a)  Lab                                           -           -        -
July 9  (m)  Internet tools, processes, advanced topics    Ch. 14, 17  M3       M2
        (a)  Presentations for assignment N1               -           -        -

Week 3
July 12 (m)  Special topics (Databases)                    -           F3, N2   M3
        (a)  Lab: special topics                           -           -        -
July 13 (m)  Special topics (Perl objects)                 -           F4       F3
        (a)  Lab: special topics                           -           -        -
July 14 (m)  FP: meeting with the customer                 -           F5       F4
        (a)  FP: meeting with the customer                 -           -        N2 topic
July 15 (m)  Lab: help with final project                  -           F6       F5
        (a)  Presentation of N2                            -           -        N2
July 16 (m)  Demo for final projects                       -           -        F6
        (a)  Demo for final projects                       -           -        -

The July 3 (Saturday) class is optional, but recommended.

9 Contact Information

Dmitriy Genzel
Phone (office): 401-863-7672
Email: dg@cs.brown.edu
Office: CIT Room 551
Address: Box 1910, Brown University, Providence, RI 02912
Class webpage: http://www.cs.brown.edu/ dg/summer04/
TA: Haruyoshi Sakai, hsakai@cs.brown.edu

(See the appendix for potential projects.)

Appendix: The list of potential projects

Some ideas for possible projects:

Obvious:
  An extension of any earlier project (including N1)
  A web forum (bulletin board)
  A blog
  A simple game (e.g., Tetris, puzzle, Life)
  A chat client (GUI or non-GUI)
  A calendar GUI application
  A music player
  A music collection organizer
  A chess program that lets two people play

Less obvious: Web/Internet:
  A search engine
  A web proxy (anonymizer, ad blocker, etc.)
  A web server
  A P2P secret chat network

Less obvious: Algorithms:
  Some image manipulation task (e.g., find borders)
  Some crypto task (e.g., substitution ciphers, http://sicp.ai.mit.edu/fall-2002/)
  Spam filter: perceptron or something else
  LZ compression
  Implementing some paper in NLP
  Text processing, concordances, like http://www.opensourceshakespeare.org/
  Word (or sentence) alignment for machine translation
  Interpreter for LOGO

Less obvious: Simulation:
  Simulated societies (sugarscape)
  Looking up at the stars
  Physics simulation (gravity (solar system), any force (field lines))