7 Why Use Perl for CGI?

Perl is the de facto standard for CGI programming for a number of reasons, but perhaps the most important are:

Socket Support: Perl makes it easy to create programs that interface seamlessly with Internet protocols. Your CGI program can send a Web page in response to a transaction and also send a series of e-mail messages to inform interested people that the transaction happened.

Pattern Matching: Perl's regular expression support makes it ideal for handling form data and searching text.

Flexible Text Handling: Perl manages the memory allocation and de-allocation for strings itself, so these details fade into the background as you program. You can simply ignore the details of concatenating, copying, and creating new strings.

There are times when a mature CGI application should be ported to C or another compiled language. These are the Web applications where speed is important. If you expect to have a very active site, you may want to move to a compiled language because compiled programs generally run faster.

7.5 CGI Apps versus Java Applets

CGI and Java are two totally different animals. CGI is a specification that can be used from any programming language, and CGI applications run on a Web server. Java is a programming language, and Java applets run on the client side.

CGI applications should be designed to take advantage of the centralized nature of a Web server. They are great for searching databases, processing HTML form data, and other applications that require limited interaction with a user.

Java applets are good when you need a high degree of interaction with users: for example, games or animation.

Java applets need to be kept relatively small because they are transmitted through the Internet to the client. CGI applications, on the other hand, can be as large as needed because they reside and are executed on the Web server.

You can design your Web site to use both Java and CGI applications.
For example, you might want to use Java on the client side to do field validation when collecting information on a form. Then once the input has been validated, the Java application can send the information to a CGI application on the Web server where the database resides.

7.6 Should You Use CGI Modules?

There are a number of functions available in modules for CGI. Rather than reinventing the wheel, use the CGI modules that are available on the Internet at http://search.cpan.org/modlist/world_wide_web/cgi. CGI::Lite will be enough for general purposes. It is already installed as part of the ActivePerl installation on the lab machines. We will be using the CGI module CGI.pm, which is included by placing the line

    use CGI qw(:standard);

at the beginning of the Perl script.

7.7 Some Raw Details of CGI Programming in Perl

7.7.1 CGI Script Output

A CGI script MUST send information back to the browser in the following format:

    The Output Header
    A Blank Line
    The Output Data

7.7.1.1 CGI Output Header

A browser can accept input in a variety of forms. Depending on the specified form, it will call different mechanisms to display the data. The output header of a CGI script must specify an output type to tell the server, and eventually the browser, how to proceed with the rest of the CGI output. There are 3 forms of Header Type:

    Content-Type
    Location
    Status

Content-Type is the most popular type. We now consider this further. We will meet the other types later.

NOTE: Between the Header and Data there MUST be a blank line.

7.7.1.1.1 Content-Types

The following are common formats/content-types (there are others):

    Format        Content-Type
    HTML          text/html
    Text          text/plain
    Gif           image/gif
    JPEG          image/jpeg
    Postscript    application/postscript
    MPEG          video/mpeg

To declare the Content-Type your CGI script must output:

    Content-Type: content-type specification

Typically the Content-Type will be declared to produce HTML. So the first line of output from our CGI script will look like this:

    Content-Type: text/html

Depending on the Content-Type defined, the data that follows the header declaration will vary. If it is HTML that follows, then the CGI script must output standard HTML syntax. Thus, to produce a Web page that sends a simple line of text "hello, world!" to a browser, a CGI script must output:

    Content-Type: text/html

    <html>
    <head>
    <title>hello, world!</title>
    </head>
    <body>
    <h1>hello, world!</h1>
    </body>
    </html>

7.7.2 A First Perl CGI Script without CGI module help

Every Perl program MUST obey the following format:

A first line consisting of:

    #!/usr/bin/perl

The rest of the program consisting of legal Perl syntax and commands. For CGI, the Perl output is typically HTML -- this is where Perl is really handy.

Strictly speaking, the first line is only required for running Perl programs on UNIX machines. Since Windows does not care about this line, and the intended destination of a lot of Perl scripts is a UNIX/Linux machine, it is a good idea to make this the first line of every Perl program.

To output from a Perl script you use the print statement. The first line of output from our CGI script must be "Content-Type: text/html", and the print statement must have 2 \n characters: one to terminate the current line, and the second to produce the required blank line between CGI header and data.

    print "Content-Type: text/html\n\n";

A complete first (hello.plx) program is as follows:

    #!/usr/bin/perl
    # hello.pl - My first CGI program

    print "Content-Type: text/html\n\n";
    # Note there is a newline printed between
    # this header and Data

    # Simple HTML code follows
    print "<html> <head>\n";
    print "<title>hello, world!</title>\n";
    print "</head>\n";
    print "<body>\n";
    print "<h1>hello, world!</h1>\n";
    print "</body> </html>\n";

7.7.3 HTTP Headers

The first line of output for most CGI programs must be an HTTP header that tells the client Web browser what type of output it is sending back via STDOUT. The Location header, for instance, is used to redirect the client Web browser to another Web page.

For example, let's say that your CGI script is designed to randomly choose from among 10 different URLs in order to determine the next Web page to display. Once the new Web page is chosen, your program outputs it like this:

    print("Location: $nextpage\n\n");

Once the Location header and its blank line have been printed, nothing else should be printed. That is all the information that the client Web browser needs.
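
The random redirect described above can be sketched as follows. This is only an illustration under stated assumptions: the example.com URLs and the helper names redirect_response and choose_page are hypothetical and do not come from these notes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of 10 candidate pages; replace with real URLs.
my @pages = map { "http://www.example.com/page$_.html" } 1 .. 10;

# Build the complete CGI output for a redirect: one Location header
# followed by the blank line that ends the header section.
sub redirect_response {
    my ($url) = @_;
    return "Location: $url\n\n";
}

# Pick one entry at random from the list passed in.
sub choose_page {
    my @candidates = @_;
    return $candidates[ int rand @candidates ];
}

my $nextpage = choose_page(@pages);
print redirect_response($nextpage);
```

Keeping the header text in its own function makes it easy to check that exactly one header line and one blank line are sent, with no data after them.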

7.7.4 URL Encoding

One of the limitations that the WWW organizations have placed on the HTTP protocol is that the content of the commands, responses, and data that are passed between client and server should be clearly defined. It is sometimes difficult to tell simply from the context whether a space character is a field delimiter or an actual space character that adds whitespace between two words.

To clear up the ambiguity, the URL encoding scheme was created. Any spaces are converted into plus (+) signs to avoid semantic ambiguities. In addition, special characters or 8-bit values are converted into their hexadecimal equivalents and prefaced with a percent sign (%). For example, the string "Dave Marshall <dave@cs.cf.ac.uk>" (see footnote 1) is encoded as "Dave+Marshall+%3Cdave@cs.cf.ac.uk%3E". If you look closely, you see that the < character has been converted to %3C and the > character has been converted to %3E.

Your CGI script will need to be able to convert URL-encoded information back into its normal form. Fortunately, cgidecode.pl contains a function that will decode URL-encoded data. This Perl program:

    Defines the decodeurl() function.
    Gets the encoded string from the parameter array.
    Translates all plus signs into spaces.
    Converts characters coded as hexadecimal digits into regular characters.
    Returns the decoded string.

The Perl for cgidecode.pl is:

    sub decodeurl {
        my $text = shift;        # the encoded string from the parameter array
        $text =~ tr/+/ /;        # plus signs become spaces
        # each %XX pair of hexadecimal digits becomes one character
        $text =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;
        return $text;
    }

This function will be used later to decode form information. It is presented here because canned queries also use URL encoding.

1 Dave is the author of the original of these notes.
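
To see the encoding and decoding steps working together, here is a minimal round-trip sketch. The encodeurl function is a hypothetical helper written for this illustration and is not part of cgidecode.pl. It leaves the @ character unescaped only so that its output matches the example string above; stricter encoders escape it too. The decodeurl shown here is a variant that uses chr instead of pack, and both functions assume plain 8-bit strings.

```perl
use strict;
use warnings;

# Hypothetical encoder (not part of cgidecode.pl).
sub encodeurl {
    my ($text) = @_;
    # Escape every character except letters, digits, and a few safe
    # punctuation marks. Spaces are skipped here and handled below.
    $text =~ s/([^A-Za-z0-9_.\@ -])/sprintf('%%%02X', ord($1))/eg;
    $text =~ tr/ /+/;    # spaces become plus signs
    return $text;
}

# Decoder following the steps listed in the notes.
sub decodeurl {
    my ($text) = @_;
    $text =~ tr/+/ /;                               # plus signs become spaces
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # %XX becomes one character
    return $text;
}

my $original = 'Dave Marshall <dave@cs.cf.ac.uk>';
my $encoded  = encodeurl($original);
my $decoded  = decodeurl($encoded);

print "$encoded\n";    # Dave+Marshall+%3Cdave@cs.cf.ac.uk%3E
print "$decoded\n";    # Dave Marshall <dave@cs.cf.ac.uk>
```

Note the order of operations: the encoder escapes literal plus signs (as %2B) before turning spaces into plus signs, and the decoder turns plus signs into spaces before expanding the %XX sequences, so a literal + in the data survives the round trip.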

7.8 Security

CGI has a number of security weaknesses. For example, if you pass information that came from a remote site to an operating system command, you are asking for trouble. For now it is sufficient to note that you should be careful to check out these issues. See CGIwrap (http://wwwcgi.umr.edu/cgiwrap/) to find out what to do.

7.8.1 An example

Suppose that you had a CGI script that formatted a directory listing and generated a Web page that let visitors view the listing. In addition, let's say that the name of the directory to display was passed to your program using the PATH_INFO environment variable. The following URL could be used to call your program:

    http://www.foo.com/cgi-bin/dirlist.pl/docs

Inside your program, the PATH_INFO environment variable is set to /docs. In order to get the directory listing, all that is needed is a call to the ls command in UNIX or the dir command in DOS. Everything looks good, right?

But what if the program was invoked with this URL?

    http://www.foo.com/cgi-bin/dirlist.pl/; rm -fr;

Now, all of a sudden, you are faced with the possibility of files being deleted: if the program passes PATH_INFO unchecked to the operating system, the semi-colon (;) lets multiple commands be executed on one command line.
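
One way to guard against this kind of input is sketched below. It is an illustration under assumptions, not a complete defence: the function name safe_dirname and the rule that a directory name may contain only letters, digits, underscores, and hyphens are choices made for this example. The sketch also reads the directory with Perl's built-in opendir and readdir rather than calling ls or dir, so the value never reaches a command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the directory name if it matches the allowed pattern,
# or an undefined value otherwise (hypothetical helper).
sub safe_dirname {
    my ($path_info) = @_;
    return unless defined $path_info;
    # Allow an optional leading slash followed by a single name made
    # only of letters, digits, underscores, and hyphens.
    if ( $path_info =~ m{\A/?([A-Za-z0-9_-]+)\z} ) {
        return $1;
    }
    return;
}

my $dir = safe_dirname( $ENV{PATH_INFO} );

print "Content-Type: text/plain\n\n";
if ( defined $dir && opendir( my $dh, $dir ) ) {
    # List the entries, skipping the . and .. pseudo-entries.
    print "$_\n" for sort grep { !/\A\.\.?\z/ } readdir $dh;
    closedir $dh;
}
else {
    print "Invalid or unreadable directory\n";
}
```

Accepting only input that matches a known-good pattern is generally safer than trying to strip out known-bad characters such as the semi-colon, because it is hard to list every dangerous character in advance.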