Introduction to Perl Programming


Introduction to Perl Programming Oxford University Computing Services


Revision Information

    Version  Date         Author           Changes made
    1.0      09 Feb 2011  Christopher Yau  Created
    1.1      18 Apr 2011  Christopher Yau  Minor revisions and corrections
    1.2      20 Jun 2011  Christopher Yau  Added section on Regular Expressions
    1.3      2 Dec 2011   Christopher Yau  Minor corrections (hand over to Sebastian Kelm)
    1.4      10 May 2012  Sebastian Kelm   Minor corrections
    1.5      17 Feb 2013  Sebastian Kelm   Removed section on Switch statements; hand over to Thaddeus Aid
    1.6      17 Apr 2013  Thaddeus Aid     Added 2d arrays, minor corrections

Acknowledgements

With thanks to Alistair Wire, Rehan Ali and Jeremie Becker for careful reading of the drafts and original document and Susan Hutchinson for the use of her introduction notes to Linux.

Copyright Notice

The copyright of the remainder of this document and its associated resources lies with Oxford University IT Services.

Contents

1 Introduction to Perl
    1.1 A First Perl Program
    1.2 Handling numbers and strings
        1.2.1 Numbers
        1.2.2 Strings
        1.2.3 Reading from user input
    1.3 Arrays
        1.3.1 Creating and manipulating arrays
        1.3.2 Array functions
        1.3.3 Sorting arrays
        1.3.4 Two dimensional arrays
2 Program Control
    2.1 If-else statements
    2.2 Nested and Compound conditionals
    2.3 Loops
        2.3.1 While loops
        2.3.2 Foreach loops
        2.3.3 For loops
3 File Handling
    3.1 File Handles
    3.2 Closing a filehandle
    3.3 Error Checking
    3.4 Reading files
    3.5 Writing to files
    3.6 Writing to multiple files
    3.7 Tying files to an array
    3.8 File System Operations
        3.8.1 Changing directory
        3.8.2 Deleting files
        3.8.3 Listing files
        3.8.4 Testing files
4 Regular Expressions
    4.1 Basic string comparisons
    4.2 Using Wildcards and Repetitions
    4.3 Groups
    4.4 Character Classes
    4.5 Putting it All Together
    4.6 Other string operations
        4.6.1 Substitutions
        4.6.2 Translations
5 Hash Tables
    5.1 Creating a Hash
    5.2 Testing for keys in a hash
    5.3 Retrieving keys and values from a hash
    5.4 Frequency tables using hashes
    5.5 Using records
6 Sub-routines
    6.1 Sub-routines
        6.1.1 Local and global scoping
        6.1.2 Passing Scalar Arguments
        6.1.3 Passing Arrays and Hashes as Arguments
A Getting Started with Linux
B Further Exercises

Getting Started

In this course we will be using a live version of the Ubuntu Linux operating system. This means that the operating system is run from a USB drive and comes with a default Linux environment and preconfigured settings ready to go. However, before we start we may need to make a few local modifications in order to set up the keyboard correctly.

If you have never used Linux or the Linux terminal/command line interface before, a quick primer can be found in the Appendix. These notes are borrowed from the IT Services "Introduction to Linux" course developed by Susan Hutchinson.

A digital version of these notes, as well as a few data files, can be found on the course website. You will be referred to this URL whenever you need to obtain extra files to complete an exercise:

    http://www.stats.ox.ac.uk/~aid/perl/

If you cannot complete all the exercises during the allotted time, you may wish to finish them at home. If you run into trouble you cannot solve by yourself, you can contact the course teacher by email at aid@stats.ox.ac.uk. Please include the words "IT Services Perl" in the subject line.

Preparation: Changing the keyboard layout

The keyboard is currently set up with the US style layout. We need to configure this for our UK keyboards. This is only necessary on Live Linux systems. When using an installed version of Linux the keyboard layout will be configured at installation time.

1. Open System > Preferences > Keyboard.
2. Select Layouts.
3. Click on Add and select United Kingdom from the Country: list on the top left hand side.
4. Click on Add.
5. Make sure that Generic 104 key PC is selected in the Keyboard model field.
6. Now remove the USA layout, highlight United Kingdom and click on Close.

This will need to be done at the beginning of every session when running Live Ubuntu but is not normally necessary when using a desktop installation of Linux.

Chapter 1 Introduction to Perl

Perl is a programming language originally developed by Larry Wall in 1987 as a Unix-based scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular amongst programmers. Perl is a stable language: the current version, Perl 5, has been in use since 1994. A major revision, Perl 6, is currently in development. Although this version will introduce some fundamental changes, it should nonetheless remain sufficiently Perl-like that many Perl programmers will never notice the difference.

Perl is sometimes nicknamed "the Swiss Army chainsaw of programming languages" due to its flexibility and adaptability, and a number of general features of Perl make it particularly worthy of this description:

High-level: Perl uses strong abstraction, which hides many of the physical and low-level details of the computer's architecture from the user. This means that once we have written a Perl program, it should function identically on any computer running the same version of Perl. Perl also uses natural language elements, which make it easy to read and interpret (as far as a programming language can be!).

Interpreted: Perl programs are executed directly from the source file by a piece of software known as the interpreter. There is no need to compile and generate an executable binary file (e.g. an .exe file). The source file containing Perl code is translated into an efficient intermediate representation which is then immediately executed by the interpreter software.

Dynamic: Perl executes at run-time several behaviours that other lower-level languages might perform during compilation. This means that, for instance, it is unnecessary in Perl to pre-define the size of arrays.

Perl borrows features from other programming languages, in particular C and Unix shell scripting languages such as sh, awk, and sed. The language provides powerful text processing facilities without the arbitrary data length limits of many contemporary Unix/Linux tools, facilitating easy manipulation of text files. It is also used for graphics programming, system administration, network programming, applications that require database access and CGI programming on the Web.

This booklet presents a brief introduction to Perl programming utilising many guided and self-learning exercises. It is important to note that there are often many ways to write a Perl script to perform a task; there is rarely a single correct way. In the examples and solutions shown in this booklet, the coding style has been chosen to emphasise simplicity and promote understanding rather than computational efficiency or coding brevity.

If you are able to master the material in this booklet, you will have established a solid foundation for learning further Perl programming. If you want further exercise, a few examples can be found in the Appendix to stretch your mind. A number of texts are available for supporting further Perl programming education and in addition there are many free resources available on the Internet. Enjoy!

1.1 A First Perl Program

Traditionally, the first program that one writes when learning a new computer programming language is one which prints "Hello World" on the screen, and then exits. It is surprising how much you can learn about a language just from being able to do this. In the following exercise, you will create a Perl script called first.pl.

Perl programs do not have to be named with a .pl extension, but you will need to name them like this for text editors to recognise that they are Perl scripts and correctly highlight keywords. It is also useful to keep this convention so you can more easily catalogue your files by looking at the file extension.

EXERCISE 1: Hello World

1. Open a text editor (e.g. gedit).
2. Start a new text document.
3. Enter the source code for the Hello World program below:

    #!/usr/bin/perl
    use strict;
    print "Hello World\n"; # prints "Hello World" to the screen

4. Save the file as first.pl.
5. Open a terminal window and change to the directory containing first.pl using the shell command cd.
6. Run the Hello World program by using the command perl first.pl. You should see the following output:

    > perl first.pl
    Hello World

The first line is a special comment. On Unix/Linux systems, if the very first two characters of a text file are #!, the rest of that line names the program that executes the rest of the file. In this case, that program is the Perl interpreter, stored in the file /usr/bin/perl.

The second line contains a pragma, which is an instruction to the Perl interpreter to do something special when it runs your program. In this case, the command use strict does two things that make it harder to write bad software: (i) it requires that you declare all your variables and (ii) as we shall see later on, it makes it harder for Perl to mistake your intentions when you are using sub-routines (Chapter 6).

The third line prints the words "Hello World" to the screen followed by a new line specified by the \n.

The script is executed by running the command perl in a terminal followed by the name of the script. This starts the Perl interpreter, which immediately begins to execute the script whose name was supplied as an argument.

Note that the hash symbol # is used in Perl to denote comments: text that will not be interpreted as code. Adding comments throughout your code is useful for reminding yourself and others what different parts of a program are meant to be doing.

1.2 Handling numbers and strings

In Perl, single items of data (e.g. a number or a string) are stored using a scalar variable. A scalar variable is written by prefixing the variable name with a $, e.g. $x, $car, $my_house, etc. A value can be assigned to a variable using the assignment operator (=), e.g. $x = 1. In these exercises, since we are utilising the use strict functionality of Perl, it is also necessary to use the keyword my in the declaration of a variable before its first use, e.g. my $x = 1.

1.2.1 Numbers

Perl contains a standard complement of numerical operators which can be applied to scalar variables containing numbers:

    =   ... assignment, e.g. $x = 10 assigns the value 10 to the variable $x
    +   ... addition, e.g. $x + $y gives the sum of $x and $y
    *   ... multiplication, e.g. $x * $y gives the product of $x and $y
    -   ... subtraction, e.g. $x - $y gives the difference between $x and $y
    /   ... division, e.g. $x / $y divides $x by $y
    %   ... modulo, computes the division remainder, e.g. 15 % 10 gives 5
    **  ... exponentiation, e.g. 2**3 gives 8

EXERCISE 2: Numbers

1. Create a new text document and name it addnumbers.pl.
2. Enter the following code:

    #!/usr/bin/perl
    use strict;
    my $x = 1;       # assign the value 1 to variable $x
    my $y = 3;       # assign the value 3 to variable $y
    my $z = $x + $y; # assign the sum of $x and $y to variable $z
    print "The first number is: $x\n";  # print the value of $x
    print "The second number is: $y\n"; # print the value of $y
    print "The sum is: $z\n";           # print the value of $z

3. Run the program in the terminal. You should see the following output:

    > perl addnumbers.pl
    The first number is: 1
    The second number is: 3
    The sum is: 4

4. Extend the program to calculate and print the difference, product and quotient of $x and $y.

In this program, the numbers 1 and 3 are stored in the variables $x and $y respectively, and the sum of the two variables is stored in $z before the results are printed on screen. The value of the variables can be displayed on screen using the print command.

1.2.2 Strings

Strings can be assigned to variables in the same way by enclosing a sequence of characters in single (') or double (") quotes. Single quotes specify that the contents between the quotes should be taken literally, whilst double quotes mean that variables and escape sequences such as \n contained between the quotes are replaced by their values. The following exercise illustrates the difference between the two modes.

EXERCISE 3: Strings

1. Create a new text document and name it addstrings.pl.
2. Enter the source code below:

    #!/usr/bin/perl
    use strict;
    my $x = "Jim";
    my $y = "Hendrix";
    print "My name is $x $y\n";
    print 'My name is $x $y\n';

3. Switch to the terminal and run the program.

    > perl addstrings.pl
    My name is Jim Hendrix
    My name is $x $y\n

4. Create a new variable to store the string "is a legend" and print out all three strings.
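As a quick check of the operators and quoting rules covered so far, the following is a minimal sketch rather than part of the course exercises. The values 15 and 4 and all variable names are chosen here purely for illustration.

```perl
#!/usr/bin/perl
use strict;

# Illustrative values only; they do not come from the exercises.
my $x = 15;
my $y = 4;

my $remainder = $x % $y;   # modulo: remainder of 15 divided by 4
my $power     = $y ** 2;   # exponentiation: 4 squared
my $quotient  = $x / $y;   # division gives a decimal result, not an integer

my $double = "x is $x";    # double quotes: $x is replaced by its value
my $single = 'x is $x';    # single quotes: the characters $x are kept as typed

print "$remainder $power $quotient\n";
print "$double\n";
print $single, "\n";
```

Note that the last print passes the newline as a separate double-quoted string, because a \n inside single quotes would be printed as the two characters \ and n.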

In the addstrings.pl script, we have used variable interpolation in the first print statement. When print is called by the Perl interpreter, the variables $x and $y are replaced by their contents before being printed. The final print statement uses a literal string, so the characters $x, $y and \n are printed exactly as typed.

Perl contains a number of operators and functions for string manipulation. We introduce a few here:

1. (.) - the dot operator can be used for string concatenation (joining two strings together), e.g.

    my $x = "Hello";
    my $y = "World";
    my $z = $x . " " . $y; # $z contains "Hello World"

2. length - the length function can be used to find the length of a string

    my $x = "Hello";
    my $length = length($x); # $length contains 5

3. uc and lc - the upper/lower case functions can be used to convert a string to upper/lower case respectively

    my $x = "Hello";
    my $upper = uc($x); # $upper contains "HELLO"
    my $lower = lc($x); # $lower contains "hello"

4. reverse - the reverse function can be used to reverse the order of the characters of a string

    my $x = "Hello";
    my $xrev = reverse($x); # $xrev contains "olleH"

5. substr - to extract a piece of a string, e.g. substr($x, 2, 5) where $x is the string, 2 specifies the start position (counting from 0) and 5 specifies the length of the sub-string to extract.

    my $x = "ambulance";
    my $y = substr($x, 2, 5);
    print $y . "\n"; # prints "bulan"

The function substr can also be used to replace a piece of a string, e.g. substr($x, 2, 5, $y) where $x is the string, 2 specifies the start position, 5 specifies the length of the sub-string to replace and $y contains the piece of text to be inserted. For example:

    my $x = "jelly baby";
    substr($x, 6, 4, "babe"); # $x contains "jelly babe"

leaves $x containing the string "jelly babe".

6. $variable =~ s/pattern/replacement/ - substitutions can be made in strings using a regular expression, where pattern is what to search for in $variable and replacement is what to insert in its place

    my $x = "jelly baby";
    $x =~ s/baby/babe/; # $x contains "jelly babe"

1.2.3 Reading from user input

A really useful facility is the ability to read in information from the console. This allows you to produce interactive programs which can ask the user questions and receive answers. To achieve this we can use what is known as the STDIN filehandle (more on filehandles later). You usually read only one line at a time from STDIN (so the input stops when the user presses return).

    print "What is your name? ";
    my $name = <STDIN>;
    chomp($name); # chomp removes the return character that you entered
    print "What is your quest? ";
    my $quest = <STDIN>;
    chomp($quest);
    print "What is your favourite colour? ";
    my $colour = <STDIN>;
    chomp($colour);
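To see what chomp does without typing at the keyboard, the sketch below reads from a string instead of STDIN. This is an illustration under stated assumptions: the sample answers and the variable names $typed, $in and $raw_length are invented here, and opening a filehandle on a reference to a scalar is a standard Perl feature that the booklet does not cover at this point.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical stand-in for keyboard input: two answers, each ending in a newline.
my $typed = "Arthur\nTo seek the Holy Grail\n";

# Opening a reference to a scalar gives a filehandle that reads from the string.
open(my $in, '<', \$typed) or die "Cannot open in-memory input: $!";

my $name = <$in>;                 # reads one line, including its newline
my $raw_length = length($name);   # length before the newline is removed
chomp($name);                     # strip the trailing newline

my $quest = <$in>;
chomp($quest);
close($in);

print "Name: $name (", length($name), " characters, $raw_length with the newline)\n";
print "Quest: $quest\n";
```

Replacing <$in> with <STDIN> gives the interactive behaviour described in the text; the line-at-a-time reading and the need for chomp are the same in both cases.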

EXERCISE 4:

1. Write a Perl script to read in a string from the console and print:
    (a) The length of the string
    (b) The reverse of the string
    (c) The upper and lower case versions of the string
2. Modify your script to accept two string inputs and print the concatenation of the two strings separated by a space.

1.3 Arrays

Perl has three built-in data types: scalars (numbers and strings), arrays and hashes. Hashes will be considered later in the text and we will focus on arrays for now.

What are arrays? Simply, arrays are an ordered collection of scalars. Arrays therefore allow you to store scalars in an ordered manner and to retrieve them based on their position. Unlike in many other programming languages, arrays in Perl are dynamic, which means that you can add or remove items from them at will. You do not need to specify in advance how big the array must be.

1.3.1 Creating and manipulating arrays

Arrays in Perl are denoted by a leading @ symbol (e.g. @array). Array elements are scalars and therefore start with a $, but in order to identify which element of the array you are referring to it is necessary to also specify its position in the array (known as its index) in square brackets at the end of the name. The first element in an array has index 0 so, for example, the fifth element in @array would be $array[4]. The following code example illustrates the specification and use of an array:

    my @array = ("cat", "dog", "rabbit", "turtle");
    print "The first element of \@array is not $array[1], but $array[0]";

An array of strings called @array is created and the strings "cat", "dog", "rabbit" and "turtle" are stored in it. The print displays the second item ($array[1] is dog) and then the first item ($array[0] is cat).

Another useful convenience is that you can use negative indices to count back from the end of an array:

    print "The last element of \@array is $array[-1]";

Perl provides a number of array functions to add or remove items from an array. The function push can be used to add items to the end of an array:

    my @array = ("cat", "dog", "rabbit", "turtle");
    push(@array, ("badger", "fox")); # @array contains cat, dog, rabbit, turtle, badger and fox
    print "@array\n";

The code sequence above gives the following output:

    cat dog rabbit turtle badger fox

showing that the push function has added badger and fox to @array.

The array functions shift and pop can be used to remove the first or last array element respectively:

    my @array = ("cat", "dog", "rabbit", "turtle");
    my $first = shift(@array); # @array contains dog, rabbit and turtle
    my $last = pop(@array);    # @array contains dog and rabbit
    print "$first\n"; # prints "cat"
    print "$last\n";  # prints "turtle"
    print "@array\n"; # prints "dog rabbit"

A slice is to an array what a substring is to a scalar. It is a way to extract several values from an array in one go without having to take the whole thing. The syntax of a slice is @array[list_of_indexes]. It returns a list of scalars which you can either assign to individual scalars or use to initialise another array.

    my @array = ("cat", "dog", "rabbit", "turtle", "badger", "fox");
    (my $two, my $three, my $five) = @array[2, 3, 5]; # extracts 3rd, 4th and 6th elements of @array
    my @last_two = @array[4, 5];                      # extracts 5th and 6th elements of @array
    print "$two, $three, $five\n"; # prints "rabbit, turtle, fox"
    print "@last_two\n";           # prints "badger fox"

When assigning to a list of scalars (as in $two, $three, $five) the values are mapped from the list returned by the slice onto the scalars in the list. This same technique can also be used to extract values from an array without changing it (as would happen if you used shift/pop):

    my @array = ("cat", "dog", "rabbit", "turtle", "badger", "fox");
    (my $red, my $orange, my $yellow) = @array; # $red contains "cat", $orange contains "dog" and $yellow contains "rabbit"

In this example the values are transferred to the scalars, but @array is left intact. It does not matter that @array has more values than the list; the rest are just ignored.

1.3.2 Array functions

One useful thing to be able to extract is the length of an array. There are two ways to get this. For every array there is a special variable associated with it which holds the highest index number contained in the array. As array indexes start at 0, this value is always one less than the length. The special variable is $#array_name. It is a good idea to get used to the notion of special variables in Perl as experienced programmers use them a lot as shorthand. They can make code hard to follow if you are not aware that you are looking at a special variable:

    my @array = (1, 2, 3, 4, 5);
    print "The last index is ", $#array; # gives "The last index is 4"
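The following short sketch brings slices and the last-index variable together. It is an illustration only: the variable names are invented here, and using the range operator (..) and $#array inside a slice goes slightly beyond the examples in the text.

```perl
#!/usr/bin/perl
use strict;

my @array = ("cat", "dog", "rabbit", "turtle", "badger", "fox");

my $last_index = $#array;          # highest index in the array
my $count      = $last_index + 1;  # number of elements is one more than that

# A slice copies the selected elements; @array itself is not modified.
my @first_three = @array[0 .. 2];
my @last_two    = @array[$#array - 1, $#array];

print "Last index: $last_index, elements: $count\n";
print "First three: @first_three\n";
print "Last two: @last_two\n";
print "Original still holds: @array\n";
```

Writing the last two indexes in terms of $#array means the slice keeps working if more animals are pushed onto the array later.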

The second way to get the length of an array is to use it in a situation where Perl expects to see a scalar. If you use an array as a scalar you get the length of the array. For example:

    my @array = (1, 2, 3, 4, 5);
    my $length = @array;
    print $length; # prints "5"

obtains the length in the same way as:

    my @array = (1, 2, 3, 4, 5);
    print "The length of the array is ", scalar(@array); # prints "The length of the array is 5"

As with scalars, there are a couple of functions which are only really useful in combination with arrays. The join function turns an array into a single scalar string and allows you to provide a delimiter which it will put between each array element. It is commonly used when writing data to a file as either comma separated or tab delimited text.

    my @array = ("tom", "dick", "harry");
    print join("\t", @array); # produces tab delimited text

You can also go the other way and use the split function to split a single scalar into a series of values in an array. The split command actually uses a regular expression to decide where to split a string. Do not worry about the details of this for the moment, as we will come to regular expressions later; just think of it as a string in between two "/" characters.

    my $scalar = "Hello.there.everyone";
    my @array = split(/\./, $scalar); # @array contains "Hello", "there" and "everyone"
    print "Second element is ", $array[1], "\n"; # prints "Second element is there"
    print join(" ", @array), "\n";               # prints "Hello there everyone"
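Because join and split are opposites, a value built with one can be taken apart with the other. The sketch below shows such a round trip using a comma as the delimiter; the variable names are chosen here for illustration and are not from the exercises.

```perl
#!/usr/bin/perl
use strict;

my @names = ("tom", "dick", "harry");

# join: array to a single comma separated string
my $line = join(",", @names);

# split: the same string back to an array
my @fields = split(/,/, $line);
my $count  = @fields;              # an array used as a scalar gives its length

# A dot has a special meaning in a regular expression, so it is escaped as \.
my $dotted = "Hello.there.everyone";
my @words  = split(/\./, $dotted);

print "$line\n";
print "Recovered $count fields: @fields\n";
print join(" ", @words), "\n";
```

A comma needs no backslash in the split pattern, whereas the dot does; this is why the example in the text writes /\./ rather than /./.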

1.3.3 Sorting arrays

A common requirement having populated an array is to sort it before doing something with it. Sorting is actually a non-trivial task, but most of the complexity and technicalities are rarely relevant in common tasks. The function to sort an array is sort. The function takes a sorting rule, which is a small block of code instructing it how to compare two objects, and the array containing the objects to be sorted. Sort does not alter the array that is passed to it but rather returns a new list consisting of the sorted elements of the original array. Perl uses the special variable names $a and $b to define sorting rules, and these variable names should be reserved for use in sort code blocks. They will work elsewhere, but it is considered bad practice to use them.

    my @array = ("C", "D", "B", "A");
    @array = sort { $a cmp $b } @array;
    print join(" ", @array); # prints A B C D

This code sorts the array alphabetically. The code block in the sort is the bit between the curly brackets. The block must contain a statement using $a and $b to say which one comes first. The two operators you can use for comparing scalars in a sorting rule are cmp for comparing strings and <=> (called the spaceship operator) for comparing numbers. You can apply whatever transformations you like to $a and $b before you do the comparison if you need to.

EXERCISE 5: Array functions

1. Create a new script with the following code:

    #!/usr/bin/perl
    use strict;
    my @array = (1 .. 10); # create an array of numbers 1-10
    print "The array contains: @array\n";
    my $first_element = shift(@array); # remove the first element and store it in $first_element
    my $last_element = pop(@array);    # remove the last element and store it in $last_element
    print "The first and last elements of the array are $first_element and $last_element\n";
    push(@array, (-5 .. +5)); # add the numbers -5 to +5 to the array
    print "The array currently contains: @array\n";
    my @sortedarray = sort { $a <=> $b } @array; # sort the array numerically
    print "The sorted array contains: @sortedarray\n";
    my @new_array = qw(cat dog rabbit turtle fox badger); # create a new array using qw
    print "@new_array\n";

2. Create a new script and the following array:

    my @array = qw(99players b_squad a-team 1_Boy A-team B_squad 2_Boy);

3. Sort the array using the following sorting options:
    (a) Sort numerically in ascending order:
        @array = sort { $a <=> $b } @array;
    (b) Sort numerically in descending order (same as before but with $a and $b swapped):
        @array = sort { $b <=> $a } @array;
    (c) Sort alphabetically in a case-insensitive manner:
        @array = sort { lc($a) cmp lc($b) } @array;

4. Create a new script with the following array:

    my @words = qw(The quick brown fox jumps over the lazy dog and runs away);

5. Using appropriate array access and join functions, construct each of the following strings, store it in a single variable and print it to the screen:

    The quick fox jumps over the dog
    The brown fox runs away
    The lazy dog runs
    The dog runs away quick
    The quick brown dog runs over the lazy fox

1.3.4 Two dimensional arrays

It is often useful to have a multi-dimensional array to simulate a table or other data structure such as a matrix. In Perl this is achieved by creating an array of arrays.

    my @array = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);

To access the data you must add another index to the element name. The first index after the array name selects the sub-array you want to access. The second index is the position inside that sub-array. As with single dimensional arrays, the indexes are numbered from 0 through n - 1. In order to print the centre element in the 2d array we just created we would use the first command below:

    print $array[1][1] . "\n"; # prints 5 to the screen
    print $array[1][2] . "\n"; # prints 6 to the screen
    print $array[1][3] . "\n"; # outside our prepared array: there is no value here (undef), so only the new line is printed; use $array[2][0] instead

To print the array you can use the following code:

    for my $row (@array) {
        print "[ @$row ],\n";
    }
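To make the two-index addressing concrete, the sketch below walks over every element of the 3 x 3 array by row and column index. It is an illustration only: the names $rows, $cols, $total, @diagonal and $missing are invented here, and the notations @{ $array[0] } and $#{ $array[$r] }, which treat one row as an array in its own right, are standard Perl but are not introduced in this section.

```perl
#!/usr/bin/perl
use strict;

my @array = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);

my $rows = @array;             # number of sub-arrays
my $cols = @{ $array[0] };     # number of elements in the first sub-array

my $total = 0;
my @diagonal;
for my $r (0 .. $#array) {                  # first index: which sub-array
    for my $c (0 .. $#{ $array[$r] }) {     # second index: position inside it
        $total += $array[$r][$c];
    }
    push(@diagonal, $array[$r][$r]);        # elements [0][0], [1][1], [2][2]
}

# An index past the end of a row holds no value (undef).
my $missing = defined($array[1][3]) ? "defined" : "undefined";

print "Rows: $rows, columns: $cols\n";
print "Sum of all elements: $total\n";
print "Diagonal: @diagonal\n";
print "Element [1][3] is $missing\n";
```

Looping over 0 .. $#array rather than a fixed 0 .. 2 keeps the loops correct if rows are later added with push or removed with pop.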

EXERCISE 6: Two dimensional array exercises

1. Create an array of people:

    my @people = (["Clark", "Kent"], ["Lois", "Lane"], ["Bruce", "Wayne"]);

2. Use push to add "Superman" to Clark Kent's sub-array.
3. Use pop to remove Bruce Wayne from the matrix.
4. Use a directly indexed scalar to add "Reporter" as the third element of Lois Lane's sub-array.
5. Add a third sub-array with the values "Jimmy", "Olsen", "Photographer".
6. Print the resulting matrix to the screen.
7. Print only the last names to the screen.

Chapter 2 Program Control

In the simple scripts considered so far, program execution starts at the top and every line is executed until we reach the bottom and the program terminates. In most programs things are not so straightforward, and it is very useful to have pieces of code that are only executed when certain conditions are met. In this chapter we will examine conditional statements that allow such program control.

2.1 If-else statements

The basic conditional statement is the if. An if statement evaluates a condition and then executes a piece of code if that condition is true. A simple conditional statement using if is shown below:

    my $salary = 50000;
    if ( $salary > 100000 ) {    # if the value of salary is greater than 100,000
        print "You must be a banker...\n";
    }

Table 2.1 gives a list of some possible conditional tests that can be used to compare two numbers or two strings.

    Conditional test   Data type   Description
    $x == $y           Numerical   X is equal to Y
    $x != $y           Numerical   X is not equal to Y
    $x > $y            Numerical   X is greater than Y
    $x < $y            Numerical   X is less than Y
    $x >= $y           Numerical   X is greater than or equal to Y
    $x <= $y           Numerical   X is less than or equal to Y
    $x eq $y           Strings     X is equal to Y
    $x ne $y           Strings     X is not equal to Y
    $x gt $y           Strings     X is greater than Y
    $x lt $y           Strings     X is less than Y

Table 2.1: Numerical and string comparisons. A list of conditional tests that can be used for comparing two numbers or two strings held in the variables $x and $y. The final two comparisons compare the strings based on alphabetical order.

The piece of code above assigns the value 50,000 to the scalar variable $salary. The conditional statement if is used to test whether the value contained in $salary is greater than 100,000. If the salary is greater than 100,000, the print statement contained between the curly braces { ... } is executed and "You must be a banker..." is printed.

An if statement can also be followed by an else statement, which specifies a piece of code that is executed if the condition is false:

    my $salary = 50000;
    if ( $salary > 100000 ) {    # if the value of salary is greater than 100,000
        print "You must be a banker...\n";
    }
    else {                       # otherwise (salary is 100,000 or less)
        print "You are not a banker...\n";
    }

EXERCISE 7: If-else statements

1. Create a new text document and name it ifthenelse.pl.

2. Enter the source code below:

    #!/usr/bin/perl
    use strict;

    my $x = 5.1;
    my $y = 5;

    if ( $x > $y ) {
        print "x is greater than y\n";
    }
    else {
        print "y is greater than x\n";
    }

    $x = 5.0;
    $y = 5.0;

    if ( $x > $y ) {
        print "x is greater than y\n";
    }
    elsif ( $y > $x ) {
        print "y is greater than x\n";
    }
    elsif ( $y == $x ) {
        print "x is equal to y\n";
    }

3. Switch to the terminal and run the program.

    > perl ifthenelse.pl
    x is greater than y
    x is equal to y

4. Modify the program to accept the numbers entered by the user using <STDIN> and re-run the program to see the changes in the program behaviour.

5. Write a new program that computes the area of a circle with a radius that is specified by the user using <STDIN>. The area of a circle is π times the radius of the circle squared (π ≈ 3.141592654).

6. Modify the program so that if the radius is a negative number the program will print "The radius of a circle must be a positive number".

7. Add a conditional statement to print:
(a) "This is a big circle" if the area of the circle is greater than 100.
(b) "This is a small circle" if the area of the circle is less than 100.

2.2 Nested and Compound conditionals

If we want to check more than one condition in an if statement, we can nest them to produce more complex logic:

    if ($salary > 100000) {        # if $salary is greater than 100,000
        if ($bonus > 100000) {     # if $bonus is greater than 100,000
            # this statement is only printed if salary > 100,000 AND bonus > 100,000
            print "You are a lucky boy!";
        }
    }

In this example code, the value of $salary is first checked to see if it is greater than 100,000 and, if it is, the value of $bonus is then checked to see if it is also greater than 100,000. If both variables satisfy these conditions then the print statement is executed. However, the above code example can be equivalently expressed using a compound statement where both conditions are evaluated at the same time:

    if ( ($salary > 100000) and ($bonus > 100000) ) {
        print "You are a lucky boy!";
    }

The choice of whether to use nesting or compound conditionals typically depends on the problem being solved and the readability of the code. Long compound conditional statements are undesirable, but multiple levels of nesting are also difficult to manage!

EXERCISE 8: Nested and Compound If statements

1. Create a new text document and name it nestedif.pl.

2. Enter the source code below:

    #!/usr/bin/perl
    use strict;

    my $x = 5.1;
    my $y = 5.1;

    if ( $x > 5.0 ) {
        if ( $y > 5.0 ) {
            print "x and y are greater than 5\n";
        }
    }

    if ( ( $x > 5.0 ) and ( $y > 5.0 ) ) {
        print "x and y are greater than 5\n";
    }

3. Switch to the terminal and run the program:

    > perl nestedif.pl

    x and y are greater than 5
    x and y are greater than 5

4. Modify the program to accept numbers from <STDIN> and re-run the program to see the changes in the program behaviour.

5. Using a combination of and, or and if statements or nested statements, write a script to print out the following statements under these salary/bonus scenarios:
(a) Salary < 100000, Bonus < 100000: You are not a banker.
(b) Salary > 100000, Bonus < 100000: You are a banker with no bonus.
(c) Salary > 100000, Bonus > 100000: You are a banker with a big bonus.
(d) Salary < 100000, Bonus > 100000: You won the lottery.
(e) Salary or Bonus > 100000: You are buying dinner tonight.

6. In Perl, we can use the =~ operator to perform pattern matching. The statement $x =~ /word/ is true if the variable $x contains the text "word". Using this pattern matching operator and conditional statements, write a case-insensitive test to see if an input string $x contains the following text:

    Word to find   Print this if found
    Chris          Found Chris!
    Bells          Ding dong!
    Wonder         I was wondering about that too
    Land           Air and Sea

Test your code using the following strings:
(a) Christmas Time
(b) The bells are ringing in Wonderland
(c) Stevie Wonder
(d) The land of hope and glory

(e) Wondering about your day

2.3 Loops

Until now the Perl code we have seen has been executed sequentially from start to finish, with some parts skipped on the way due to conditional statements. Loops allow us to define pieces of code that are executed repeatedly until some condition is satisfied.

2.3.1 While loops

The simplest kind of loop is the while loop. The while loop consists of a block of code in curly brackets preceded by a condition which is evaluated as being either true or false. If it is true, the block of code is run once and the condition is then tested again. The loop continues until the condition returns a false value.

To make a while loop work, something in the block of code must change in a way that affects the condition you supplied at the start. If nothing does, the loop will either not run at all or it will continue running forever (known as an infinite loop).

A simple while loop is shown below which illustrates the normal syntax for a loop:

    my $count = 0;
    while ($count < 5) {
        print "Count is $count\n";
        $count++;    # this is Perl shorthand for $count = $count + 1
    }

In this loop the condition being evaluated is $count < 5, and the loop finishes because $count is increased by 1 every time the loop code runs.
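As a further illustration of why the loop body must change something that the condition depends on, here is a small sketch in which a value is doubled until it passes a limit. The variable names and the limit of 100 are chosen only for this example.

```perl
use strict;
use warnings;

my $value = 1;
my $steps = 0;

# The loop runs while $value is below 100. Doubling $value inside the
# block is what eventually makes the condition false and ends the loop.
while ( $value < 100 ) {
    $value *= 2;    # shorthand for $value = $value * 2
    $steps++;
}

print "Reached $value after $steps doublings\n";    # Reached 128 after 7 doublings
```

If the line that doubles $value were removed, the condition would stay true forever and the script would never finish, which is the infinite loop described above.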

2.3.2 Foreach loops

The other commonly used loop structure in Perl is the foreach loop. This is used to iterate through a list of values: you supply a block of code which is run once for each value in the list. A simple foreach loop is shown below:

    foreach my $value (2, 4, 6, 8, 10) {
        print "Value is $value\n";
    }

This code prints "Value is ..." for each of the values 2, 4, 6, 8 and 10, giving the following output:

    Value is 2
    Value is 4
    Value is 6
    Value is 8
    Value is 10

Although you can manually create a list for a foreach loop, it is much more common to use an existing data structure instead. This is usually either an array or a list of the keys of a hash.

    my @animals = ("cat", "dog", "rabbit");

    # for each element in the array @animals assign it to the scalar $animal
    foreach my $animal (@animals) {
        # print the value of $animal
        print "I have a $animal\n";
    }

Finally, another useful bit of Perl syntax to use with foreach loops is the range operator. This consists of two scalars separated by a double dot (..) and creates a list in which all values between the two scalars are filled in.

    foreach my $number (1 .. 10) {
        print "There are $number elephants\n";
    }

    foreach my $letter ("a" .. "z", "A" .. "Z") {
        print "This is a letter: $letter\n";
    }

The behaviour of numbers in ranges is pretty intuitive (the value goes up by one each time). Letters are OK as long as you keep each range either upper or lower case and do not mix the two. You can do ranges with multiple letters, but watch out because they get pretty big pretty quickly!

2.3.3 For loops

A more classically styled for loop is also available in Perl, which is more akin to the for loop structures seen in other programming languages such as C or Java.

    my @animals = ("cat", "dog", "rabbit");
    for ( my $i = 0; $i < scalar(@animals); $i++ ) {
        print "I have a $animals[$i]\n";
    }

Here, $i is an index which increments each time the loop is repeated (indicated by the $i++), and the loop keeps running for as long as the condition $i < scalar(@animals) is satisfied. The function scalar returns the number of elements in an array. In this code example, all the entries in the array @animals are therefore printed, just as we did previously using foreach.

As you might imagine, Perl programmers tend to use foreach more often than the more basic for loop, but it is helpful to understand how a for loop works as it does appear every so often and is fundamentally what the foreach command is built upon.

EXERCISE 9:

1. Create a new text document and name it loops.pl.

2. Write a script that prints out the numbers 1980 to 2010 using a loop.

3. Modify your script to use a conditional statement and print out "This is a new decade!" for years ending in nought. HINT: Use $year % 10 == 0 to test if a year is divisible by 10.

4. Use a while loop to count backwards from 10, print the numbers and print the line "We have lift off!" when the count reaches zero.

5. Create an array with the following strings as elements:

    James Bond 007
    Department of Statistics
    University of Oxford
    Fantastic 4

Use a loop to print "The string x contains numbers" if $x does contain numbers. Print the uppercase version of strings that do not contain numbers. HINT: The test $x =~ /[0-9]/ can be used to identify whether $x contains any digit from 0 to 9.

Chapter 3 File Handling

Reading large data files and writing large quantities of data to files is one of the tasks that Perl is most often used for in real applications. Perl and its associated modules provide powerful, easy-to-use tools for handling and manipulating files. In this chapter, we will explore some of these capabilities.

3.1 File Handles

In order to do any work with a file you first need to create a filehandle. A filehandle is a structure which Perl uses to allow functions to interact with a file. You create a filehandle using the open command. This can take a number of arguments:

1. The name of the filehandle (all in uppercase by convention)
2. The mode of the filehandle (read, write or append)
3. The path to the file

When selecting a mode for your filehandle, Perl uses the following symbols:

1. < for a read-only filehandle

2. > for a writable filehandle (creates a new file for writing or clears an existing file of the same name)
3. >> for an appendable filehandle (adds to an existing file of the same name or creates a new file if none exists)

If you do not specify a mode, Perl assumes a read-only filehandle, and for reading files it is usual not to specify a mode. The difference between a write and an append filehandle is that if the file you specify already exists, a write filehandle will wipe its contents and start again, whereas an append filehandle will open the file, move to the end and then start adding content after what was already there.

The code below shows an example of opening a read-only and a writable filehandle.

    open(IN, 'readme.txt');           # open a file handle called IN for reading
    open(OUT, '>', 'writeme.txt');    # open a file handle called OUT for writing

By convention the filehandle names (in this example IN and OUT) should be written in all capitals. In some older coding styles you may well see the mode combined with the filename, although this style should now be considered obsolete and avoided:

    open(OUT, '> writeme.txt');

3.2 Closing a filehandle

When you have finished with a filehandle it is good practice to close it using the close function.

    close(OUT);

If you do not explicitly close your filehandle it will automatically be closed when your program exits. If you perform another open operation on a filehandle which is already open, then the first one will automatically be closed when the second one is opened.
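The difference between the write (>) and append (>>) modes can be seen in a short sketch. It assumes the core module File::Temp is available and works in a throwaway directory so that no real files are touched; the file name demo.txt and the text written are made up for the illustration. Each open is followed by "or die", which stops the script with a message if the file cannot be opened.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A temporary directory that is removed when the script exits.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/demo.txt";

# '>' creates the file and writes the first line.
open( OUT, '>', $path ) or die "Cannot write $path: $!";
print OUT "first\n";
close(OUT);

# Opening with '>' again wipes the existing contents.
open( OUT, '>', $path ) or die "Cannot write $path: $!";
print OUT "second\n";
close(OUT);

# '>>' keeps what is there and adds to the end.
open( OUT, '>>', $path ) or die "Cannot append to $path: $!";
print OUT "third\n";
close(OUT);

# Read the file back: only "second" and "third" remain.
open( IN, '<', $path ) or die "Cannot read $path: $!";
my @lines = <IN>;
close(IN);

print @lines;
```

The line "first" is lost because the second open used the write mode, while "third" is added after "second" because the last open used the append mode.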

3.3 Error Checking

Error checking is very important when opening files for read and write operations. What happens if a file does not exist or if you do not have permission to write to a file? Perl will carry on past these failures, giving you catastrophic or strange errors later in your code when you try to use the filehandle you created. It is therefore important to ensure that your Perl code ALWAYS checks that file operations have completed successfully before proceeding.

You can check that the operation succeeded by looking at the return value of the function open. If an open operation succeeds it returns true; if it fails it returns false. You can use a normal conditional statement to check whether it worked or not.

    my $return = open(IN, "readme.txt");
    if ($return) {
        print "It worked!";
    }
    else {
        print "Could not read the file!";
        exit;
    }

More commonly, Perl programmers use the following code:

    open(IN, "readme.txt") or die "Cannot read readme.txt: $!";

which gives the following output upon failure:

    Cannot read readme.txt: No such file or directory at line 5.

If a file open fails, Perl stores the reason for the failure in the special variable $!. The function die prints the message it is given, here including the reason for failure, and terminates the program immediately.
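The return value of open and the contents of $! can be inspected without stopping the program, as in the sketch below. The helper open_status and the file names are hypothetical and exist only for this example, and the filehandles are stored in scalar variables (my $fh) rather than the uppercase names used in the text, which is a common alternative style. The exact wording of the failure reason in $! depends on the operating system.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir( CLEANUP => 1 );     # throwaway working directory
my $present = "$dir/readme.txt";           # a file we create below
my $missing = "$dir/no_such_file.txt";     # a file that does not exist

# Create the file that should open successfully.
open( my $out, '>', $present ) or die "Cannot create $present: $!";
print $out "hello\n";
close($out) or die "Failed to close $present: $!";

# Hypothetical helper: report whether a file can be opened for reading.
sub open_status {
    my ($path) = @_;
    if ( open( my $fh, '<', $path ) ) {    # open returns true on success
        close($fh);
        return "OK";
    }
    return "Cannot read: $!";              # $! holds the reason for the failure
}

my $status_present = open_status($present);
my $status_missing = open_status($missing);

print "readme.txt: $status_present\n";
print "no_such_file.txt: $status_missing\n";
```

Returning a message instead of calling die is useful when a script should skip an unreadable file and carry on with the rest of its work.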

3.4 Reading files

Once a filehandle has been created, data can be read from the file using the <> operator. The identifier of the filehandle you want to read from is placed between the angle brackets. This reads one line of data from the file and returns it. To be more precise, this operator will read data from the file until it hits a certain delimiter. The default delimiter is your system's newline character (\n), hence you get one line of data at a time.

    my $file = "/tmp/myfile.txt";    # the path and file name of the file to be read
    open(IN, $file) or die "Can't read $file: $!";    # open a file handle to the file
    my $first_line = <IN>;    # reads the next line from the file specified by the file handle IN
    print $first_line;        # prints the first line of the file (note that the newline character in the file is retained)
    print "The end";
    close(IN);

This produces the following output:

    This is the first line,
    The end

There is no \n specified at the end of the first print statement. This is because the <> operator does not remove the delimiter it is looking for when it reads from the input filehandle, so the variable $first_line already contains a newline delimiter. Normally you want to get rid of this delimiter, and Perl has a special function called chomp for doing just this. chomp removes the same delimiter that <> uses, but only if it is at the end of a string.

    my $file = "/tmp/myfile.txt";
    open(IN, $file) or die "Can't read $file: $!";
    my $first_line = <IN>;
    chomp($first_line);    # remove the delimiter
    print $first_line;
    print "The end";
    close(IN);

This code produces:

    This is the first line,The end

In order to read the entire file we must use a loop to apply the <> operator repeatedly. The typical way to read a file is to put the <> operator into a while loop so that reading continues until the end of the file is reached. A while loop is preferred over a for loop because we do not need to specify in advance how many times the loop should run, only that it should stop when the end of the file is reached.

    my $file = "/tmp/myfile.txt";
    open(IN, $file) or die "Can't read $file: $!";
    my $line_count = 1;
    while (my $line = <IN>) {
        chomp($line);
        print "$line_count: $line\n";
        $line_count++;
    }
    close(IN);

Gives:

    1: This is the first line,
    2: here comes the second line,
    3: and here the third one...
    4: This is a boring file.
    5: Let's move on to something more fun.

EXERCISE 10:

1. Download the file fruit.csv from the course website: http://www.stats.ox.ac.uk/~aid/perl/

2. Create a new text document called readfile.pl and enter the following code:

    #!/usr/bin/perl
    use strict;

    my $infile = "fruit.csv";
    open(FH, $infile) or die "Cannot open $infile\n";

    # this bit of code reads (skips) the header line
    <FH>;

    while ( my $line = <FH> ) {
        chomp($line);
        my @linedat = split(/,/, $line);    # splits the line at commas
        my $fruit     = $linedat[0];
        my $quantity  = $linedat[1];
        my $unitprice = $linedat[2];
        $unitprice = sprintf('%0.2f', $unitprice);    # formats the unit price to 2 decimal places
        print "We have $quantity of $fruit at $unitprice pounds each\n";
    }

    close(FH);
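The split and sprintf steps used in this exercise can be tried without a data file, as in the sketch below. The three records are invented for the illustration and only assume the same "name,quantity,unit price" column order as the exercise code; they are not the contents of fruit.csv.

```perl
use strict;
use warnings;

# Made-up records in "name,quantity,unit price" order (not the real fruit.csv).
my @records = ( "apple,10,0.5", "banana,6,0.25", "cherry,100,0.031" );

my @report;
my $total = 0;
for my $line (@records) {
    # split returns the comma-separated fields as a list.
    my ( $fruit, $quantity, $unitprice ) = split( /,/, $line );

    $total += $quantity * $unitprice;

    # %.2f formats a number with exactly two digits after the decimal point.
    push @report, sprintf( "%s: %d at %.2f pounds each", $fruit, $quantity, $unitprice );
}

print "$_\n" for @report;
printf "Total value: %.2f pounds\n", $total;
```

Assigning the result of split to a list of named variables in one step is a compact alternative to copying each element out of an array by index.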

3.5 Writing to files

Writing to a file is straightforward once you understand the concept of a filehandle. After opening a filehandle for writing, the only function required is print. All of the print statements shown so far have actually been sending data to the STDOUT filehandle (i.e. the screen). If a specific filehandle is not specified then STDOUT is the default destination for any print statement.

    open(OUT, '>', "write_test.txt") or die "Can't open file for writing: $!";
    print OUT "Sending some data\n";
    close(OUT) or die "Failed to close file: $!";

When writing to a file it is important to check that the open function has succeeded and that no error occurs when closing the filehandle. This is because errors can occur whilst you are writing data, for example if the device you are writing to becomes full.

EXERCISE 11:

1. Create a new text document called writefile.pl and enter the following code:

    #!/usr/bin/perl
    use strict;

    my $outfile = "myoutfile1.txt";
    open(OUTFILE, '>', $outfile) or die "Cannot write to $outfile\n";
    print OUTFILE "This is my first file\n";
    close(OUTFILE);

2. Using a loop, add some extra code to print the numbers 1, ..., 100 to the file.

3. Use conditional statements to print only odd numbers between 1 and 100 to the file.

3.6 Writing to multiple files

A common file processing task for which Perl is used is the extraction of selected portions of data from a large data file. Multiple output files may then be generated, each with differing content. However, we may not know in advance how many output files will be created (since this may be determined by the input data) and hence how many output file handles we might require.

In order to write to multiple files, we must create multiple filehandles. Here we introduce a Perl module called FileCache, which contains a pre-defined library of file handling functions that simplify the task of writing to and managing multiple file handles. In order to use the FileCache module, we must insert the following code at the top of our Perl script:

    no strict 'refs';
    use FileCache maxopen => 16;

Here, the keyword use specifies that we are using the FileCache module, whilst maxopen => 16 is an option specific to the FileCache module. In this case it specifies the maximum number of file handles used by FileCache at any one time (note that we can write to more than 16 different files, but only a maximum of 16 filehandles will be open at any one time and the module will take care of opening and closing filehandles appropriately). The no strict 'refs'; line is required due to implementation issues with the FileCache module which we will not detail here.

Now, instead of opening a file handle for writing as described previously, we use the cacheout command to return a filehandle managed by the FileCache module. We can then use print to write to the file handle as before:

    my $file = '/tmp/outputfile.txt';

    my $FH = cacheout($file);
    print $FH "Writing to this file\n";

So far it is not obvious what the point of using FileCache is, other than as a possible simplification of the normal filehandle creation process. However, consider the following example which contains an array of football clubs:

    # array of club wins
    my @wins = ( 'Manchester United', 'Arsenal', 'Chelsea', 'Manchester United', 'Chelsea', 'Chelsea' );

    # for each club win
    foreach my $club ( @wins ) {
        my $file = "$club.txt";          # generate a filename for this club
        my $FH = cacheout($file);        # get file handle for this file
        print $FH "This team won\n";     # print
    }

In this example, the foreach loop goes through each club in the array, generates a filename ($file) for each club and uses cacheout to return a file handle to that file. It then writes the sentence "This team won" to that file. The result is the creation of three files named Manchester United.txt (2), Arsenal.txt (1) and Chelsea.txt (3), with "This team won" printed the number of times indicated by the number in brackets.

Note that we did not have to scan through the array first to work out how many different clubs were present, nor did we have to explicitly create a file handle for each club: the FileCache module has done the hard work for us! This is very useful in more realistic settings where the arrays may be considerably larger and a large number of different files are involved.
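For comparison, the same result can be produced without FileCache by keeping one filehandle per club in a hash. This is a different technique from the one described above, shown only as a sketch: it assumes the number of distinct files is small enough for all of them to stay open at once, and it writes into a temporary directory (via the core module File::Temp) rather than the current directory. The names %handle_for and %count are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );    # throwaway output directory
my @wins = ( 'Manchester United', 'Arsenal', 'Chelsea',
             'Manchester United', 'Chelsea', 'Chelsea' );

my %handle_for;    # club name => open filehandle
foreach my $club (@wins) {
    # Open the club's file the first time it is seen, then reuse the handle.
    unless ( $handle_for{$club} ) {
        open( $handle_for{$club}, '>', "$dir/$club.txt" )
            or die "Cannot write $dir/$club.txt: $!";
    }
    # A filehandle stored in a hash must be wrapped in braces for print.
    print { $handle_for{$club} } "This team won\n";
}

# Close every handle, checking for errors as we go.
foreach my $club ( keys %handle_for ) {
    close( $handle_for{$club} ) or die "Failed to close $club.txt: $!";
}

# Read each file back and count its lines.
my %count;
foreach my $club ( sort keys %handle_for ) {
    open( my $in, '<', "$dir/$club.txt" ) or die "Cannot read $club.txt: $!";
    my @lines = <$in>;
    close($in);
    $count{$club} = scalar(@lines);
    print "$club.txt: $count{$club} line(s)\n";
}
```

The hash approach makes the bookkeeping explicit, which is what FileCache hides; FileCache additionally limits how many files are open at once, which this sketch does not attempt.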

EXERCISE 12:

1. Create a new text document called myfilecache.pl and enter the following code:

    #!/usr/bin/perl
    use strict;
    no strict 'refs';
    use FileCache maxopen => 16;

    my $infile = "departments.csv";    # this is the file to read

    # open a file handle
    open(INFILE, $infile) or die "Cannot open $infile\n";

    # skip the header line
    <INFILE>;

    # read one line at a time
    while ( my $line = <INFILE> ) {
        chomp($line);

        # extract data
        my ( $staffid, $firstname, $surname, $department, $employmentstatus ) = split(/,/, $line);

        my $name = $firstname . " " . $surname;
        print "$staffid\t$name\t$department\t$employmentstatus\n";
    }

    # close file handle
    close(INFILE);

2. Using the cacheout function, extend this program to write a set of

files, one for each department, which contain the Staff ID and Names for each person working in that Department. The output files should be named after the department they represent.

3. Modify your program to only include full-time employees (FT).

3.7 Tying files to an array

The module Tie::File allows the lines of a disk file to be accessed as though they were the elements of a Perl array.

3.8 File System Operations

As well as being able to read and write files, Perl offers a number of other filesystem operations within the language.

3.8.1 Changing directory

Instead of having to include a full path every time you open or close a file, it is often useful to move to the directory you want to work in and then just use filenames. You use the chdir function to change directory in Perl. As with all file operations, you must check the return value of this function to confirm that it succeeded.

    chdir("/tmp/") or die "Couldn't move to temp directory: $!";

3.8.2 Deleting files

Perl provides the unlink function for deleting files. This accepts a list of files to delete and returns the number of files successfully deleted. Again you must check that this call succeeded.

    # This works:
    unlink("/tmp/killme.txt", "/tmp/metoo.txt") == 2 or die "Couldn't delete file: $!";

    # But this is better:
    foreach my $file ("/tmp/killme.txt", "/tmp/metoo.txt") {
        unlink $file or die "Couldn't delete $file: $!";
    }

3.8.3 Listing files

Another common scenario is that you want to process a set of files. Perl provides a function called glob to list files in a directory.

    my @files = glob("*.rtf");
    print "I have ", scalar(@files), " rtf files in my directory\n";

You will also frequently encounter a shortcut for globbing which uses the angle brackets (<>) instead of glob. Both methods are entirely equivalent.

    my @files = <*.doc>;
    print "I have ", scalar(@files), " doc files in my directory\n";

Although you can store the output from a glob in an array as shown above, it is also possible to treat it a bit like a filehandle and read filenames from it in a while loop.

    chdir("/tmp/docs") or die "Can't move to docs directory: $!";

    while (my $file = <*.doc>) {
        print "Found file $file\n";
    }

    my @files = <*.doc>;
    foreach my $file (@files) {
        print "Found file $file\n";
    }

Both the while and foreach loops give exactly the same output.
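The combination of chdir and glob can be tried safely with the sketch below. It assumes the core modules File::Temp and Cwd are available, creates three empty files with made-up names in a temporary directory, and then lists only those ending in .doc. The script returns to its starting directory afterwards so that the temporary directory can be cleaned up.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();                  # remember where we started
my $dir   = tempdir( CLEANUP => 1 );   # throwaway directory

# Create three empty files with made-up names.
foreach my $name (qw(a.doc b.doc notes.txt)) {
    open( my $fh, '>', "$dir/$name" ) or die "Cannot create $name: $!";
    close($fh);
}

chdir($dir) or die "Can't move to $dir: $!";

# glob returns the names matching the pattern in the current directory.
my @docs = glob("*.doc");
@docs = sort @docs;                    # make the order explicit

print "I have ", scalar(@docs), " doc files: @docs\n";

chdir($start) or die "Can't move back to $start: $!";
```

The file notes.txt is not listed because it does not match the *.doc pattern.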

3.8.4 Testing files

It may be necessary to identify properties of a file in an application. Perl provides a series of simple file test operators to allow you to find out basic information about a file. The file test operators are as follows:

    Test   Description
    -e     Tests if a file exists
    -r     Tests if a file is readable
    -w     Tests if a file is writable
    -d     Tests if a file is a directory (directories are just a special kind of file)
    -f     Tests if a file is a file (as opposed to a directory)
    -T     Tests if a file is a plain text file

All of these tests take a filename as an argument and return either true or false.

    chdir("/tmp/docs/") or die "Can't move to docs directory: $!";

    while (my $file = <*>) {
        if (-f $file) {
            print "$file is a file\n";
        }
        elsif (! -w $file) {
            print "$file is write protected\n";
        }
    }

As an aside, in the above example the test to identify whether the file is write protected uses a ! at the start of the conditional statement. This operator reverses the sense of the test which follows. It is the Perl language equivalent of putting "not" at the end of a sentence. In this case ! -w tests if the file is not writable, i.e. is write protected.

EXERCISE 13:

1. Download the file animals.zip from the course website http://www.stats.ox.ac.uk/~aid/perl/ and extract its contents to your home directory. This should create a directory called animals with four files in it.

2. Use glob to find all the text files in the animals directory and store these in an array.

3. Create a single summary file containing a list of all the types of foxes, badgers and rabbits and their numbers. The summary file should have the following headers, with the data following underneath:

    Species   Type     Number
    Fox       Alopex   23

You will need to:
(a) Check that each file is a text file.
(b) Create a file handle for each data file and read in the data from each file.
(c) Create a file handle to a summary file and write the animal data to this file.

4. Write some code to delete the file called sweets.dat in the animals directory. WARNING! BE CAREFUL WHAT YOU DELETE WHEN DOING THIS!

Chapter 4 Regular Expressions

Perl has many features that set it apart from other languages. Of all those features, one of the most important is its strong support for regular expressions. These allow fast, flexible, and reliable string handling. A regular expression, often called a pattern in Perl, is a template that either matches or does not match a given string. Regular expressions are often used to implement the following types of tasks:

1. Complex string comparisons, e.g. find the text "trans" in the string variable $string = "transformer".
2. Complex string selections, e.g. select the text "out" in the string variable $string = "shout".
3. Complex string replacements, e.g. replace the text "Presley" with the text "Costello" in the string variable $string = "Elvis Presley".

4.1 Basic string comparisons

The most basic string comparison is:

    $string =~ m/abc/;

The above returns true if the string $string contains the sub-string abc and false otherwise. The operator =~ appears between the string variable you are comparing and the regular expression you are looking for (note that in selection or substitution a regular expression operates on the string variable rather than comparing). The operator m denotes a matching operation, whilst the character / is the usual delimiter for the text part of a regular expression. If the sought-after text contains slashes, it is sometimes easier to use pipe symbols (|) as delimiters, but this is rare. Table 4.1 provides a list of variations on this basic operation.

    Example                  Description
    $string =~ m/abc/;       Checks if $string contains any instance of the text abc.
    $string =~ m/^abc/;      Checks if $string contains an instance of the text abc at the beginning of the string.
    $string =~ m/abc$/;      Checks if $string contains an instance of the text abc at the end of the string.
    $string =~ m/^abc$/;     Checks if $string contains only the text abc.
    $string =~ m/abc/i;      Performs a case-insensitive match to see if $string contains an instance of the text abc.

Table 4.1: Example string comparisons using regular expressions.

4.2 Using Wildcards and Repetitions

Perl regular expressions allow us to use wildcards (in computing, a wildcard character can be used to substitute for a particular type of character or characters in a string) and repetitions to match multiple instances of a character. Table 4.2 provides a list of wildcard characters. You can also follow any character, wildcard, or series of characters and/or wildcards with a repetition in order to match multiple instances of particular types of characters. Table 4.3 lists some examples.

For example, the following regular expression checks if $string starts with a percentage symbol followed by a whitespace character and then any other characters:

    $string =~ m/^%\s.*/i;

    Character   Description
    .           Match any character
    \w          Match "word" character (alphanumeric plus _)
    \W          Match non-word character
    \s          Match whitespace character
    \S          Match non-whitespace character
    \d          Match digit character
    \D          Match non-digit character
    \t          Match tab
    \n          Match newline
    \r          Match return
    \f          Match formfeed
    \a          Match alarm (bell, beep, etc)
    \e          Match escape

Table 4.2: Wildcards in regular expressions

    Character   Description
    *           Match 0 or more times
    +           Match 1 or more times
    ?           Match 1 or 0 times
    {n}         Match exactly n times
    {n,}        Match at least n times
    {n,m}       Match at least n but not more than m times

Table 4.3: Repetitions in regular expressions

This regular expression checks for a percentage symbol at the start of the string using ^%, followed by a whitespace character \s and then any number of characters using .*. Strings that would return true include "% Hello World" and "% Apple" but not "%Hello" or "%% Banana".

We can check to see if a string satisfies a particular format, for example a classic DOS-style 8.3 filename format, using regular expressions.

    $string =~ m/^\S{1,8}\.\S{3}$/;

This regular expression matches between 1 and 8 non-whitespace characters at the start of the string using ^\S{1,8}, followed by a dot \. (note the use of the backslash) and then three further non-whitespace characters at the end of the string using \S{3}$.
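Both patterns discussed above can be checked against a handful of strings, as in the sketch below. The candidate file names are invented for the example, and the 8.3 pattern is stored in a scalar using qr//, which simply keeps a regular expression in a variable for later use. The grep function returns the elements of a list for which the test in the braces is true.

```perl
use strict;
use warnings;

# The percentage-symbol pattern, tested against the four strings from the text.
my @lines    = ( "% Hello World", "% Apple", "%Hello", "%% Banana" );
my @comments = grep { $_ =~ m/^%\s.*/ } @lines;

# The 8.3 filename pattern, stored in a variable with qr//.
my $dos_name   = qr/^\S{1,8}\.\S{3}$/;
my @candidates = ( "README.TXT", "autoexec.bat", "toolongname.txt", "noext", "a.b" );
my @valid      = grep { $_ =~ $dos_name } @candidates;

print "Matched: $_\n" for @comments;      # "% Hello World" and "% Apple"
print "Valid 8.3 names: @valid\n";        # README.TXT autoexec.bat
```

Note that \S also matches a dot, so this pattern accepts a name such as a.b.txt; a stricter test would need a character class that excludes the dot.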

4.3 Groups

Groups are regular expression characters surrounded by parentheses. Powerful regular expressions can be made with groups; inside a group, the pipe symbol | separates alternatives. At its simplest, you can match either all lowercase or name case like this:

    $string =~ m/(G|g)eorge (C|c)looney/;

Detect all strings containing vowels:

    $string =~ m/(A|E|I|O|U|Y|a|e|i|o|u|y)/;

Detect if the line starts with any of the last three Prime Ministers:

    $string =~ m/^(Blair|Brown|Cameron)/i;

Groups can also be used for string selections:

    $string = "01234 56789";
    $string =~ m/(\d+)\s(\d+)/;
    print "$1, $2\n";

would produce the following output:

    01234, 56789

The regular expression (\d+)\s(\d+) matches one or more digits, followed by a whitespace character and then one or more digits. The special variables $1 and $2 store the matches corresponding to the groups of digits in the regular expression.

4.4 Character Classes

Character classes are alternative single characters within square brackets that can be used as an alternative to groups. Character classes have three main advantages:

- Shorthand notation, as [AEIOUY].
- Character ranges, such as [A-Z].
- One-to-one mapping from one set of characters to another, as in tr/a-z/A-Z/ (tr takes its ranges without square brackets; we will describe translations using tr later).

A hyphen is used to indicate all characters in the sequence between the character on the left of the hyphen and the character on its right. An uparrow (^) immediately following the opening square bracket means "anything but these characters", and effectively negates the character class. For instance, to match anything that is not a vowel, do this:

    if ( $string =~ m/[^AEIOUYaeiouy]/ ) {
        print "This string contains a non-vowel";
    }

Contrast this to the following, which returns true if the string contains no vowels at all (note the use of !~ to denote not matching):

    if ( $string !~ m/[AEIOUYaeiouy]/ ) {
        print "This string contains no vowels at all";
    }

Print all people whose name begins with A through E:

    if ( $string =~ m/^[A-E]/ ) {
        print "$string\n";
    }

4.5 Putting it All Together

We can put all these features together to produce some powerful string matching operations. For example, the following regular expression prints everyone whose last name is Blair, Brown or Cameron. Each element of the list is first name (^\S+), blank (\s+), last name (Blair|Brown|Cameron), and possibly more characters after the last name:

    if ( $string =~ m/^\S+\s+(Blair|Brown|Cameron)/i ) {
        print "$string\n";
    }

A more complex example is to print a string if it contains a valid phone number:

    $string = "(01235) 264532";
    if ( $string =~ m/(\(\d{5}\)|(\+\d{1,2}\s\(\d{1}\)\s\d{4}))\s\d{6}/ ) {
        print "Phone line: $string\n";
    }

    $string = "+44 (0) 1235 264532";
    if ( $string =~ m/(\(\d{5}\)|(\+\d{1,2}\s\(\d{1}\)\s\d{4}))\s\d{6}/ ) {
        print "Phone line: $string\n";
    }

Here we use a regular expression that allows for both national and international number formats (note that \( and \) are used to match the parentheses in the strings and are not the parentheses associated with the use of groups).

4.6 Other string operations

In addition to string comparison, we can also do string substitutions and translations.

4.6.1 Substitutions

Replace "Gordon Brown" with "David Cameron":

    $string =~ s/Gordon Brown/David Cameron/;

Now do it ignoring the case of gordon brown:

    $string =~ s/Gordon Brown/David Cameron/i;

Using g, instead of replacing only the first instance of the pattern encountered, we can replace all instances of the pattern in the string globally:

    $string =~ s/Gordon Brown/David Cameron/g;

4.6.2 Translations

Translations are like substitutions, except they happen on a letter by letter basis instead of substituting a single phrase for another single phrase. Note that tr takes plain lists and ranges of characters rather than regular expressions, so no square brackets or commas are needed. For instance, what if you wanted to make all vowels upper case:

    $string =~ tr/aeiouy/AEIOUY/;

Change everything to upper case:

    $string =~ tr/a-z/A-Z/;

Change everything to lower case:

    $string =~ tr/A-Z/a-z/;

Change all vowels to numbers to avoid "4 letter words" in a serial number (y also becomes 5, because tr repeats the last replacement character when the replacement list is shorter than the search list):

    $string =~ tr/aeiouy/12345/;

EXERCISE 14:

1. Using regular expressions, test whether a string has a valid IP address (IPv4) format. Note: IPv4 addresses are canonically represented in dot-decimal notation, which consists of four decimal numbers, each ranging from 0 to 255, separated by dots, e.g., 172.16.254.1. Then, using sub-string operations, verify that the address is valid.

2. Write a Perl program to read in an input file containing "Name Surname" lines and produce a second file with the format "Surname, Name" (note the comma after the surname). Use regular expressions to do the string conversion.

3. Write a Perl program to eliminate the blank lines from a text file, e.g. if the source file has the lines: Line 1 Line 2 Line 4 Line 6, your program should modify this file to become: Line 1 Line 2 Line 4 Line 5 Line 6
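As a recap of Sections 4.3 and 4.6, the sketch below combines a group capture, a substitution using both the g and i modifiers, and a translation. The sample strings and variable names ($digits, $headline, $phrase) are invented for illustration; tr is written with a plain list of characters, which is the form Perl expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Groups: pull two runs of digits out of a string
my $digits = "01234 56789";
my ( $first, $second ) = ( "", "" );
if ( $digits =~ m/(\d+)\s(\d+)/ ) {
    ( $first, $second ) = ( $1, $2 );
}
print "$first, $second\n";

# Substitution: g replaces every instance, i ignores case
my $headline = "gordon brown met Gordon Brown";
$headline =~ s/Gordon Brown/David Cameron/gi;
print "$headline\n";

# Translation: tr also returns the number of characters it translated
my $phrase  = "regular expressions";
my $changed = ( $phrase =~ tr/aeiouy/AEIOUY/ );
print "$phrase ($changed vowels changed)\n";
```

Copying $1 and $2 into ordinary variables straight after a successful match is a cautious habit, since the special variables are overwritten by the next successful match.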

Chapter 5 Hash Tables

The final variable type in Perl is the hash. A hash is a form of lookup table; it consists of a collection of key-value pairs, where both the key and value are scalars. You can retrieve a value from the hash by providing the key used to enter it. Although you can have duplicate values in a hash, the keys must be unique. If you try to insert the same key into a hash twice, the second value will overwrite the first.

Hashes do not preserve the order in which data was added to them. They store your data in an efficient manner which does not guarantee ordering. If you need things to be ordered, use an array. If you need efficient retrieval, use a hash. Figure 5.1 illustrates the differences between an array and a hash schematically.

Figure 5.1: Schematic diagrams of (a) an array and (b) a hash. In an array the values are stored sequentially and there is an ordering to the data. In a hash, the keys are transformed into locations (via something called a "hash function") and the values are stored in an unordered state.
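The two properties described above (a repeated key overwrites the earlier value, and keys come back in no guaranteed order) can be seen in a few lines. This is a minimal sketch with a made-up hash %price; the curly-bracket syntax it uses is explained in the sections that follow.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %price = ( "tea" => 2, "coffee" => 3 );
$price{"tea"} = 4;    # "tea" is already a key, so its old value is overwritten

my @items           = keys(%price);     # the order of the keys is not guaranteed
my @sorted_items    = sort(@items);     # sort them if a fixed order is needed
my $number_of_items = scalar(@items);

print "There are $number_of_items items: @sorted_items\n";
print "Tea now costs $price{'tea'}\n";
```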

Hash names all start with the % symbol. Hash keys are simple scalars. Hash values can be accessed by putting the hash key in curly brackets after the hash name (which would now start with a $ as we're talking about a single scalar value rather than the whole hash). For example, to retrieve the value for the key "alpha6574" from %names we would use $names{"alpha6574"}.

5.1 Creating a Hash

When you create a hash you can populate it with data from a list. This list must contain an even number of elements which come as consecutive sets of key-value pairs:

    my %eye_colour = (
        "Simon Brown", "Brown",
        "Iain Smith", "Blue",
        "Conor Murphy", "Grey"
    );
    print $eye_colour{"Simon Brown"};    # prints Brown

Alternatively, it is also possible to use the => operator in place of a comma (it's also known as a fat comma). This has the same effect as a comma, and in addition it automatically quotes the value to its left when that value is a single bare word, so you don't need to put quotes around simple key names (keys containing spaces, as here, still need their quotes). The code below does exactly the same thing as the one above.

    my %eye_colour = (
        "Simon Brown"  => "Brown",
        "Iain Smith"   => "Blue",
        "Conor Murphy" => "Grey"
    );

This version makes it much clearer which are the keys and which are the values.

5.2 Testing for keys in a hash

One very common operation is to query a hash to see if a particular key is already present. This is seemingly a straightforward operation but it can lead to errors if we are not careful. One of the features of a hash is that although you need to declare the hash itself the first time you use it, you do not need to declare each element each time you add one. This makes hashes very flexible, but also means you can put bugs like this in your code:

    my %eye_colour = (
        "Simon Brown"  => "Brown",
        "Iain Smith"   => "Blue",
        "Conor Murphy" => "Grey"
    );
    print $eye_colour{"Alan Dunson"};    # Compiles OK, but gives a warning at runtime (if warnings are enabled)

To allow hashes to be flexible, if you assign to a key which does not exist then that key is automatically created in the hash so you can proceed.

    my %eye_colour = ();
    $eye_colour{"Richard Yates"} = "Grey";    # No error
    print $eye_colour{"Richard Yates"};       # prints Grey

This flexibility has implications when it comes to testing for the presence of a key in the hash. The most obvious (and wrong) way to do this would be something like:

    my %eye_colour = ();
    $eye_colour{"Richard Yates"} = "Grey";
    if ( $eye_colour{"Simon Jones"} ) {
        print "We know about Simon";     # Doesn't print
    }
    if ( defined $eye_colour{"Robert Davies"} ) {
        print "We know about Robert";    # Doesn't print
    }

If you run this code it all seems to work the way it should, and in this isolated case it does. The problem is that both tests examine the value stored under the key rather than the key itself. The first test fails for any key whose value is false (0 or the empty string), and the second fails for any key whose value is undef, even though the key is present in both cases. A plain lookup such as $eye_colour{"Simon Jones"} does not itself add the key to the hash, but the same kind of test on a nested structure (such as the records in Section 5.5) does create the intermediate key as a side effect, so you can find you have more keys than you expect when you later iterate through the hash.

The correct approach for testing for the presence of a key in a hash is to use the exists function. This is specifically designed for this purpose: it does not depend on the value stored and will not alter a simple hash when used.

    my %eye_colour = ();
    if ( exists $eye_colour{"Simon Jones"} ) {
        print "We know about Simon";     # Doesn't print
    }
    if ( exists $eye_colour{"Robert Davies"} ) {
        print "We know about Robert";    # Doesn't print
    }

5.3 Retrieving keys and values from a hash

Once a hash has been created we can scan through it using a foreach loop as follows:

    my @hashkeys = keys(%hashtable);    # get the keys used in the hash table
    @hashkeys = sort(@hashkeys);        # sort the keys

    # for each key used in the hash table
    foreach my $key ( @hashkeys ) {
        # get the value for this hash entry
        my $value = $hashtable{$key};
        # print the key/value pair
        print "$key\t$value\n";
    }

In this code example, all the keys used in the hash table are extracted using the function keys, which returns an array that we call hashkeys. The sort function is then applied to the array of keys to sort them. A foreach loop is then used to go through the array of hash keys and we access the scalar value associated with each hash key using $hashtable{$key}. The code example can be equivalently expressed as follows:

    # for each key used in the hash table
    foreach my $key ( sort keys %hashtable ) {
        # get the value for this hash entry
        my $value = $hashtable{$key};
        # print the key/value pair
        print "$key\t$value\n";
    }

This shortened form is what you would most likely see used in practice. If we want to retrieve an array of all the values in a hash table, we can use the values function as follows:

    my @hashvalues = values(%hashtable);    # get an array of all the values used in the hash table

5.4 Frequency tables using hashes

A hash can be a useful device for counting the number of objects that occur or creating frequency tables. Suppose we wanted to count the number of apples, pears and bananas in the array fruits:

    my @fruits = qw( Apple Apple Pear Apple Banana Pear Pear Banana Banana Banana Apple );

We could use a loop to scan through the array and, for each element of the array, use conditional statements to determine what fruit has been specified and increment a count of each fruit.

    my $number_of_apples  = 0;    # a counter to keep track of the number of apples
    my $number_of_bananas = 0;    # a counter to keep track of the number of bananas
    my $number_of_pears   = 0;    # a counter to keep track of the number of pears

    # go through the array of fruits
    foreach my $fruit ( @fruits ) {
        # use an if/elsif chain to determine which counter to increment
        if ( $fruit eq "Apple" ) {
            $number_of_apples = $number_of_apples + 1;
        }
        elsif ( $fruit eq "Banana" ) {
            $number_of_bananas = $number_of_bananas + 1;
        }
        elsif ( $fruit eq "Pear" ) {
            $number_of_pears = $number_of_pears + 1;
        }
    }

    # print results
    print "Number of apples: $number_of_apples\n";
    print "Number of bananas: $number_of_bananas\n";
    print "Number of pears: $number_of_pears\n";

Although this method works for this example, it is difficult to generalise it for other applications. What if the array contained an unknown number of fruit types? We would not be able to pre-specify different scalar variables to store the counts of each fruit and we would not be able to use an if statement to increment the right count. Fortunately, hash tables allow us to perform this counting task more compactly and generally:

    my %num_of_fruit = ();    # a hash table to store the count of the number of fruits

    # go through the array of fruits
    foreach my $fruit ( @fruits ) {
        # check if we have seen any fruits like this and if not initialise a new key with value 0
        if ( !exists( $num_of_fruit{$fruit} ) ) {
            $num_of_fruit{$fruit} = 0;
        }
        # increment the count for this fruit
        $num_of_fruit{$fruit} = $num_of_fruit{$fruit} + 1;
    }

    # print results
    foreach my $fruit ( sort keys %num_of_fruit ) {
        my $count = $num_of_fruit{$fruit};
        print "Number of $fruit: $count\n";
    }

This alternative code performs the same fruit counting procedure. Here a hash table called num_of_fruit is created that will contain the counts of each fruit. The keys we will use will be fruit names and the values will be the counts. We scan through the array fruits and, using the fruit type as a key into the hash table, increment the value associated with the key by one each time we encounter a fruit of that type using $num_of_fruit{$fruit} = $num_of_fruit{$fruit} + 1. Note that we use !exists (the ! means not, so we are doing a not-exists test) to ask if we have seen fruits of this type before (i.e. has the key been used in the hash before). If not, the key is created and the value is first initialised to zero. This initialisation is not strictly necessary but is good practice to ensure that you always count from zero.

EXERCISE 15:

1. Download the file counties.csv from the course website: http://www.stats.ox.ac.uk/~aid/perl/

2. Read the contents of the counties file using Perl. The file has the following format:

    FirstName  Surname    Email           County          Date of Birth
    Sopoline   Carpenter  elit@semper.ca  Northumberland  21/04/1995

3. Find the number of people born since the year 2000. Hint: you will need to do some string manipulation to extract the year from the Date of Birth fields.

4. Find the number of people living in each county. Hint: Use a hash table.

5.5 Using records

It may sometimes be necessary to attach more than a single scalar value to a key. Perl allows us to create records for such a purpose. Records are defined using curly brackets, and individual fields are assigned separated by commas.

    my %customers = ();
    $customers{"jg023290232"} = {
        "Name"             => "Christopher O'Donnell",
        "Address"          => "182 St Margarets Road",
        "Post Code"        => "OX4 1TG",
        "Telephone Number" => "01223 453 788"
    };

To access a particular sub-value in a record, we use the operator ->:

    print $customers{"jg023290232"}->{"Name"} . "\n";    # prints Christopher O'Donnell

EXERCISE 16:

1. Download the file cars.csv from the course website: http://www.stats.ox.ac.uk/~aid/perl/

2. Create a new text document called hashtable.pl and enter the following code:

    #!/usr/bin/perl
    use strict;

    my $infile = "cars.csv";
    my %customertable = ();

    open(INFILE, $infile) or die "Cannot open $infile\n";
    <INFILE>;    # read and discard the first line of the file
    while ( my $line = <INFILE> ) {
        chomp($line);
        ( my $id, my $name, my $car, my $value ) = split(/\t/, $line);
        $customertable{$id} = {
            NAME  => $name,
            CAR   => $car,
            VALUE => $value,
        };
    }
    close(INFILE);

    foreach my $customer_id ( sort { $a <=> $b } keys(%customertable) ) {
        my $customer_name = $customertable{$customer_id}->{NAME};
        my $customer_car  = $customertable{$customer_id}->{CAR};
        my $car_value     = $customertable{$customer_id}->{VALUE};
        print "Customer ID $customer_id: $customer_name owns a $customer_car which costs $car_value\n";
    }

3. Modify the code to exclude cars with values below 15,000.

4. Add the following customer cars to the hash table:
   (a) 310468389, Joshua Rankin, MG, 34,000
   (b) 550433311, Josephine Gould, Vauxhall, 4,500

5. Create a separate file for each make of car and write the customer details to each file.
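The key-testing advice of Section 5.2 and the counting idea of Section 5.4 can be checked with a short script. This is a sketch under stated assumptions: the hash %stock and its contents are invented for illustration, and the counting loop uses the ++ operator as a shorter alternative to the explicit initialisation shown in the notes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %stock = ( "Apple" => 0, "Pear" => 3 );

# "Apple" is a key, but its value (0) is false, so a plain truth test misses it
my $truth_test  = $stock{"Apple"}           ? "yes" : "no";
my $exists_test = exists( $stock{"Apple"} ) ? "yes" : "no";
print "Truth test: $truth_test, exists test: $exists_test\n";

# Testing for a missing key at this single level does not add it to the hash
my $kiwi_test      = exists( $stock{"Kiwi"} ) ? "yes" : "no";
my $number_of_keys = scalar( keys %stock );
print "Kiwi present: $kiwi_test, keys in hash: $number_of_keys\n";

# Frequency table: ++ treats a missing value as 0 before adding one
my @fruits = qw( Apple Apple Pear Apple Banana Pear Pear Banana Banana Banana Apple );
my %num_of_fruit = ();
foreach my $fruit (@fruits) {
    $num_of_fruit{$fruit}++;
}
foreach my $fruit ( sort keys %num_of_fruit ) {
    print "Number of $fruit: $num_of_fruit{$fruit}\n";
}
```

The explicit zero initialisation in Section 5.4 and the ++ shorthand give the same counts; the longer form makes the starting value visible, which some readers may prefer.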

Chapter 6 Sub-routines

Until now the scripts used have been kept simple by keeping a roughly linear execution path and using simple data structures. In more complex problems, you will often find yourself repeatedly performing the same tasks, leading to code repetition and a nasty number of variables floating around in your script. This section aims to address these problems and describes the concept of modularisation.

6.1 Sub-routines

A sub-routine is a block of code to which you can pass a list of scalars and from which you can collect a list of scalars. A sub-routine contains code which takes the input variables and creates the output. You can then call this code from anywhere in your program, which provides a sensible way of reusing code. Sub-routines start with the sub keyword, followed by their name (conventionally all in lowercase), followed by a block of code surrounded by curly brackets. The simplest kind of sub-routine is one which takes no arguments and returns no values.

    sub print_it {
        print "Look! A subroutine!\n";
    }

A demonstration script which uses the sub-routine is shown below:

    #!/usr/bin/perl
    use strict;

    my $mynumber = 1;

    if ( $mynumber == 0 ) {
        print_it();
    }
    elsif ( $mynumber == 2 ) {
        print_it();
    }
    elsif ( $mynumber == 4 ) {
        print_it();
    }
    else {
        print "I am not a sub-routine\n";
    }

    sub print_it {
        print "Look! A subroutine!\n";
    }

The sub-routine is called by using its name followed by round brackets: print_it(). Here, there is nothing between the brackets, but in general the list of input variables would be placed here. The sub-routine is called depending on the value of mynumber using an if conditional block. Note that instead of writing out print "Look! A subroutine!\n"; for each case, we call the sub-routine instead. This is useful because if we wanted to change the printed message we would only have to change the code in the sub-routine once. If we had written out the print statements explicitly, each of those would need to be modified.

Conventionally, sub-routines go at the bottom of your script, after the main part of the program, but they do not have to.

6.1.1 Local and global scoping

Before seeing more complex sub-routines, we must think about globally and locally scoped variables. So far, we have always used variables in global scope; this means that a variable defined in one part of a script can be accessed in another part. Consider the following script:

    #!/usr/bin/perl
    use strict;

    my $x = 150;
    global_print();

    ## SUB-ROUTINES ##
    sub global_print {
        print "The global value of x is: $x\n";
    }

Here, we have declared and initialised a variable x to have the value 150. We then call a sub-routine global_print which prints out the value of x. This script gives the following output:

    The global value of x is: 150

It should be apparent that global scoping is potentially dangerous. What if we have a very long script or many sub-routines? What happens if we accidentally use the same variable name to mean two different things? Unfortunately, this can lead to all sorts of logical errors! Now, consider this second example:

    #!/usr/bin/perl
    use strict;

    my $x = 150;
    global_print();
    print "The value of y is: $y\n";

    ## SUB-ROUTINES ##
    sub global_print {
        my $y = 10;
        print "The global value of x is: $x\n";
    }

Why is an error given when we attempt to run this script? The answer is that the variable y is only declared within the scope of the sub-routine global_print. What this means is that the variable y is only visible to code within the same set of curly braces as y, i.e. anything inside the sub-routine. Hence, in this instance, the variable y does not exist outside the sub-routine and cannot be printed there.

In Perl, the keyword my allows us to define a variable to have local scoping. This means the variable is only visible to other code within the same scope as the variable. Let us consider an example script illustrating the use of global and local variables:

    #!/usr/bin/perl
    use strict;

    my $x = 150;
    my $y = 250;

    local_print();
    global_print();

    ## SUB-ROUTINES ##
    sub global_print {
        print "The global value of x is: $x\n";
        print "The global value of y is: $y\n";
    }

    sub local_print {
        my $x = 20;
        my $y = 10;
        print "The local value of x is: $x\n";
        print "The local value of y is: $y\n";
    }

If we run this code, the following output is obtained:

    The local value of x is: 20
    The local value of y is: 10
    The global value of x is: 150
    The global value of y is: 250

In the main script, two variables x and y are defined and assigned values of 150 and 250 respectively. Two sub-routines, local_print and global_print, are then called. In local_print the keyword my is used to create two variables also called x and y, and these are set to 20 and 10. When local_print executes it uses the local variables, not the global ones. In contrast, global_print looks for the global variables x and y, as no local variables with these names exist.

6.1.2 Passing Scalar Arguments

We now consider how to pass variables (referred to as arguments) into a sub-routine to perform more complex tasks. Arguments to a sub-routine are passed in the special array @_.

    #!/usr/bin/perl
    use strict;

    cap_it("kayak");
    cap_it("racecar");

    sub cap_it {
        my $string = $_[0];
        unless ( defined($string) ) {
            die "The cap_it subroutine requires a string";
        }
        print "The word $string in capitals would be ", uc($string) . "\n";
    }

Here we have defined a sub-routine cap_it whose function is to convert an input string into all capital letters. The sub-routine is called using the name of the sub-routine followed by a single string enclosed in round brackets, e.g. cap_it("kayak");. Within the sub-routine, as the string is the only argument being passed, it will be contained in the first element of the special array @_, which can be accessed and assigned to a temporary variable using my $string = $_[0];.

A sub-routine can return values using the return function. The following script uses a sub-routine number_stats that takes a list of numbers and calculates the sum total, the count and the mean average.

    ( my $total, my $count, my $average ) = number_stats(1,2,3,4,5,6);
    print "Total = $total, Count = $count, Average = $average\n";

    sub number_stats {
        my @numbers = @_;
        my $count = 0;
        my $total = 0;
        foreach my $number (@numbers) {
            $count = $count + 1;
            $total = $total + $number;
        }
        my $average = $total/$count;
        return ($total, $count, $average);
    }

EXERCISE 17:

1. Create a new text document called subfunc.pl and enter the following code:

    #!/usr/bin/perl
    use strict;

    foreach my $number ( 1 .. 10 ) {
        my $squarenumber = square($number);
        print "The square of $number is $squarenumber\n";
    }

    sub square {
        my $ans = $_[0] * $_[0];
        return $ans;
    }

2. Add a new sub-routine which calculates the cube of the numbers and print out the results to screen.

3. Add a new sub-routine that calculates both the square and cube of the numbers and returns two variables.

4. Write a sub-routine that returns the maximum of two numbers and extend your script to test its correct functionality.

5. Write a sub-routine that calculates the area of a circle and its circumference when passed a radius and extend your script to test its correct functionality.

6. Download the file sales.zip and extract its contents: http://www.stats.ox.ac.uk/~aid/perl/

7. Create a new text document called subfunc2.pl and enter the following code:

    #!/usr/bin/perl
    use strict;

    my @filenames = (
        "files/north2009.txt",
        "files/south2009.txt",
        "files/east2009.txt",
        "files/west2009.txt"
    );

    foreach my $file ( @filenames ) {
        my $totalsales = readfile($file);
    }

    sub readfile {
    }

8. Complete the sub-routine readfile to read the sales data for each of the regions in 2009. The sub-routine should return the total sales across all individuals working in that region.

9. Repeat for the 2010 files.

10. Have sales improved in 2010?

6.1.3 Passing Arrays and Hashes as Arguments

Perl passes the arguments of a sub-routine as a single flat list of scalars, so an array or hash passed directly is flattened into that list and loses its identity. What if we want to pass arrays or hashes intact? Perl has a mechanism called "referencing" to allow this. Instead of passing the whole array or hash (which might be very large) we pass a reference to the array or hash. The following code example shows how to pass an array to a sub-routine:

    # create a reference to an array using \
    my $numbers_ref = \@numbers;

    # call the sum sub-routine, passing the array numbers by reference
    my $total = sum($numbers_ref);

    # function to add up elements of an array
    sub sum {
        # retrieve the reference to the array
        my $array_ref = $_[0];
        my $ans = 0;    # variable to store sum
        # dereference and add each element to sum
        foreach my $element ( @$array_ref ) {
            $ans = $ans + $element;
        }
        return $ans;
    }

The sub-routine here is called sum; it takes an array of numbers and adds them up, returning the total. The key points are that we first defined a scalar called numbers_ref to store a reference to an array. We then used the operator \ to retrieve the reference to the array numbers and stored this in numbers_ref. When we call the sub-routine with $total = sum($numbers_ref) we pass the scalar reference variable, not the array. Within the sub-routine, we create a local reference variable array_ref to retrieve the reference that was passed to the sub-routine. We then access the elements of the array referenced by array_ref by using the dereferencing operator @ in front of the $ in the local reference variable.

Note that in actual usage, many Perl programmers do not explicitly create a reference variable and do the pass-by-reference directly:

    my $total = sum(\@numbers);

We can pass hashes in a similar fashion, as shown in the following code example:

    # create a hash table
    my %names = ();
    $names{"mark"}  = 18;
    $names{"peter"} = 15;
    $names{"rob"}   = 17;
    $names{"henry"} = 11;

    # call a sub-routine to count the number of people in the hash
    my $num_of_people = countpeople(\%names);
    print "$num_of_people\n";

    # sub-routine to count the number of keys in a hash
    sub countpeople {
        # get reference to hash
        my $hash_ref = $_[0];
        # count number of keys
        my $num_of_keys = scalar(keys(%$hash_ref));
        # return answer
        return $num_of_keys;
    }

The difference is that the dereferencing operator for a hash is %, whilst it is @ for an array.

We can also return arrays and hashes from sub-routines using references. Again, this is useful for avoiding unnecessary copying of large arrays and hashes, which might use up a lot of computer memory.

    # call a sub-routine to find all the names in the hash table
    my $unique_names_ref = findnames(\%names);

    # print list of unique names
    print "@$unique_names_ref\n";

    # sub-routine to return all the names (keys) in a hash
    sub findnames {
        # get reference to hash
        my $hash_ref = $_[0];
        # get all unique names
        my @unique_names = keys(%$hash_ref);
        # return answer
        return \@unique_names;
    }

EXERCISE 18:

1. Create a sub-routine that takes an array of numbers (passed by reference), finds all the unique numbers in the array and returns a sorted array (passed by reference) of those unique numbers, e.g.

    # an array of numbers
    my @numbers = qw( 1 1 2 2 3 3 5 5 7 7 7 9 9 9 9 15 );

    # call to a sub-routine called uniquevalues
    my $unique_numbers_ref = uniquevalues(\@numbers);

    # print out array of unique values
    print "@$unique_numbers_ref\n";

should display the numbers 1 2 3 5 7 9 15.

2. Download the file cars.csv from the course website: http://www.stats.ox.ac.uk/~aid/perl/

3. Create a new text document called customer.pl and enter the following code:

    #!/usr/bin/perl
    use strict;

    my $infile = "cars.csv";
    my %customertable = ();

    open(INFILE, $infile) or die "Cannot open $infile\n";
    <INFILE>;    # read and discard the first line of the file
    while ( my $line = <INFILE> ) {
        chomp($line);
        ( my $id, my $name, my $car, my $value ) = split(/\t/, $line);
        $customertable{$id} = {
            NAME  => $name,
            CAR   => $car,
            VALUE => $value,
        };
    }
    close(INFILE);

NOTE: You should re-use code from the previous chapter if you have already done this.

4. Write a sub-routine that takes a hash table as input (passed by reference) and returns an array (passed by reference) of all the unique types of car in the hash table, e.g.

    my $unique_cars_ref = uniquecars( \%customertable );

5. Write a sub-routine that scans the hash table (passed by reference) and returns an array (passed by reference) of the names of customers whose cars are worth more than a value specified by a second input variable, e.g.

    my $customers_ref = findcars( \%customertable, $value );

6. Write a sub-routine that accepts a string and hash table (passed by reference) as inputs. The sub-routine should search the hash table to find the individual with the name specified in the input string and return the car type and value as two scalars, e.g.

    ( my $car, my $value ) = findcustomer( $name, \%customertable );
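To round off Section 6.1.3, the sketch below passes a hash and an array to the same sub-routine by reference and returns an array reference. Passing both directly would flatten them into one list in @_, so references are what keep the two collections apart. The sub-routine name ages_for and the array @wanted are hypothetical and are not part of the course code; the hash reuses the names from the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %names  = ( "mark" => 18, "peter" => 15, "rob" => 17, "henry" => 11 );
my @wanted = ( "rob", "mark", "alice" );

# pass both collections by reference and collect an array reference back
my $found_ref = ages_for( \%names, \@wanted );
print "@$found_ref\n";

# hypothetical sub-routine: return the ages of the requested people, in the order asked
sub ages_for {
    my $ages_ref   = $_[0];    # reference to the hash of ages
    my $wanted_ref = $_[1];    # reference to the array of names to look up
    my @found = ();
    foreach my $person ( @$wanted_ref ) {
        # $ages_ref->{$person} looks up one value through the hash reference
        if ( exists $ages_ref->{$person} ) {
            push( @found, $ages_ref->{$person} );
        }
    }
    return \@found;
}
```

Here "alice" is not a key in the hash, so only the two known ages are returned.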


Appendix A

Getting Started with the Linux Command Line

Getting to grips with the Linux Command Line

Exercise 1: Getting a shell prompt/command line

The first thing you need is a shell prompt/command line. When using a graphical desktop there is usually a terminal application which gives you access to the command line. Find Applications > Accessories > Terminal. You should see a terminal window open.

Note that it's perfectly possible to have a command line with no graphical desktop at all. This is often the case with server systems which are not used interactively and need all the processor power and memory they can get for computation. We will be finding out more about sudo and the root (or administrator) user in the last session.

If you want, you can drag the terminal icon from the Applications > Accessories menu onto the panel at the top of the screen. Then to start the terminal you just need to click on the icon on the panel.

Exercise 2: Where am I? What's all this?

Let's start to look at navigation of the Linux file system. The following commands are introduced:

Command                  Purpose
pwd                      Print working directory. In other words, where am I?
ls [options] directory   List files. If used on its own, it lists everything in the current working directory (where you are currently located).
file filename            Tells you what sort of file the file called filename (for example) is.
cd                       Change directory. In other words, please change my current location.
man command              Command manual pages.

Note that all commands are typed in lower case. There are very few Linux commands which have any uppercase (CAPITAL) letters. We will look at case sensitivity and file names in a later exercise.

Right away we can see how quiet Linux commands are by default. Try typing in cd at the ubuntu@ubuntu:~$ prompt and you will get no output at all. This does not mean that anything has gone wrong. For many commands, no output means successful completion.

A digression on prompts. You can customise your prompt to look however you like. We won't do that now, but you will notice that it changes as you move around the file system.

Not all commands are silent. Try

    pwd

You should get a response like: /home/ubuntu. Now try

    ls

You should now see a listing of all the files in the directory /home/ubuntu. Let's try finding out what sort of files each is. Take the file Desktop.

    file Desktop

The shell tells you that this isn't a regular file, it's a directory. In other words it's a special file which acts as a holder for yet more files (like a folder in Windows). Try this with one or more of the other files in /home/ubuntu/Desktop/Examples. Can you see any correlation between the colour of the files when you ran ls and what sort of file they are? Note that not all UNIXes and Linuxes display files in colour (which is why the file command was invented).

TIMESAVING INFORMATION

If you haven't already tried this then you should now. A lot of typing can be avoided by several useful shortcuts. The <tab> key can be used to complete commands and file names, and the arrow keys to recall previous commands and perhaps change them.
Exercise 3: Absolute and relative pathnames

We're now going to make use of two things, the ls command and the knowledge that the file called /usr/share/example-content is a directory, to illustrate the concepts of absolute and relative pathnames. Type:

    cd
    cd ../../usr/share
    ls example-content

and you will get a listing of the contents of the Desktop/Examples directory. These files reside in the Desktop directory. Now try:

    ls /usr/share/example-content

and you should get the same list of files. The absolute (i.e. complete) location of the Examples directory is /usr/share/example-content. We have just asked to see what is kept inside it in two different ways. The first is a relative pathname while the second is the full or absolute name.

Imagine the example-content directory is a particular house, say 42, High Street, Abingdon, and I ask you to deliver a letter there. I could tell you to deliver the letter to "42, High Street, Abingdon": the full/absolute address. No matter where you are in the UK, that's enough information. However, if you were already in Abingdon I could tell you to deliver the letter to the relative address of "42, High Street" or, even better, if you were standing on the high street just "number 42" would be enough.

The ls example-content command worked because you were already in the /usr/share directory. It wouldn't work from somewhere else. The ls /usr/share/example-content command will work from anywhere (although it's more long-winded). Let's prove it by changing our current location using the cd command.

    cd
    cd Desktop
    pwd

You should get /home/ubuntu/Desktop, i.e. you have moved into the Desktop directory.

    ls /home/ubuntu/Desktop

should give you the list of files in that directory. In fact you could use ls on its own without the name of the directory because you have already moved there with cd. Let's see what happens when we deliberately do something wrong:

    ls Desktop

should give you an error saying there is "No such file or directory", which is correct. The command fails because Desktop on its own is a relative name and you've started from the wrong place.

Let's expand the idea of relative and absolute path names using the cd command. Make sure you are still in the Desktop directory before you start (check with pwd).

    pwd
    cd ..
    pwd
    cd ..
    pwd

and so on until you can't go any further (you won't see an error, you just stop going anywhere).

.. is a special directory which means "up one level". All directories contain a .. so you can go up a level. The exception is called / or sometimes the root or just slash. You can't go any higher than / so .. doesn't take you anywhere. Note that there is another special directory called . which means "current location". All directories contain a . directory and we'll see why it is needed later.

During the above task you went up the directories one level at a time. Now let's reverse the process and go back to the Desktop directory one level at a time. You should be in /. Note that you don't have to do the pwds but it may help you visualize what is going on. You can also use ls to have a look around each level if you have time.

    cd home
    pwd
    cd ubuntu

    pwd
    cd Desktop
    pwd

Try to answer/do the following:

1. Were you just using absolute or relative paths?
2. Now try to get back to the root (or /) directory with one command only, using an absolute path.
3. Now get back to the /home/ubuntu/Desktop directory using one command only.
4. What are the contents of the / directory? (Use one command only to find out.)
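The difference between relative and absolute pathnames can also be checked in a scratch directory tree. This is a minimal sketch, assuming a POSIX shell with mktemp; the tree under $top is a made-up stand-in for /home/ubuntu/Desktop, and the variable names are chosen purely for illustration.

```shell
#!/bin/sh
# Build a small, made-up directory tree in a scratch location.
top=$(mktemp -d)
mkdir -p "$top/home/ubuntu/Desktop"

# Relative path: only resolves from the right starting point.
cd "$top/home/ubuntu" || exit 1
ls Desktop && from_parent=ok

cd Desktop || exit 1
# From inside Desktop there is no Desktop/Desktop, so this fails.
ls Desktop 2>/dev/null || from_inside=failed

# Absolute path: resolves the same way from anywhere.
ls "$top/home/ubuntu/Desktop" && absolute=ok

# .. climbs one level; . is the current directory, so cd . goes nowhere.
cd .. || exit 1
up_one=$(pwd)
cd . || exit 1
same=$(pwd)

echo "$from_parent $from_inside $absolute"
```

The same relative name, Desktop, succeeds from one starting point and fails from another, while the absolute name behaves identically in both places.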

APPENDIX A. GETTING STARTED WITH LINUX

Appendix B

Further Exercises

These further exercises are open-ended and their solutions can use any of the material discussed in this book. The files required for these exercises can be obtained via the course website: http://www.stats.ox.ac.uk/~aid/perl/

EXERCISE 19: Premiership Goal Scorers

The file scorers.csv contains a list of all the top scorers in the English Football Premier League since 1992:

1. Write a Perl script that verifies the existence of the file and then reads in the data.
2. Find all the players who have been top scorer in the Premier League.
3. Find the number of clubs who have had a top scorer in the Premier League.
4. Find the player who has been top scorer for the most seasons.

EXERCISE 20: Oscars

The file oscars.csv contains a list of all the films that have received the most Oscars:

1. Write a Perl script that verifies the existence of the file and then reads in the data.
2. Print a sorted list of the films which won more than 5 Oscars.
3. Create a series of files containing the names of Oscar-winning films from each decade (1950-59, 1960-69, 1970-79, etc.).

EXERCISE 21: Nobel Prize

The file nobel.csv contains a list of all those who have received Nobel Prizes:

1. Write a Perl script that verifies the existence of the file and then reads in the data.
2. Create a series of files containing the names of Nobel Prize winners for each category (Chemistry, Physics, Medicine, Peace and Literature).
3. Find the number of Nobel Prize winners from each country.
4. How many Nobel Prize winning scientists were born in Britain?