R for Beginners. Emmanuel Paradis. Institut des Sciences de l Évolution Université Montpellier II F Montpellier cédex 05 France
|
|
|
- Coral White
- 11 years ago
- Views:
Transcription
1 R fr Beginners Emmanuel Paradis Institut des Sciences de l Évlutin Université Mntpellier II F Mntpellier cédex 05 France [email protected]
2 I thank Julien Claude, Christphe Declercq, Éldie Gazave, Friedrich Leisch, Luis Luangkesrn, Françis Pinard, and Mathieu Rs fr their cmments and suggestins n earlier versins f this dcument. I am als grateful t all the members f the R Develpment Cre Team fr their cnsiderable effrts in develping R and animating the discussin list rhelp. Thanks als t the R users whse questins r cmments helped me t write R fr Beginners. Special thanks t Jrge Ahumada fr the Spanish translatin. c 2002, 2005, Emmanuel Paradis (12th September 2005) Permissin is granted t make and distribute cpies, either in part r in full and in any language, f this dcument n any supprt prvided the abve cpyright ntice is included in all cpies. Permissin is granted t translate this dcument, either in part r in full, in any language prvided the abve cpyright ntice is included.
3 Cntents 1 Preamble 1 2 A few cncepts befre starting Hw R wrks Creating, listing and deleting the bjects in memry The n-line help Data with R Objects Reading data in a file Saving data Generating data Regular sequences Randm sequences Manipulating bjects Creating bjects Cnverting bjects Operatrs Accessing the values f an bject: the indexing system Accessing the values f an bject with names The data editr Arithmetics and simple functins Matrix cmputatin Graphics with R Managing graphics Opening several graphical devices Partitining a graphic Graphical functins Lw-level pltting cmmands Graphical parameters A practical example The grid and lattice packages Statistical analyses with R A simple example f analysis f variance Frmulae Generic functins Packages
4 6 Prgramming with R in pratice Lps and vectrizatin Writing a prgram in R Writing yur wn functins Literature n R 71
5 1 Preamble The gal f the present dcument is t give a starting pint fr peple newly interested in R. I chse t emphasize n the understanding f hw R wrks, with the aim f a beginner, rather than expert, use. Given that the pssibilities ffered by R are vast, it is useful t a beginner t get sme ntins and cncepts in rder t prgress easily. I tried t simplify the explanatins as much as I culd t make them understandable by all, while giving useful details, smetimes with tables. R is a system fr statistical analyses and graphics created by Rss Ihaka and Rbert Gentleman 1. R is bth a sftware and a language cnsidered as a dialect f the S language created by the AT&T Bell Labratries. S is available as the sftware S-PLUS cmmercialized by Insightful 2. There are imprtant differences in the designs f R and f S: thse wh want t knw mre n this pint can read the paper by Ihaka & Gentleman (1996) r the R-FAQ 3, a cpy f which is als distributed with R. R is freely distributed under the terms f the GNU General Public Licence 4 ; its develpment and distributin are carried ut by several statisticians knwn as the R Develpment Cre Team. R is available in several frms: the surces (written mainly in C and sme rutines in Frtran), essentially fr Unix and Linux machines, r sme pre-cmpiled binaries fr Windws, Linux, and Macintsh. The files needed t install R, either frm the surces r frm the pre-cmpiled binaries, are distributed frm the internet site f the Cmprehensive R Archive Netwrk (CRAN) 5 where the instructins fr the installatin are als available. Regarding the distributins f Linux (Debian,... ), the binaries are generally available fr the mst recent versins; lk at the CRAN site if necessary. R has many functins fr statistical analyses and graphics; the latter are visualized immediately in their wn windw and can be saved in varius frmats (jpg, png, bmp, ps, pdf, emf, pictex, xfig; the available frmats may depend n the perating system). The results frm a statistical analysis are displayed n the screen, sme intermediate results (P-values, regressin cefficients, residuals,... ) can be saved, written in a file, r used in subsequent analyses. The R language allws the user, fr instance, t prgram lps t successively analyse several data sets. It is als pssible t cmbine in a single prgram different statistical functins t perfrm mre cmplex analyses. The 1 Ihaka R. & Gentleman R R: a language fr data analysis and graphics. Jurnal f Cmputatinal and Graphical Statistics 5: See fr mre infrmatin Fr mre infrmatin:
6 R users may benefit frm a large number f prgrams written fr S and available n the internet 6, mst f these prgrams can be used directly with R. At first, R culd seem t cmplex fr a nn-specialist. This may nt be true actually. In fact, a prminent feature f R is its flexibility. Whereas a classical sftware displays immediately the results f an analysis, R stres these results in an bject, s that an analysis can be dne with n result displayed. The user may be surprised by this, but such a feature is very useful. Indeed, the user can extract nly the part f the results which is f interest. Fr example, if ne runs a series f 20 regressins and wants t cmpare the different regressin cefficients, R can display nly the estimated cefficients: thus the results may take a single line, whereas a classical sftware culd well pen 20 results windws. We will see ther examples illustrating the flexibility f a system such as R cmpared t traditinal sftwares. 6 Fr example: 2
7 2 A few cncepts befre starting Once R is installed n yur cmputer, the sftware is executed by launching the crrespnding executable. The prmpt, by default >, indicates that R is waiting fr yur cmmands. Under Windws using the prgram Rgui.exe, sme cmmands (accessing the n-line help, pening files,... ) can be executed via the pull-dwn menus. At this stage, a new user is likely t wnder What d I d nw? It is indeed very useful t have a few ideas n hw R wrks when it is used fr the first time, and this is what we will see nw. We shall see first briefly hw R wrks. Then, I will describe the assign peratr which allws creating bjects, hw t manage bjects in memry, and finally hw t use the n-line help which is very useful when running R. 2.1 Hw R wrks The fact that R is a language may deter sme users wh think I can t prgram. This shuld nt be the case fr tw reasns. First, R is an interpreted language, nt a cmpiled ne, meaning that all cmmands typed n the keybard are directly executed withut requiring t build a cmplete prgram like in mst cmputer languages (C, Frtran, Pascal,... ). Secnd, R s syntax is very simple and intuitive. Fr instance, a linear regressin can be dne with the cmmand lm(y ~ x) which means fitting a linear mdel with y as respnse and x as predictr. In R, in rder t be executed, a functin always needs t be written with parentheses, even if there is nthing within them (e.g., ls()). If ne just types the name f a functin withut parentheses, R will display the cntent f the functin. In this dcument, the names f the functins are generally written with parentheses in rder t distinguish them frm ther bjects, unless the text indicates clearly s. When R is running, variables, data, functins, results, etc, are stred in the active memry f the cmputer in the frm f bjects which have a name. The user can d actins n these bjects with peratrs (arithmetic, lgical, cmparisn,... ) and functins (which are themselves bjects). The use f peratrs is relatively intuitive, we will see the details later (p. 25). An R functin may be sketched as fllws: arguments ptins functin default arguments = result The arguments can be bjects ( data, frmulae, expressins,... ), sme 3
8 f which culd be defined by default in the functin; these default values may be mdified by the user by specifying ptins. An R functin may require n argument: either all arguments are defined by default (and their values can be mdified with the ptins), r n argument has been defined in the functin. We will see later in mre details hw t use and build functins (p. 67). The present descriptin is sufficient fr the mment t understand hw R wrks. All the actins f R are dne n bjects stred in the active memry f the cmputer: n temprary files are used (Fig. 1). The readings and writings f files are used fr input and utput f data and results (graphics,... ). The user executes the functins via sme cmmands. The results are displayed directly n the screen, stred in an bject, r written n the disk (particularly fr graphics). Since the results are themselves bjects, they can be cnsidered as data and analysed as such. Data files can be read frm the lcal disk r frm a remte server thrugh internet. keybard muse cmmands functins and peratrs.../library/base/ /stast/ /graphics/... library f functins screen data bjects 3 results bjects PS JPEG... data files internet Active memry Hard disk Figure 1: A schematic view f hw R wrks. The functins available t the user are stred in a library lcalised n the disk in a directry called R HOME/library (R HOME is the directry where R is installed). This directry cntains packages f functins, which are themselves structured in directries. The package named base is in a way the cre f R and cntains the basic functins f the language, particularly, fr reading and manipulating data. Each package has a directry called R with a file named like the package (fr instance, fr the package base, this is the file R HOME/library/base/R/base). This file cntains all the functins f the package. One f the simplest cmmands is t type the name f an bject t display its cntent. Fr instance, if an bject n cntents the value 10: > n [1] 10 4
9 The digit 1 within brackets indicates that the display starts at the first element f n. This cmmand is an implicit use f the functin print and the abve example is similar t print(n) (in sme situatins, the functin print must be used explicitly, such as within a functin r a lp). The name f an bject must start with a letter (A Z and a z) and can include letters, digits (0 9), dts (.), and underscres ( ). R discriminates between uppercase letters and lwercase nes in the names f the bjects, s that x and X can name tw distinct bjects (even under Windws). 2.2 Creating, listing and deleting the bjects in memry An bject can be created with the assign peratr which is written as an arrw with a minus sign and a bracket; this symbl can be riented left-t-right r the reverse: > n <- 15 > n [1] 15 > 5 -> n > n [1] 5 > x <- 1 > X <- 10 > x [1] 1 > X [1] 10 If the bject already exists, its previus value is erased (the mdificatin affects nly the bjects in the active memry, nt the data n the disk). The value assigned this way may be the result f an peratin and/r a functin: > n < > n [1] 12 > n <- 3 + rnrm(1) > n [1] The functin rnrm(1) generates a nrmal randm variate with mean zer and variance unity (p. 17). Nte that yu can simply type an expressin withut assigning its value t an bject, the result is thus displayed n the screen but is nt stred in memry: > (10 + 2) * 5 [1] 60 5
10 The assignment will be mitted in the examples if nt necessary fr understanding. The functin ls lists simply the bjects in memry: nly the names f the bjects are displayed. > name <- "Carmen"; n1 <- 10; n2 <- 100; m <- 0.5 > ls() [1] "m" "n1" "n2" "name" Nte the use f the semi-cln t separate distinct cmmands n the same line. If we want t list nly the bjects which cntain a given character in their name, the ptin pattern (which can be abbreviated with pat) can be used: > ls(pat = "m") [1] "m" "name" T restrict the list f bjects whse names start with this character: > ls(pat = "^m") [1] "m" The functin ls.str displays sme details n the bjects in memry: > ls.str() m : num 0.5 n1 : num 10 n2 : num 100 name : chr "Carmen" The ptin pattern can be used in the same way as with ls. Anther useful ptin f ls.str is max.level which specifies the level f detail fr the display f cmpsite bjects. By default, ls.str displays the details f all bjects in memry, included the clumns f data frames, matrices and lists, which can result in a very lng display. We can avid t display all these details with the ptin max.level = -1: > M <- data.frame(n1, n2, m) > ls.str(pat = "M") M : data.frame : 1 bs. f 3 variables: $ n1: num 10 $ n2: num 100 $ m : num 0.5 > ls.str(pat="m", max.level=-1) M : data.frame : 1 bs. f 3 variables: T delete bjects in memry, we use the functin rm: rm(x) deletes the bject x, rm(x,y) deletes bth the bjects x et y, rm(list=ls()) deletes all the bjects in memry; the same ptins mentined fr the functin ls() can then be used t delete selectively sme bjects: rm(list=ls(pat="^m")). 6
11 2.3 The n-line help The n-line help f R gives very useful infrmatin n hw t use the functins. Help is available directly fr a given functin, fr instance: >?lm will display, within R, the help page fr the functin lm() (linear mdel). The cmmands help(lm) and help("lm") have the same effect. The last ne must be used t access help with nn-cnventinal characters: >?* Errr: syntax errr > help("*") Arithmetic package:base R Dcumentatin Arithmetic Operatrs... Calling help pens a page (this depends n the perating system) with general infrmatin n the first line such as the name f the package where is (are) the dcumented functin(s) r peratrs. Then cmes a title fllwed by sectins which give detailed infrmatin. Descriptin: brief descriptin. Usage: fr a functin, gives the name with all its arguments and the pssible ptins (with the crrespnding default values); fr an peratr gives the typical use. Arguments: fr a functin, details each f its arguments. Details: detailed descriptin. Value: if applicable, the type f bject returned by the functin r the peratr. See Als: ther help pages clse r similar t the present ne. Examples: sme examples which can generally be executed withut pening the help with the functin example. Fr beginners, it is gd t lk at the sectin Examples. Generally, it is useful t read carefully the sectin Arguments. Other sectins may be encuntered, such as Nte, References r Authr(s). By default, the functin help nly searches in the packages which are laded in memry. The ptin try.all.packages, which default is FALSE, allws t search in all packages if its value is TRUE: 7
12 > help("bs") N dcumentatin fr bs in specified packages and libraries: yu culd try help.search("bs") > help("bs", try.all.packages = TRUE) Help fr tpic bs is nt in any laded package but can be fund in the fllwing packages: Package splines Library /usr/lib/r/library Nte that in this case the help page f the functin bs is nt displayed. The user can display help pages frm a package nt laded in memry using the ptin package: > help("bs", package = "splines") bs package:splines R Dcumentatin B-Spline Basis fr Plynmial Splines Descriptin:... Generate the B-spline basis matrix fr a plynmial spline. The help in html frmat (read, e.g., with Netscape) is called by typing: > help.start() A search with keywrds is pssible with this html help. The sectin See Als has here hypertext links t ther functin help pages. The search with keywrds is als pssible in R with the functin help.search. The latter lks fr a specified tpic, given as a character string, in the help pages f all installed packages. Fr instance, help.search("tree") will display a list f the functins which help pages mentin tree. Nte that if sme packages have been recently installed, it may be useful t refresh the database used by help.search using the ptin rebuild (e.g., help.search("tree", rebuild = TRUE)). The fnctin aprps finds all functins which name cntains the character string given as argument; nly the packages laded in memry are searched: > aprps(help) [1] "help" ".helpfrcall" "help.search" [4] "help.start" 8
13 3 Data with R 3.1 Objects We have seen that R wrks with bjects which are, f curse, characterized by their names and their cntent, but als by attributes which specify the kind f data represented by an bject. In rder t understand the usefulness f these attributes, cnsider a variable that takes the value 1, 2, r 3: such a variable culd be an integer variable (fr instance, the number f eggs in a nest), r the cding f a categrical variable (fr instance, sex in sme ppulatins f crustaceans: male, female, r hermaphrdite). It is clear that the statistical analysis f this variable will nt be the same in bth cases: with R, the attributes f the bject give the necessary infrmatin. Mre technically, and mre generally, the actin f a functin n an bject depends n the attributes f the latter. All bjects have tw intrinsic attributes: mde and length. The mde is the basic type f the elements f the bject; there are fur main mdes: numeric, character, cmplex 7, and lgical (FALSE r TRUE). Other mdes exist but they d nt represent data, fr instance functin r expressin. The length is the number f elements f the bject. T display the mde and the length f an bject, ne can use the functins mde and length, respectively: > x <- 1 > mde(x) [1] "numeric" > length(x) [1] 1 > A <- "Gmphtherium"; cmpar <- TRUE; z <- 1i > mde(a); mde(cmpar); mde(z) [1] "character" [1] "lgical" [1] "cmplex" Whatever the mde, missing data are represented by NA (nt available). A very large numeric value can be specified with an expnential ntatin: > N <- 2.1e23 > N [1] 2.1e+23 R crrectly represents nn-finite numeric values, such as ± with Inf and -Inf, r values which are nt numbers with NaN (nt a number). 7 The mde cmplex will nt be discussed in this dcument. 9
14 > x <- 5/0 > x [1] Inf > exp(x) [1] Inf > exp(-x) [1] 0 > x - x [1] NaN A value f mde character is input with duble qutes ". It is pssible t include this latter character in the value if it fllws a backslash \. The tw charaters altgether \" will be treated in a specific way by sme functins such as cat fr display n screen, r write.table t write n the disk (p. 14, the ptin qmethd f this functin). > x <- "Duble qutes \" delimitate R s strings." > x [1] "Duble qutes \" delimitate R s strings." > cat(x) Duble qutes " delimitate R s strings. Alternatively, variables f mde character can be delimited with single qutes ( ); in this case it is nt necessary t escape duble qutes with backslashes (but single qutes must be!): > x <- Duble qutes " delimitate R\ s strings. > x [1] "Duble qutes \" delimitate R s strings." The fllwing table gives an verview f the type f bjects representing data. bject mdes several mdes pssible in the same bject? vectr numeric, character, cmplex r lgical N factr numeric r character N array numeric, character, cmplex r lgical N matrix numeric, character, cmplex r lgical N data frame numeric, character, cmplex r lgical Yes ts numeric, character, cmplex r lgical N list numeric, character, cmplex, lgical, Yes functin, expressin,... 10
15 A vectr is a variable in the cmmnly admitted meaning. A factr is a categrical variable. An array is a table with k dimensins, a matrix being a particular case f array with k = 2. Nte that the elements f an array r f a matrix are all f the same mde. A data frame is a table cmpsed with ne r several vectrs and/r factrs all f the same length but pssibly f different mdes. A ts is a time series data set and s cntains additinal attributes such as frequency and dates. Finally, a list can cntain any type f bject, included lists! Fr a vectr, its mde and length are sufficient t describe the data. Fr ther bjects, ther infrmatin is necessary and it is given by nn-intrinsic attributes. Amng these attributes, we can cite dim which crrespnds t the dimensins f an bject. Fr example, a matrix with 2 lines and 2 clumns has fr dim the pair f values [2, 2], but its length is Reading data in a file Fr reading and writing in files, R uses the wrking directry. T find this directry, the cmmand getwd() (get wrking directry) can be used, and the wrking directry can be changed with setwd("c:/data") r setwd("/hme/- paradis/r"). It is necessary t give the path t a file if it is nt in the wrking directry. 8 R can read data stred in text (ASCII) files with the fllwing functins: read.table (which has several variants, see belw), scan and read.fwf. R can als read files in ther frmats (Excel, SAS, SPSS,... ), and access SQLtype databases, but the functins needed fr this are nt in the package base. These functinalities are very useful fr a mre advanced use f R, but we will restrict here t reading files in ASCII frmat. The functin read.table has fr effect t create a data frame, and s is the main way t read data in tabular frm. Fr instance, if ne has a file named data.dat, the cmmand: > mydata <- read.table("data.dat") will create a data frame named mydata, and each variable will be named, by default, V1, V2,... and can be accessed individually by mydata$v1, mydata$v2,..., r by mydata["v1"], mydata["v2"],..., r, still anther slutin, by mydata[, 1], mydata[,2 ],... 9 There are several ptins whse default values (i.e. thse used by R if they are mitted by the user) are detailed in the fllwing table: read.table(file, header = FALSE, sep = "", qute = "\" ", dec = ".", 8 Under Windws, it is useful t create a shrt-cut f Rgui.exe then edit its prperties and change the directry in the field Start in: under the tab Shrt-cut : this directry will then be the wrking directry if R is started frm this shrt-cut. 9 There is a difference: mydata$v1 and mydata[, 1] are vectrs whereas mydata["v1"] is a data frame. We will see later (p. 18) sme details n manipulating bjects. 11
16 rw.names, cl.names, as.is = FALSE, na.strings = "NA", clclasses = NA, nrws = -1, skip = 0, check.names = TRUE, fill =!blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, cmment.char = "#") file the name f the file (within "" r a variable f mde character), pssibly with its path (the symbl \ is nt allwed and must be replaced by /, even under Windws), r a remte access t a file f type URL ( header a lgical (FALSE r TRUE) indicating if the file cntains the names f the variables n its first line sep the field separatr used in the file, fr instance sep="\t" if it is a tabulatin qute the characters used t cite the variables f mde character dec the character used fr the decimal pint rw.names a vectr with the names f the lines which can be either a vectr f mde character, r the number (r the name) f a variable f the file (by default: 1, 2, 3,... ) cl.names a vectr with the names f the variables (by default: V1, V2, V3,... ) as.is cntrls the cnversin f character variables as factrs (if FALSE) r keeps them as characters (TRUE); as.is can be a lgical, numeric r character vectr specifying the variables t be kept as character na.strings the value given t missing data (cnverted as NA) clclasses a vectr f mde character giving the classes t attribute t the clumns nrws the maximum number f lines t read (negative values are ignred) skip the number f lines t be skipped befre reading the data check.names if TRUE, checks that the variable names are valid fr R fill if TRUE and all lines d nt have the same number f variables, blanks are added strip.white (cnditinal t sep) if TRUE, deletes extra spaces befre and after the character variables blank.lines.skip if TRUE, ignres blank lines cmment.char a character defining cmments in the data file, the rest f the line after this character is ignred (t disable this argument, use cmment.char = "") The variants f read.table are useful since they have different default values: read.csv(file, header = TRUE, sep = ",", qute="\"", dec=".", fill = TRUE,...) read.csv2(file, header = TRUE, sep = ";", qute="\"", dec=",", fill = TRUE,...) read.delim(file, header = TRUE, sep = "\t", qute="\"", dec=".", fill = TRUE,...) read.delim2(file, header = TRUE, sep = "\t", qute="\"", dec=",", fill = TRUE,...) 12
17 The functin scan is mre flexible than read.table. A difference is that it is pssible t specify the mde f the variables, fr example: > mydata <- scan("data.dat", what = list("", 0, 0)) reads in the file data.dat three variables, the first is f mde character and the next tw are f mde numeric. Anther imprtant distinctin is that scan() can be used t create different bjects, vectrs, matrices, data frames, lists,... In the abve example, mydata is a list f three vectrs. By default, that is if what is mitted, scan() creates a numeric vectr. If the data read d nt crrespnd t the mde(s) expected (either by default, r specified by what), an errr message is returned. The ptins are the fllwings. scan(file = "", what = duble(0), nmax = -1, n = -1, sep = "", qute = if (sep=="\n") "" else " \"", dec = ".", skip = 0, nlines = 0, na.strings = "NA", flush = FALSE, fill = FALSE, strip.white = FALSE, quiet = FALSE, blank.lines.skip = TRUE, multi.line = TRUE, cmment.char = "", allwescapes = TRUE) file what nmax n sep qute dec skip nlines na.string flush fill strip.white quiet blank.lines.skip multi.line cmment.char allwescapes the name f the file (within ""), pssibly with its path (the symbl \ is nt allwed and must be replaced by /, even under Windws), r a remte access t a file f type URL ( if file="", the data are entered with the keybard (the entree is terminated by a blank line) specifies the mde(s) f the data (numeric by default) the number f data t read, r, if what is a list, the number f lines t read (by default, scan reads the data up t the end f file) the number f data t read (by default, n limit) the field separatr used in the file the characters used t cite the variables f mde character the character used fr the decimal pint the number f lines t be skipped befre reading the data the number f lines t read the value given t missing data (cnverted as NA) a lgical, if TRUE, scan ges t the next line nce the number f clumns has been reached (allws the user t add cmments in the data file) if TRUE and all lines d nt have the same number f variables, blanks are added (cnditinal t sep) if TRUE, deletes extra spaces befre and after the character variables a lgical, if FALSE, scan displays a line shwing which fields have been read if TRUE, ignres blank lines if what is a list, specifies if the variables f the same individual are n a single line in the file (FALSE) a character defining cmments in the data file, the rest f the line after this character is ignred (the default is t have this disabled) specifies whether C-style escapes (e.g., \t ) be prcessed (the default) r read as verbatim 13
18 The functin read.fwf can be used t read in a file sme data in fixed width frmat: read.fwf(file, widths, header = FALSE, sep = "\t", as.is = FALSE, skip = 0, rw.names, cl.names, n = -1, buffersize = 2000,...) The ptins are the same than fr read.table() except widths which specifies the width f the fields (buffersize is the maximum number f lines read simultaneusly). Fr example, if a file named data.txt has the data indicated n the right, ne can read the data with the fllwing cmmand: > mydata <- read.fwf("data.txt", widths=c(1, 4, 3)) > mydata V1 V2 V3 1 A A B B C C A A B B C C Saving data The functin write.table writes in a file an bject, typically a data frame but this culd well be anther kind f bject (vectr, matrix,... ). The arguments and ptins are: write.table(x, file = "", append = FALSE, qute = TRUE, sep = " ", el = "\n", na = "NA", dec = ".", rw.names = TRUE, cl.names = TRUE, qmethd = c("escape", "duble")) x file append qute sep el na dec rw.names cl.names qmethd the name f the bject t be written the name f the file (by default the bject is displayed n the screen) if TRUE adds the data withut erasing thse pssibly existing in the file a lgical r a numeric vectr: if TRUE the variables f mde character and the factrs are written within "", therwise the numeric vectr indicates the numbers f the variables t write within "" (in bth cases the names f the variables are written within "" but nt if qute = FALSE) the field separatr used in the file the character t be used at the end f each line ("\n" is a carriage-return) the character t be used fr missing data the character used fr the decimal pint a lgical indicating whether the names f the lines are written in the file id. fr the names f the clumns specifies, if qute=true, hw duble qutes " included in variables f mde character are treated: if "escape" (r "e", the default) each " is replaced by \", if "d" each " is replaced by "" 14
19 T write in a simpler way an bject in a file, the cmmand write(x, file="data.txt") can be used, where x is the name f the bject (which can be a vectr, a matrix, r an array). There are tw ptins: nc (r ncl) which defines the number f clumns in the file (by default nc=1 if x is f mde character, nc=5 fr the ther mdes), and append (a lgical) t add the data withut deleting thse pssibly already in the file (TRUE) r deleting them if the file already exists (FALSE, the default). T recrd a grup f bjects f any type, we can use the cmmand save(x, y, z, file= "xyz.rdata"). T ease the transfert f data between different machines, the ptin ascii = TRUE can be used. The data (which are nw called a wrkspace in R s jargn) can be laded later in memry with lad("xyz.rdata"). The functin save.image() is a shrt-cut fr save(list =ls(all=true), file=".rdata"). 3.4 Generating data Regular sequences A regular sequence f integers, fr example frm 1 t 30, can be generated with: > x <- 1:30 The resulting vectr x has 30 elements. The peratr : has pririty n the arithmetic peratrs within an expressin: > 1:10-1 [1] > 1:(10-1) [1] The functin seq can generate sequences f real numbers as fllws: > seq(1, 5, 0.5) [1] where the first number indicates the beginning f the sequence, the secnd ne the end, and the third ne the increment t be used t generate the sequence. One can use als: > seq(length=9, frm=1, t=5) [1] One can als type directly the values using the functin c: > c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5) [1]
20 It is als pssible, if ne wants t enter sme data n the keybard, t use the functin scan with simply the default ptins: > z <- scan() 1: : Read 9 items > z [1] The functin rep creates a vectr with all its elements identical: > rep(1, 30) [1] The functin sequence creates a series f sequences f integers each ending by the numbers given as arguments: > sequence(4:5) [1] > sequence(c(10,5)) [1] The functin gl (generate levels) is very useful because it generates regular series f factrs. The usage f this fnctin is gl(k, n) where k is the number f levels (r classes), and n is the number f replicatins in each level. Tw ptins may be used: length t specify the number f data prduced, and labels t specify the names f the levels f the factr. Examples: > gl(3, 5) [1] Levels: > gl(3, 5, length=30) [1] Levels: > gl(2, 6, label=c("male", "Female")) [1] Male Male Male Male Male Male [7] Female Female Female Female Female Female Levels: Male Female > gl(2, 10) [1] Levels: 1 2 > gl(2, 1, length=20) [1] Levels: 1 2 > gl(2, 2, length=20) [1] Levels:
21 Finally, expand.grid() creates a data frame with all cmbinatins f vectrs r factrs given as arguments: > expand.grid(h=c(60,80), w=c(100, 300), sex=c("male", "Female")) h w sex Male Male Male Male Female Female Female Female Randm sequences law functin Gaussian (nrmal) rnrm(n, mean=0, sd=1) expnential rexp(n, rate=1) gamma rgamma(n, shape, scale=1) Pissn rpis(n, lambda) Weibull rweibull(n, shape, scale=1) Cauchy rcauchy(n, lcatin=0, scale=1) beta rbeta(n, shape1, shape2) Student (t) rt(n, df) Fisher Snedecr (F ) rf(n, df1, df2) Pearsn (χ 2 ) rchisq(n, df) binmial rbinm(n, size, prb) multinmial rmultinm(n, size, prb) gemetric rgem(n, prb) hypergemetric rhyper(nn, m, n, k) lgistic rlgis(n, lcatin=0, scale=1) lgnrmal rlnrm(n, meanlg=0, sdlg=1) negative binmial rnbinm(n, size, prb) unifrm runif(n, min=0, max=1) Wilcxn s statistics rwilcx(nn, m, n), rsignrank(nn, n) It is useful in statistics t be able t generate randm data, and R can d it fr a large number f prbability density functins. These functins are f the frm rfunc(n, p1, p2,...), where func indicates the prbability distributin, n the number f data generated, and p1, p2,... are the values f the parameters f the distributin. The abve table gives the details fr each distributin, and the pssible default values (if nne default value is indicated, this means that the parameter must be specified by the user). Mst f these functins have cunterparts btained by replacing the letter r with d, p r q t get, respectively, the prbability density (dfunc(x,...)), 17
22 the cumulative prbability density (pfunc(x,...)), and the value f quantile (qfunc(p,...), with 0 < p < 1). The last tw series f functins can be used t find critical values r P -values f statistical tests. Fr instance, the critical values fr a tw-tailed test fllwing a nrmal distributin at the 5% threshld are: > qnrm(0.025) [1] > qnrm(0.975) [1] Fr the ne-tailed versin f the same test, either qnrm(0.05) r 1 - qnrm(0.95) will be used depending n the frm f the alternative hypthesis. The P -value f a test, say χ 2 = 3.84 with df = 1, is: > 1 - pchisq(3.84, 1) [1] Manipulating bjects Creating bjects We have seen previusly different ways t create bjects using the assign peratr; the mde and the type f bjects s created are generally determined implicitly. It is pssible t create an bject and specifying its mde, length, type, etc. This apprach is interesting in the perspective f manipulating bjects. One can, fr instance, create an empty bject and then mdify its elements successively which is mre efficient than putting all its elements tgether with c(). The indexing system culd be used here, as we will see later (p. 26). It can als be very cnvenient t create bjects frm thers. Fr example, if ne wants t fit a series f mdels, it is simple t put the frmulae in a list, and then t extract the elements successively t insert them in the functin lm. At this stage f ur learning f R, the interest in learning the fllwing functinalities is nt nly practical but als didactic. The explicit cnstructin f bjects gives a better understanding f their structure, and allws us t g further in sme ntins previusly mentined. Vectr. The functin vectr, which has tw arguments mde and length, creates a vectr which elements have a value depending n the mde specified as argument: 0 if numeric, FALSE if lgical, r "" if character. The fllwing functins have exactly the same effect and have fr single argument the length f the vectr: numeric(), lgical(), and character(). 18
23 Factr. A factr includes nt nly the values f the crrespnding categrical variable, but als the different pssible levels f that variable (even if they are nt present in the data). The functin factr creates a factr with the fllwing ptins: factr(x, levels = srt(unique(x), na.last = TRUE), labels = levels, exclude = NA, rdered = is.rdered(x)) levels specifies the pssible levels f the factr (by default the unique values f the vectr x), labels defines the names f the levels, exclude the values f x t exclude frm the levels, and rdered is a lgical argument specifying whether the levels f the factr are rdered. Recall that x is f mde numeric r character. Sme examples fllw. > factr(1:3) [1] Levels: > factr(1:3, levels=1:5) [1] Levels: > factr(1:3, labels=c("a", "B", "C")) [1] A B C Levels: A B C > factr(1:5, exclude=4) [1] NA 5 Levels: The functin levels extracts the pssible levels f a factr: > ff <- factr(c(2, 4), levels=2:5) > ff [1] 2 4 Levels: > levels(ff) [1] "2" "3" "4" "5" Matrix. A matrix is actually a vectr with an additinal attribute (dim) which is itself a numeric vectr with length 2, and defines the numbers f rws and clumns f the matrix. A matrix can be created with the functin matrix: matrix(data = NA, nrw = 1, ncl = 1, byrw = FALSE, dimnames = NULL) 19
24 The ptin byrw indicates whether the values given by data must fill successively the clumns (the default) r the rws (if TRUE). The ptin dimnames allws t give names t the rws and clumns. > matrix(data=5, nr=2, nc=2) [,1] [,2] [1,] 5 5 [2,] 5 5 > matrix(1:6, 2, 3) [,1] [,2] [,3] [1,] [2,] > matrix(1:6, 2, 3, byrw=true) [,1] [,2] [,3] [1,] [2,] Anther way t create a matrix is t give the apprpriate values t the dim attribute (which is initially NULL): > x <- 1:15 > x [1] > dim(x) NULL > dim(x) <- c(5, 3) > x [,1] [,2] [,3] [1,] [2,] [3,] [4,] [5,] Data frame. We have seen that a data frame is created implicitly by the functin read.table; it is als pssible t create a data frame with the functin data.frame. The vectrs s included in the data frame must be f the same length, r if ne f the them is shrter, it is recycled a whle number f times: > x <- 1:4; n <- 10; M <- c(10, 35); y <- 2:4 > data.frame(x, n) x n
25 > data.frame(x, M) x M > data.frame(x, y) Errr in data.frame(x, y) : arguments imply differing number f rws: 4, 3 If a factr is included in a data frame, it must be f the same length than the vectr(s). It is pssible t change the names f the clumns with, fr instance, data.frame(a1=x, A2=n). One can als give names t the rws with the ptin rw.names which must be, f curse, a vectr f mde character and f length equal t the number f lines f the data frame. Finally, nte that data frames have an attribute dim similarly t matrices. List. A list is created in a way similar t data frames with the functin list. There is n cnstraint n the bjects that can be included. In cntrast t data.frame(), the names f the bjects are nt taken by default; taking the vectrs x and y f the previus example: > L1 <- list(x, y); L2 <- list(a=x, B=y) > L1 [[1]] [1] [[2]] [1] > L2 $A [1] $B [1] > names(l1) NULL > names(l2) [1] "A" "B" Time-series. The functin ts creates an bject f class "ts" frm a vectr (single time-series) r a matrix (multivariate time-series), and sme p- 21
26 tins which characterize the series. The ptins, with the default values, are: ts(data = NA, start = 1, end = numeric(0), frequency = 1, deltat = 1, ts.eps = getoptin("ts.eps"), class, names) data start end frequency deltat ts.eps class names a vectr r a matrix the time f the first bservatin, either a number, r a vectr f tw integers (see the examples belw) the time f the last bservatin specified in the same way than start the number f bservatins per time unit the fractin f the sampling perid between successive bservatins (ex. 1/12 fr mnthly data); nly ne f frequency r deltat must be given tlerance fr the cmparisn f series. The frequencies are cnsidered equal if their difference is less than ts.eps class t give t the bject; the default is "ts" fr a single series, and c("mts", "ts") fr a multivariate series a vectr f mde character with the names f the individual series in the case f a multivariate series; by default the names f the clumns f data, r Series 1, Series 2,... A few examples f time-series created with ts: > ts(1:10, start = 1959) Time Series: Start = 1959 End = 1968 Frequency = 1 [1] > ts(1:47, frequency = 12, start = c(1959, 2)) Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nv Dec > ts(1:10, frequency = 4, start = c(1959, 2)) Qtr1 Qtr2 Qtr3 Qtr > ts(matrix(rpis(36, 5), 12, 3), start=c(1961, 1), frequency=12) Series 1 Series 2 Series 3 22
27 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nv Dec Expressin. The bjects f mde expressin have a fundamental rle in R. An expressin is a series f characters which makes sense fr R. All valid cmmands are expressins. When a cmmand is typed directly n the keybard, it is then evaluated by R and executed if it is valid. In many circumstances, it is useful t cnstruct an expressin withut evaluating it: this is what the functin expressin is made fr. It is, f curse, pssible t evaluate the expressin subsequently with eval(). > x <- 3; y <- 2.5; z <- 1 > exp1 <- expressin(x / (y + exp(z))) > exp1 expressin(x/(y + exp(z))) > eval(exp1) [1] Expressins can be used, amng ther things, t include equatins in graphs (p. 42). An expressin can be created frm a variable f mde character. Sme functins take expressins as arguments, fr example D which returns partial derivatives: > D(exp1, "x") 1/(y + exp(z)) > D(exp1, "y") -x/(y + exp(z))^2 > D(exp1, "z") -x * exp(z)/(y + exp(z))^ Cnverting bjects The reader has surely realized that the differences between sme types f bjects are small; it is thus lgical that it is pssible t cnvert an bject frm a type t anther by changing sme f its attributes. Such a cnversin will be dne with a functin f the type as.smething. R (versin 2.1.0) has, in the 23
28 packages base and utils, 98 f such functins, s we will nt g in the deepest details here. The result f a cnversin depends bviusly f the attributes f the cnverted bject. Genrally, cnversin fllws intuitive rules. Fr the cnversin f mdes, the fllwing table summarizes the situatin. Cnversin t Functin Rules numeric as.numeric FALSE 0 TRUE 1 "1", "2",... 1, 2,... "A",... NA lgical as.lgical 0 FALSE ther numbers TRUE "FALSE", "F" FALSE "TRUE", "T" TRUE ther characters NA character as.character 1, 2,... "1", "2",... FALSE "FALSE" TRUE "TRUE" There are functins t cnvert the types f bjects (as.matrix, as.ts, as.data.frame, as.expressin,... ). These functins will affect attributes ther than the mdes during the cnversin. The results are, again, generally intuitive. A situatin frequently encuntered is the cnversin f factrs int numeric values. In this case, R des the cnversin with the numeric cding f the levels f the factr: > fac <- factr(c(1, 10)) > fac [1] 1 10 Levels: 1 10 > as.numeric(fac) [1] 1 2 This makes sense when cnsidering a factr f mde character: > fac2 <- factr(c("male", "Female")) > fac2 [1] Male Female Levels: Female Male > as.numeric(fac2) [1] 2 1 Nte that the result is nt NA as may have been expected frm the table abve. 24
29 T cnvert a factr f mde numeric int a numeric vectr but keeping the levels as they are riginally specified, ne must first cnvert int character, then int numeric. > as.numeric(as.character(fac)) [1] 1 10 This prcedure is very useful if in a file a numeric variable has als nnnumeric values. We have seen that read.table() in such a situatin will, by default, read this clumn as a factr Operatrs We have seen previusly that there are three main types f peratrs in R 10. Here is the list. Operatrs Arithmetic Cmparisn Lgical + additin < lesser than! x lgical NOT - subtractin > greater than x & y lgical AND * multiplicatin <= lesser than r equal t x && y id. / divisin >= greater than r equal t x y lgical OR ^ pwer == equal x y id. %% mdul!= different xr(x, y) exclusive OR %/% integer divisin The arithmetic and cmparisn peratrs act n tw elements (x + y, a < b). The arithmetic peratrs act nt nly n variables f mde numeric r cmplex, but als n lgical variables; in this latter case, the lgical values are cerced int numeric. The cmparisn peratrs may be applied t any mde: they return ne r several lgical values. The lgical peratrs are applied t ne (!) r tw bjects f mde lgical, and return ne (r several) lgical values. The peratrs AND and OR exist in tw frms: the single ne perates n each elements f the bjects and returns as many lgical values as cmparisns dne; the duble ne perates n the first element f the bjects. It is necessary t use the peratr AND t specify an inequality f the type 0 < x < 1 which will be cded with: 0 < x & x < 1. The expressin 0 < x < 1 is valid, but will nt return the expected result: since bth peratrs are the same, they are executed successively frm left t right. The cmparisn 0 < x is first dne and returns a lgical value which is then cmpared t 1 (TRUE r FALSE < 1): in this situatin, the lgical value is implicitly cerced int numeric (1 r 0 < 1). 10 The fllwing characters are als peratrs fr R: [, [[, :,?, <-, <<-, =, ::. A table f peratrs describing precedence rules can be fund with?syntax. 25
30 > x <- 0.5 > 0 < x < 1 [1] FALSE The cmparisn peratrs perate n each element f the tw bjects being cmpared (recycling the values f the shrtest ne if necessary), and thus returns an bject f the same size. T cmpare whlly tw bjects, tw functins are available: identical and all.equal. > x <- 1:3; y <- 1:3 > x == y [1] TRUE TRUE TRUE > identical(x, y) [1] TRUE > all.equal(x, y) [1] TRUE identical cmpares the internal representatin f the data and returns TRUE if the bjects are strictly identical, and FALSE therwise. all.equal cmpares the near equality f tw bjects, and returns TRUE r display a summary f the differences. The latter functin takes the apprximatin f the cmputing prcess int accunt when cmparing numeric values. The cmparisn f numeric values n a cmputer is smetimes surprising! > 0.9 == (1-0.1) [1] TRUE > identical(0.9, 1-0.1) [1] TRUE > all.equal(0.9, 1-0.1) [1] TRUE > 0.9 == ( ) [1] FALSE > identical(0.9, ) [1] FALSE > all.equal(0.9, ) [1] TRUE > all.equal(0.9, , tlerance = 1e-16) [1] "Mean relative difference: e-16" Accessing the values f an bject: the indexing system The indexing system is an efficient and flexible way t access selectively the elements f an bject; it can be either numeric r lgical. T access, fr example, the third value f a vectr x, we just type x[3] which can be used either t extract r t change this value: > x <- 1:5 26
31 > x[3] [1] 3 > x[3] <- 20 > x [1] The index itself can be a vectr f mde numeric: > i <- c(1, 3) > x[i] [1] 1 20 If x is a matrix r a data frame, the value f the ith line and j th clumn is accessed with x[i, j]. T access all values f a given rw r clumn, ne has simply t mit the apprpriate index (withut frgetting the cmma!): > x <- matrix(1:6, 2, 3) > x [,1] [,2] [,3] [1,] [2,] > x[, 3] <- 21:22 > x [,1] [,2] [,3] [1,] [2,] > x[, 3] [1] Yu have certainly nticed that the last result is a vectr and nt a matrix. The default behaviur f R is t return an bject f the lwest dimensin pssible. This can be altered with the ptin drp which default is TRUE: > x[, 3, drp = FALSE] [,1] [1,] 21 [2,] 22 This indexing system is easily generalized t arrays, with as many indices as the number f dimensins f the array (fr example, a three dimensinal array: x[i, j, k], x[,, 3], x[,, 3, drp = FALSE], and s n). It may be useful t keep in mind that indexing is made with square brackets, while parentheses are used fr the arguments f a functin: > x(1) Errr: culdn t find functin "x" 27
32 Indexing can als be used t suppress ne r several rws r clumns using negative values. Fr example, x[-1, ] will suppress the first rw, while x[-c(1, 15), ] will d the same fr the 1st and 15th rws. Using the matrix defined abve: > x[, -1] [,1] [,2] [1,] 3 21 [2,] 4 22 > x[, -(1:2)] [1] > x[, -(1:2), drp = FALSE] [,1] [1,] 21 [2,] 22 Fr vectrs, matrices and arrays, it is pssible t access the values f an element with a cmparisn expressin as the index: > x <- 1:10 > x[x >= 5] <- 20 > x [1] > x[x == 1] <- 25 > x [1] A practical use f the lgical indexing is, fr instance, the pssibility t select the even elements f an integer variable: > x <- rpis(40, lambda=5) > x [1] [21] > x[x %% 2 == 0] [1] Thus, this indexing system uses the lgical values returned, in the abve examples, by cmparisn peratrs. These lgical values can be cmputed befrehand, they then will be recycled if necessary: > x <- 1:40 > s <- c(false, TRUE) > x[s] [1]
33 Lgical indexing can als be used with data frames, but with cautin since different clumns f the data drame may be f different mdes. Fr lists, accessing the different elements (which can be any kind f bject) is dne either with single r with duble square brackets: the difference is that with single brackets a list is returned, whereas duble brackets extract the bject frm the list. The result can then be itself indexed as previusly seen fr vectrs, matrices, etc. Fr instance, if the third bject f a list is a vectr, its ith value can be accessed using my.list[[3]][i], if it is a three dimensinal array using my.list[[3]][i, j, k], and s n. Anther difference is that my.list[1:2] will return a list with the first and secnd elements f the riginal list, whereas my.list[[1:2]] will n nt give the expected result Accessing the values f an bject with names The names are labels f the elements f an bject, and thus f mde character. They are generally ptinal attributes. There are several kinds f names (names, clnames, rwnames, dimnames). The names f a vectr are stred in a vectr f the same length f the bject, and can be accessed with the functin names. > x <- 1:3 > names(x) NULL > names(x) <- c("a", "b", "c") > x a b c > names(x) [1] "a" "b" "c" > names(x) <- NULL > x [1] Fr matrices and data frames, clnames and rwnames are labels f the clumns and rws, respectively. They can be accessed either with their respective functins, r with dimnames which returns a list with bth vectrs. > X <- matrix(1:4, 2) > rwnames(x) <- c("a", "b") > clnames(x) <- c("c", "d") > X c d a 1 3 b 2 4 > dimnames(x) [[1]] [1] "a" "b" 29
34 [[2]] [1] "c" "d" Fr arrays, the names f the dimensins can be accessed with dimnames: > A <- array(1:8, dim = c(2, 2, 2)) > A,, 1 [,1] [,2] [1,] 1 3 [2,] 2 4,, 2 [,1] [,2] [1,] 5 7 [2,] 6 8 > dimnames(a) <- list(c("a", "b"), c("c", "d"), c("e", "f")) > A,, e c d a 1 3 b 2 4,, f c d a 5 7 b 6 8 If the elements f an bject have names, they can be extracted by using them as indices. Actually, this shuld be termed subsetting rather than extractin since the attributes f the riginal bject are kept. Fr instance, if a data frame DF cntains the variables x, y, and z, the cmmand DF["x"] will return a data frame with just x; DF[c("x", "y")] will return a data frame with bth variables. This wrks with lists as well if the elements in the list have names. As the reader surely realizes, the index used here is a vectr f mde character. Like the numeric r lgical vectrs seen abve, this vectr can be defined befrehand and then used fr the extractin. T extract a vectr r a factr frm a data frame, n can use the peratr $ (e.g., DF$x). This als wrks with lists. 30
35 3.5.6 The data editr It is pssible t use a graphical spreadsheet-like editr t edit a data bject. Fr example, if X is a matrix, the cmmand data.entry(x) will pen a graphic editr and ne will be able t mdify sme values by clicking n the apprpriate cells, r t add new clumns r rws. The functin data.entry mdifies directly the bject given as argument withut needing t assign its result. On the ther hand, the functin de returns a list with the bjects given as arguments and pssibly mdified. This result is displayed n the screen by default, but, as fr mst functins, can be assigned t an bject. The details f using the data editr depend n the perating system Arithmetics and simple functins There are numerus functins in R t manipulate data. We have already seen the simplest ne, c which cncatenates the bjects listed in parentheses. Fr example: > c(1:5, seq(10, 11, 0.2)) [1] Vectrs can be manipulated with classical arithmetic expressins: > x <- 1:4 > y <- rep(1, 4) > z <- x + y > z [1] Vectrs f different lengths can be added; in this case, the shrtest vectr is recycled. Examples: > x <- 1:4 > y <- 1:2 > z <- x + y > z [1] > x <- 1:3 > y <- 1:2 > z <- x + y Warning message: lnger bject length is nt a multiple f shrter bject length in: x + y > z [1]
36 Nte that R returned a warning message and nt an errr message, thus the peratin has been perfrmed. If we want t add (r multiply) the same value t all the elements f a vectr: > x <- 1:4 > a <- 10 > z <- a * x > z [1] The functins available in R fr manipulating data are t many t be listed here. One can find all the basic mathematical functins (lg, exp, lg10, lg2, sin, cs, tan, asin, acs, atan, abs, sqrt,... ), special functins (gamma, digamma, beta, besseli,... ), as well as diverse functins useful in statistics. Sme f these functins are listed in the fllwing table. sum(x) prd(x) max(x) min(x) which.max(x) which.min(x) range(x) length(x) mean(x) median(x) var(x) r cv(x) cr(x) var(x, y) r cv(x, y) cr(x, y) sum f the elements f x prduct f the elements f x maximum f the elements f x minimum f the elements f x returns the index f the greatest element f x returns the index f the smallest element f x id. than c(min(x), max(x)) number f elements in x mean f the elements f x median f the elements f x variance f the elements f x (calculated n n 1); if x is a matrix r a data frame, the variance-cvariance matrix is calculated crrelatin matrix f x if it is a matrix r a data frame (1 if x is a vectr) cvariance between x and y, r between the clumns f x and thse f y if they are matrices r data frames linear crrelatin between x and y, r crrelatin matrix if they are matrices r data frames These functins return a single value (thus a vectr f length ne), except range which returns a vectr f length tw, and var, cv, and cr which may return a matrix. The fllwing functins return mre cmplex results. rund(x, n) rev(x) srt(x) rank(x) runds the elements f x t n decimals reverses the elements f x srts the elements f x in increasing rder; t srt in decreasing rder: rev(srt(x)) ranks f the elements f x 32
37 lg(x, base) cmputes the lgarithm f x with base base scale(x) if x is a matrix, centers and reduces the data; t center nly use the ptin center=false, t reduce nly scale=false (by default center=true, scale=true) pmin(x,y,...) a vectr which ith element is the minimum f x[i], y[i],... pmax(x,y,...) id. fr the maximum cumsum(x) a vectr which ith element is the sum frm x[1] t x[i] cumprd(x) id. fr the prduct cummin(x) id. fr the minimum cummax(x) id. fr the maximum match(x, y) returns a vectr f the same length than x with the elements f x which are in y (NA therwise) which(x == a) returns a vectr f the indices f x if the cmparisn peratin is true (TRUE), in this example the values f i fr which x[i] == a (the argument f this functin must be a variable f mde lgical) chse(n, k) cmputes the cmbinatins f k events amng n repetitins = n!/[(n k)!k!] na.mit(x) suppresses the bservatins with missing data (NA) (suppresses the crrespnding line if x is a matrix r a data frame) na.fail(x) returns an errr message if x cntains at least ne NA unique(x) if x is a vectr r a data frame, returns a similar bject but with the duplicate elements suppressed table(x) returns a table with the numbers f the differents values f x (typically fr integers r factrs) table(x, y) cntingency table f x and y subset(x,...) returns a selectin f x with respect t criteria (..., typically cmparisns: x$v1 < 10); if x is a data frame, the ptin select gives the variables t be kept (r drpped using a minus sign) sample(x, size) resample randmly and withut replacement size elements in the vectr x, the ptin replace = TRUE allws t resample with replacement Matrix cmputatin R has facilities fr matrix cmputatin and manipulatin. The functins rbind and cbind bind matrices with respect t the lines r the clumns, respectively: > m1 <- matrix(1, nr = 2, nc = 2) > m2 <- matrix(2, nr = 2, nc = 2) > rbind(m1, m2) [,1] [,2] [1,] 1 1 [2,] 1 1 [3,] 2 2 [4,] 2 2 > cbind(m1, m2) [,1] [,2] [,3] [,4] 33
38 [1,] [2,] The peratr fr the prduct f tw matrices is %*%. Fr example, cnsidering the tw matrices m1 and m2 abve: > rbind(m1, m2) %*% cbind(m1, m2) [,1] [,2] [,3] [,4] [1,] [2,] [3,] [4,] > cbind(m1, m2) %*% rbind(m1, m2) [,1] [,2] [1,] [2,] The transpsitin f a matrix is dne with the functin t; this functin wrks als with a data frame. The functin diag can be used t extract r mdify the diagnal f a matrix, r t build a diagnal matrix. > diag(m1) [1] 1 1 > diag(rbind(m1, m2) %*% cbind(m1, m2)) [1] > diag(m1) <- 10 > m1 [,1] [,2] [1,] 10 1 [2,] 1 10 > diag(3) [,1] [,2] [,3] [1,] [2,] [3,] > v <- c(10, 20, 30) > diag(v) [,1] [,2] [,3] [1,] [2,] [3,] > diag(2.1, nr = 3, nc = 5) [,1] [,2] [,3] [,4] [,5] [1,] [2,] [3,]
39 R has als sme special functins fr matrix cmputatin. We can cite here slve fr inverting a matrix, qr fr decmpsitin, eigen fr cmputing eigenvalues and eigenvectrs, and svd fr singular value decmpsitin. 35
40 4 Graphics with R R ffers a remarkable variety f graphics. T get an idea, ne can type dem(graphics) r dem(persp). It is nt pssible t detail here the pssibilities f R in terms f graphics, particularly since each graphical functin has a large number f ptins making the prductin f graphics very flexible. The way graphical functins wrk deviates substantially frm the scheme sketched at the beginning f this dcument. Particularly, the result f a graphical functin cannt be assigned t an bject 11 but is sent t a graphical device. A graphical device is a graphical windw r a file. There are tw kinds f graphical functins: the high-level pltting functins which create a new graph, and the lw-level pltting functins which add elements t an existing graph. The graphs are prduced with respect t graphical parameters which are defined by default and can be mdified with the functin par. We will see in a first time hw t manage graphics and graphical devices; we will then smehw detail the graphical functins and parameters. We will see a practical example f the use f these functinalities in prducing graphs. Finally, we will see the packages grid and lattice whse functining is different frm the ne summarized abve. 4.1 Managing graphics Opening several graphical devices When a graphical functin is executed, if n graphical device is pen, R pens a graphical windw and displays the graph. A graphical device may be pen with an apprpriate functin. The list f available graphical devices depends n the perating system. The graphical windws are called X11 under Unix/Linux and windws under Windws. In all cases, ne can pen a graphical windw with the cmmand x11() which als wrks under Windws because f an alias twards the cmmand windws(). A graphical device which is a file will be pen with a functin depending n the frmat: pstscript(), pdf(), png(),... The list f available graphical devices can be fund with?device. The last pen device becmes the active graphical device n which all subsequent graphs are displayed. The functin dev.list() displays the list f pen devices: > x11(); x11(); pdf() > dev.list() 11 There are a few remarkable exceptins: hist() and barplt() prduce als numeric results as lists r matrices. 36
41 X11 X11 pdf The figures displayed are the device numbers which must be used t change the active device. T knw what is the active device: > dev.cur() pdf 4 and t change the active device: > dev.set(3) X11 3 The functin dev.ff() clses a device: by default the active device is clsed, therwise this is the ne which number is given as argument t the functin. R then displays the number f the new active device: > dev.ff(2) X11 3 > dev.ff() pdf 4 Tw specific features f the Windws versin f R are wrth mentining: a Windws Metafile device can be pen with the functin win.metafile, and a menu Histry displayed when the graphical windw is selected allwing recrding f all graphs drawn during a sessin (by default, the recrding system is ff, the user switches it n by clicking n Recrding in this menu) Partitining a graphic The functin split.screen partitins the active graphical device. Fr example: > split.screen(c(1, 2)) divides the device int tw parts which can be selected with screen(1) r screen(2); erase.screen() deletes the last drawn graph. A part f the device can itself be divided with split.screen() leading t the pssibility t make cmplex arrangements. These functins are incmpatible with thers (such as layut r cplt) and must nt be used with multiple graphical devices. Their use shuld be limited, fr instance, t graphical explratin f data. The functin layut partitins the active graphic windw in several parts where the graphs will be displayed successively. Its main argument is a matrix with integer numbers indicating the numbers f the sub-windws. Fr example, t divide the device int fur equal parts: 37
42 > layut(matrix(1:4, 2, 2)) It is f curse pssible t create this matrix previusly allwing t better visualize hw the device is divided: > mat <- matrix(1:4, 2, 2) > mat [,1] [,2] [1,] 1 3 [2,] 2 4 > layut(mat) T actually visualize the partitin created, ne can use the functin layut.shw with the number f sub-windws as argument (here 4). In this example, we will have: > layut.shw(4) The fllwing examples shw sme f the pssibilities ffered by layut(). > layut(matrix(1:6, 3, 2)) > layut.shw(6) > layut(matrix(1:6, 2, 3)) > layut.shw(6) > m <- matrix(c(1:3, 3), 2, 2) > layut(m) > layut.shw(3) In all these examples, we have nt used the ptin byrw f matrix(), the sub-windws are thus numbered clumn-wise; ne can just specify matrix(..., byrw=true) s that the sub-windws are numbered rw-wise. The numbers 38
43 in the matrix may als be given in any rder, fr example, matrix(c(2, 1, 4, 3), 2, 2). By default, layut() partitins the device with regular heights and widths: this can be mdified with the ptins widths and heights. These dimensins are given relatively 12. Examples: > m <- matrix(1:4, 2, 2) > layut(m, widths=c(1, 3), heights=c(3, 1)) > layut.shw(4) > m <- matrix(c(1,1,2,1),2,2) > layut(m, widths=c(2, 1), heights=c(1, 2)) > layut.shw(2) 1 2 Finally, the numbers in the matrix can include zers giving the pssibility t make cmplex (r even esterical) partitins. > m <- matrix(0:3, 2, 2) > layut(m, c(1, 3), c(1, 3)) > layut.shw(3) > m <- matrix(scan(), 5, 5) 1: : : : Read 25 items > layut(m) > layut.shw(5) They can be given in centimetres, see?layut. 39
44 4.2 Graphical functins Here is an verview f the high-level graphical functins in R. plt(x) plt(x, y) sunflwerplt(x, y) pie(x) bxplt(x) stripchart(x) cplt(x~y z) interactin.plt (f1, f2, y) matplt(x,y) dtchart(x) furfldplt(x) asscplt(x) msaicplt(x) pairs(x) plt.ts(x) ts.plt(x) hist(x) barplt(x) qqnrm(x) qqplt(x, y) cntur(x, y, z) filled.cntur (x, y, z) image(x, y, z) persp(x, y, z) stars(x) plt f the values f x (n the y-axis) rdered n the x-axis bivariate plt f x (n the x-axis) and y (n the y-axis) id. but the pints with similar crdinates are drawn as a flwer which petal number represents the number f pints circular pie-chart bx-and-whiskers plt plt f the values f x n a line (an alternative t bxplt() fr small sample sizes) bivariate plt f x and y fr each value (r interval f values) f z if f1 and f2 are factrs, plts the means f y (n the y-axis) with respect t the values f f1 (n the x-axis) and f f2 (different curves); the ptin fun allws t chse the summary statistic f y (by default fun=mean) bivariate plt f the first clumn f x vs. the first ne f y, the secnd ne f x vs. the secnd ne f y, etc. if x is a data frame, plts a Cleveland dt plt (stacked plts line-by-line and clumn-by-clumn) visualizes, with quarters f circles, the assciatin between tw dichtmus variables fr different ppulatins (x must be an array with dim=c(2, 2, k), r a matrix with dim=c(2, 2) if k = 1) Chen Friendly graph shwing the deviatins frm independence f rws and clumns in a tw dimensinal cntingency table msaic graph f the residuals frm a lg-linear regressin f a cntingency table if x is a matrix r a data frame, draws all pssible bivariate plts between the clumns f x if x is an bject f class "ts", plt f x with respect t time, x may be multivariate but the series must have the same frequency and dates id. but if x is multivariate the series may have different dates and must have the same frequency histgram f the frequencies f x histgram f the values f x quantiles f x with respect t the values expected under a nrmal law quantiles f y with respect t the quantiles f x cntur plt (data are interplated t draw the curves), x and y must be vectrs and z must be a matrix s that dim(z)=c(length(x), length(y)) (x and y may be mitted) id. but the areas between the cnturs are clured, and a legend f the clurs is drawn as well id. but the actual data are represented with clurs id. but in perspective if x is a matrix r a data frame, draws a graph with segments r a star where each rw f x is represented by a star and the clumns are the lengths f the segments 40
45 symbls(x, y,...) termplt(md.bj) draws, at the crdinates given by x and y, symbls (circles, squares, rectangles, stars, thermmetres r bxplts ) which sizes, clurs, etc, are specified by supplementary arguments plt f the (partial) effects f a regressin mdel (md.bj) Fr each functin, the ptins may be fund with the n-line help in R. Sme f these ptins are identical fr several graphical functins; here are the main nes (with their pssible default values): add=false axes=true type="p" xlim=, ylim= xlab=, ylab= main= sub= if TRUE superpses the plt n the previus ne (if it exists) if FALSE des nt draw the axes and the bx specifies the type f plt, "p": pints, "l": lines, "b": pints cnnected by lines, "": id. but the lines are ver the pints, "h": vertical lines, "s": steps, the data are represented by the tp f the vertical lines, "S": id. but the data are represented by the bttm f the vertical lines specifies the lwer and upper limits f the axes, fr example with xlim=c(1, 10) r xlim=range(x) anntates the axes, must be variables f mde character main title, must be a variable f mde character sub-title (written in a smaller fnt) 4.3 Lw-level pltting cmmands R has a set f graphical functins which affect an already existing graph: they are called lw-level pltting cmmands. Here are the main nes: pints(x, y) adds pints (the ptin type= can be used) lines(x, y) id. but with lines text(x, y, labels,...) adds text given by labels at crdinates (x,y); a typical use is: plt(x, y, type="n"); text(x, y, names) mtext(text, side=3, line=0, adds text given by text in the margin specified by side (see axis() belw); line specifies the line frm the pltting area...) segments(x0, y0, draws lines frm pints (x0,y0) t pints (x1,y1) x1, y1) arrws(x0, y0, x1, y1, angle= 30, cde=2) id. with arrws at pints (x0,y0) if cde=2, at pints (x1,y1) if cde=1, r bth if cde=3; angle cntrls the angle frm the shaft f the arrw t the edge f the arrw head abline(a,b) draws a line f slpe b and intercept a abline(h=y) draws a hrizntal line at rdinate y abline(v=x) draws a vertical line at abcissa x abline(lm.bj) draws the regressin line given by lm.bj (see sectin 5) 41
46 rect(x1, y1, x2, y2) plygn(x, y) legend(x, y, legend) title() axis(side, vect) bx() rug(x) lcatr(n, type="n",...) draws a rectangle which left, right, bttm, and tp limits are x1, x2, y1, and y2, respectively draws a plygn linking the pints with crdinates given by x and y adds the legend at the pint (x,y) with the symbls given by legend adds a title and ptinally a sub-title adds an axis at the bttm (side=1), n the left (2), at the tp (3), r n the right (4); vect (ptinal) gives the abcissa (r rdinates) where tick-marks are drawn adds a bx arund the current plt draws the data x n the x-axis as small vertical lines returns the crdinates (x, y) after the user has clicked n times n the plt with the muse; als draws symbls (type="p") r lines (type="l") with respect t ptinal graphic parameters (...); by default nthing is drawn (type="n") Nte the pssibility t add mathematical expressins n a plt with text(x, y, expressin(...)), where the functin expressin transfrms its argument in a mathematical equatin. Fr example, > text(x, y, expressin(p == ver(1, 1+e^-(beta*x+alpha)))) will display, n the plt, the fllwing equatin at the pint f crdinates (x, y): 1 p = 1 + e (βx+α) T include in an expressin a variable we can use the functins substitute and as.expressin; fr example t include a value f R 2 (previusly cmputed and stred in an bject named Rsquared): > text(x, y, as.expressin(substitute(r^2==r, list(r=rsquared)))) will display n the plt at the pint f crdinates (x, y): R 2 = T display nly three decimals, we can mdify the cde as fllws: > text(x, y, as.expressin(substitute(r^2==r, + list(r=rund(rsquared, 3))))) will display: Finally, t write the R in italics: R 2 = > text(x, y, as.expressin(substitute(italic(r)^2==r, + list(r=rund(rsquared, 3))))) R 2 =
47 4.4 Graphical parameters In additin t lw-level pltting cmmands, the presentatin f graphics can be imprved with graphical parameters. They can be used either as ptins f graphic functins (but it des nt wrk fr all), r with the functin par t change permanently the graphical parameters, i.e. the subsequent plts will be drawn with respect t the parameters specified by the user. Fr instance, the fllwing cmmand: > par(bg="yellw") will result in all subsequent plts drawn with a yellw backgrund. There are 73 graphical parameters, sme f them have very similar functins. The exhaustive list f these parameters can be read with?par; I will limit the fllwing table t the mst usual nes. adj cntrls text justificatin with respect t the left brder f the text s that 0 is left-justified, 0.5 is centred, 1 is right-justified, values > 1 mve the text further t the left, and negative values further t the right; if tw values are given (e.g., c(0, 0)) the secnd ne cntrls vertical justificatin with respect t the text baseline bg specifies the clur f the backgrund (e.g., bg="red", bg="blue"; the list f the 657 available clurs is displayed with clrs()) bty cntrls the type f bx drawn arund the plt, allwed values are: "", "l", "7", "c", "u" u "]" (the bx lks like the crrespnding character); if bty="n" the bx is nt drawn cex a value cntrlling the size f texts and symbls with respect t the default; the fllwing parameters have the same cntrl fr numbers n the axes, cex.axis, the axis labels, cex.lab, the title, cex.main, and the sub-title, cex.sub cl cntrls the clur f symbls; as fr cex there are: cl.axis, cl.lab, cl.main, cl.sub fnt an integer which cntrls the style f text (1: nrmal, 2: italics, 3: bld, 4: bld italics); as fr cex there are: fnt.axis, fnt.lab, fnt.main, fnt.sub las an integer which cntrls the rientatin f the axis labels (0: parallel t the axes, 1: hrizntal, 2: perpendicular t the axes, 3: vertical) lty cntrls the type f lines, can be an integer (1: slid, 2: dashed, 3: dtted, 4: dtdash, 5: lngdash, 6: twdash), r a string f up t eight characters (between "0" and "9") which specifies alternatively the length, in pints r pixels, f the drawn elements and the blanks, fr example lty="44" will have the same effet than lty=2 lwd a numeric which cntrls the width f lines mar a vectr f 4 numeric values which cntrl the space between the axes and the brder f the graph f the frm c(bttm, left, tp, right), the default values are c(5.1, 4.1, 4.1, 2.1) mfcl a vectr f the frm c(nr,nc) which partitins the graphic windw as a matrix f nr lines and nc clumns, the plts are then drawn in clumns (see sectin 4.1.2) mfrw id. but the plts are then drawn in line (see sectin 4.1.2) pch cntrls the type f symbl, either an integer between 1 and 25, r any single character within "" (Fig. 2) ps an integer which cntrls the size in pints f texts and symbls 43
48 "*" "?" "." "X" "a" *? X a Figure 2: The pltting symbls in R (pch=1:25). The clurs were btained with the ptins cl="blue", bg="yellw", the secnd ptin has an effect nly fr the symbls Any character can be used (pch="*", "?", ".",... ). pty tck tcl xaxt yaxt a character which specifies the type f the pltting regin, "s": square, "m": maximal a value which specifies the length f tick-marks n the axes as a fractin f the smallest f the width r height f the plt; if tck=1 a grid is drawn id. but as a fractin f the height f a line f text (by default tcl=-0.5) if xaxt="n" the x-axis is set but nt drawn (useful in cnjunctin with axis(side=1,...)) if yaxt="n" the y-axis is set but nt drawn (useful in cnjunctin with axis(side=2,...)) 4.5 A practical example In rder t illustrate R s graphical functinalities, let us cnsider a simple example f a bivariate graph f 10 pairs f randm variates. These values were generated with: > x <- rnrm(10) > y <- rnrm(10) The wanted graph will be btained with plt(); ne will type the cmmand: > plt(x, y) and the graph will be pltted n the active graphical device. The result is shwn n Fig. 3. By default, R makes graphs in an intelligent way: 44
49 y x Figure 3: The functin plt used withut ptins. the spaces between tick-marks n the axes, the placement f labels, etc, are calculated s that the resulting graph is as intelligible as pssible. The user may, nevertheless, change the way a graph is presented, fr instance, t cnfrm t a pre-defined editrial style, r t give it a persnal tuch fr a talk. The simplest way t change the presentatin f a graph is t add ptins which will mdify the default arguments. In ur example, we can mdify significantly the figure in the fllwing way: plt(x, y, xlab="ten randm values", ylab="ten ther values", xlim=c(-2, 2), ylim=c(-2, 2), pch=22, cl="red", bg="yellw", bty="l", tcl=0.4, main="hw t custmize a plt with R", las=1, cex=1.5) The result is n Fig. 4. Let us detail each f the used ptins. First, xlab and ylab change the axis labels which, by default, were the names f the variables. Then, xlim and ylim allw us t define the limits n bth axes 13. The graphical parameter pch is used here as an ptin: pch=22 specifies a square which cntur and backgrund clurs may be different and are given by, respectively, cl and bg. The table f graphical parameters gives the meaning f the mdificatins dne by bty, tcl, las and cex. Finally, a title is added with the ptin main. The graphical parameters and the lw-level pltting functins allw us t g further in the presentatin f a graph. As we have seen previusly, sme graphical parameters cannt be passed as arguments t a functin like plt. 13 By default, R adds 4% n each side f the axis limit. This behaviur may be altered by setting the graphical parameters xaxs="i" and yaxs="i" (they can be passed as ptins t plt()). 45
50 Hw t custmize a plt with R 2 1 Ten ther values Ten randm values Figure 4: The functin plt used with ptins. We will nw mdify sme f these parameters with par(), it is thus necessary t type several cmmands. When the graphical parameters are changed, it is useful t save their initial values befrehand t be able t restre them afterwards. Here are the cmmands used t btain Fig. 5. par <- par() par(bg="lightyellw", cl.axis="blue", mar=c(4, 4, 2.5, 0.25)) plt(x, y, xlab="ten randm values", ylab="ten ther values", xlim=c(-2, 2), ylim=c(-2, 2), pch=22, cl="red", bg="yellw", bty="l", tcl=-.25, las=1, cex=1.5) title("hw t custmize a plt with R (bis)", fnt.main=3, adj=1) par(par) Let us detail the actins resulting frm these cmmands. First, the default graphical parameters are cpied in a list called here par. Three parameters will be then mdified: bg fr the clur f the backgrund, cl.axis fr the clur f the numbers n the axes, and mar fr the sizes f the margins arund the pltting regin. The graph is drawn in a nearly similar way t Fig. 4. The mdificatins f the margins allwed t use the space arund the pltting area. The title here is added with the lw-level pltting functin title which allws t give sme parameters as arguments withut altering the rest f the graph. Finally, the initial graphical parameters are restred with the last cmmand. Nw, ttal cntrl! On Fig. 5, R still determines a few things such as the number f tick marks n the axes, r the space between the title and the pltting area. We will see nw hw t ttally cntrl the presentatin f the graph. The apprach used here is t plt a blank graph with plt(..., type="n"), then t add pints, axes, labels, etc, with lw-level pltting func- 46
51 Hw t custmize a plt with R (bis) 2 1 Ten ther values Ten randm values Figure 5: The functins par, plt and title. tins. We will fancy a few arrangements such as changing the clur f the pltting area. The cmmands fllw, and the resulting graph is n Fig. 6. par <- par() par(bg="lightgray", mar=c(2.5, 1.5, 2.5, 0.25)) plt(x, y, type="n", xlab="", ylab="", xlim=c(-2, 2), ylim=c(-2, 2), xaxt="n", yaxt="n") rect(-3, -3, 3, 3, cl="crnsilk") pints(x, y, pch=10, cl="red", cex=2) axis(side=1, c(-2, 0, 2), tcl=-0.2, labels=false) axis(side=2, -1:1, tcl=-0.2, labels=false) title("hw t custmize a plt with R (ter)", fnt.main=4, adj=1, cex.main=1) mtext("ten randm values", side=1, line=1, at=1, cex=0.9, fnt=3) mtext("ten ther values", line=0.5, at=-1.8, cex=0.9, fnt=3) mtext(c(-2, 0, 2), side=1, las=1, at=c(-2, 0, 2), line=0.3, cl="blue", cex=0.9) mtext(-1:1, side=2, las=1, at=-1:1, line=0.2, cl="blue", cex=0.9) par(par) Like befre, the default graphical parameters are saved, and the clur f the backgrund and the margins are mdified. The graph is then drawn with type="n" t nt plt the pints, xlab="", ylab="" t nt write the axis labels, and xaxt="n", yaxt="n" t nt draw the axes. This results in drawing nly the bx arund the pltting area, and defining the axes with respect t xlim et ylim. Nte that we culd have used the ptin axes=false but in this case neither the axes, nr the bx wuld have been drawn. 47
52 Ten ther values Hw t custmize a plt with R (ter) Ten randm values Figure 6: A hand-made graph. The elements are then added in the pltting regin s defined with sme lw-level pltting functins. Befre adding the pints, the clur inside the pltting area is changed with rect(): the size f the rectangle are chsen s that it is substantially larger than the pltting area. The pints are pltted with pints(); a new symbl was used. The axes are added with axis(): the vectr given as secnd argument specifies the crdinates f the tick-marks. The ptin labels=false specifies that n anntatin must be written with the tick-marks. This ptin als accepts a vectr f mde character, fr example labels=c("a", "B", "C"). The title is added with title(), but the fnt is slightly changed. The anntatins n the axes are written with mtext() (marginal text). The first argument f this functin is a vectr f mde character giving the text t be written. The ptin line indicates the distance frm the pltting area (by default line=0), and at the crdinnate. The secnd call t mtext() uses the default value f side (3). The tw ther calls t mtext() pass a numeric vectr as first argument: this will be cnverted int character. 4.6 The grid and lattice packages The packages grid and lattice implement the grid and lattice systems. Grid is a new graphical mde with its wn system f graphical parameters which are distinct frm thse seen abve. The tw main distinctins f grid cmpared t the base graphics are: a mre flexible way t split graphical devices using viewprts which culd be verpalling (graphical bjects may even be shared amng distinct viewprts, e.g., arrws); 48
53 graphical bjects (grb) may be mdified r remved frm a graph withut requiring the re-draw all the graph (as must be dne with base graphics). Grid graphics cannt usually be cmbined r mixed with base graphics (the gridbase package must be used t d this). Hwever, it is pssible t use bth graphical mdes in the same sessin n the same graphical device. Lattice is essentially the implementatin in R f the Trellis graphics f S-PLUS. Trellis is an apprach fr visualizing multivariate data which is particularly apprpriate fr the explratin f relatins r interactins amng variables 14. The main idea behind lattice (and Trellis as well) is that f cnditinal multiple graphs: a bivariate graph will be split in several graphs with respect t the values f a third variable. The functin cplt uses a similar apprach, but lattice ffers much wider functinalities. Lattice uses the grid graphical mde. Mst functins in lattice take a frmula as their main argument 15, fr example y ~ x. The frmula y ~ x z means that the graph f y with respect t x will be pltted as several graphs with respect t the values f z. The fllwing table gives the main functins in lattice. The frmula given as argument is the typical necessary frmula, but all these functins accept a cnditinal frmula (y ~ x z) as main argument; in the latter case, a multiple graph, with respect t the values f z, is pltted as will be seen in the examples belw. barchart(y ~ x) histgram f the values f y with respect t thse f x bwplt(y ~ x) bx-and-whiskers plt densityplt(~ x) density functins plt dtplt(y ~ x) Cleveland dt plt (stacked plts line-by-line and clumn-by-clumn) histgram(~ x) histgram f the frequencies f x qqmath(~ x) quantiles f x with respect t the values expected under a theretical distributin stripplt(y ~ x) single dimensin plt, x must be numeric, y may be a factr qq(y ~ x) quantiles t cmpare tw distributins, x must be numeric, y may be numeric, character, r factr but must have tw levels xyplt(y ~ x) bivariate plts (with many functinalities) levelplt(z ~ x*y) cnturplt(z ~ x*y) clured plt f the values f z at the crdinates given by x and y (x, y and z are all f the same length) clud(z ~ x*y) 3-D perspective plt (pints) wireframe(z ~ x*y) id. (surface) splm(~ x) matrix f bivariate plts parallel(~ x) parallel crdinates plt Let us see nw sme examples in rder t illustrate a few aspects f lattice. The package must be laded in memry with the cmmand library(lattice) plt() als accepts a frmula as its main argument: if x and y are tw vectrs f the same length, plt(y ~ x) and plt(x, y) will give identical graphs. 49
54 n = 35 n = 40 n = n = 20 n = 25 n = Density n = 5 n = 10 n = x Figure 7: The functin densityplt. s that the functins can be accessed. Let us start with the graphs f density functins. Such graphs can be dne simply with densityplt(~ x) which will plt a curve f the empirical density functin with the pints crrespnding t the bservatins n the x- axis (similarly t rug()). Our example will be slightly mre cmplicated with the superpsitin, n each plt, f the curves f empirical density and thse predicted frm a nrmal law. It is necessary t use the argument panel which defines what is drawn n each plt. The cmmands are: n <- seq(5, 45, 5) x <- rnrm(sum(n)) y <- factr(rep(n, n), labels=paste("n =", n)) densityplt(~ x y, panel = functin(x,...) { panel.densityplt(x, cl="darkolivegreen",...) panel.mathdensity(dmath=dnrm, args=list(mean=mean(x), sd=sd(x)), cl="darkblue") }) The first three lines f cmmand generate a randm sample f independent nrmal variates which is split in sub-samples f size equal t 5, 10, 15,..., and 45. Then cmes the call t densityplt prducing a plt fr each sub-sample. panel takes as argument a functin. In ur example, we have defined a functin which calls tw functins pre-defined in lattice: panel.densityplt t draw the empirical density functin, and panel.mathdensity t draw the density functin predicted frm a nrmal law. The functin panel.densityplt is called by default if n argument is given t panel: the cmmand densityplt 50
55 lat lng Figure 8: The functin xyplt with the data quakes. (~ x y) wuld have resulted in the same graph than Fig. 7 but withut the blue curves. The next examples are taken, mre r less mdified, frm the help pages f lattice, and use sme data sets available in R: the lcatins f 1000 seismic events near the Fiji Islands, and sme flwer measurements made n three species f iris. Fig. 8 shws the gegraphic lcatins f the seismic events with respect t depth. The cmmands necessary fr this graph are: data(quakes) mini <- min(quakes$depth) maxi <- max(quakes$depth) int <- ceiling((maxi - mini)/9) inf <- seq(mini, maxi, int) quakes$depth.cat <- factr(flr(((quakes$depth - mini) / int)), labels=paste(inf, inf + int, sep="-")) xyplt(lat ~ lng depth.cat, data = quakes) The first cmmand lads the data quakes in memry. The five next cmmands create a factr by dividing the depth (variable depth) in nine equallyranged intervals: the levels f this factr are labelled with the lwer and upper bunds f these intervals. It then suffices t call the functin xyplt with the apprpriate frmula and an argument data indicating where xyplt must lk fr the variables 16. With the data iris, the verlap amng the different species is sufficiently small s they can be pltted n the figure (Fig. 9). The cmmands are: 16 plt() cannt take an argument data, the lcatin f the variables must be given explicitly, fr example plt(quakes$lng ~ quakes$lat). 51
56 Petal.Length setsa versiclr virginica Petal.Width Figure 9: The functin xyplt with the data iris. data(iris) xyplt( Petal.Length ~ Petal.Width, data = iris, grups=species, panel = panel.superpse, type = c("p", "smth"), span=.75, aut.key = list(x = 0.15, y = 0.85) ) The call t the functin xyplt is here a bit mre cmplex than in the previus example and uses several ptins that we will detail. The ptin grups, as suggested by its name, defines grups that will be used by the ther ptins. We have already seen the ptin panel which defines hw the different grups will be represented n the graph: we use here a pre-defined functin panel.superpse in rder t superpse the grups n the same plt. N ptin is passed t panel.superpse, the default clurs will be used t distinguish the grups. The ptin type, like in plt(), specifies hw the data are represented, but here we can give several arguments as a vectr: "p" t draw pints and "smth" t draw a smth curve which degree f smthness is specified by span. The ptin aut.key adds a legend t the graph: it is nly necessary t give, as a list, the crdinates where the legend is t be pltted. Nte that here these crdinates are relative t the size f the plt (i.e. in [0, 1]). We will see nw the functin splm with the same data n iris. The fllwing cmmands were used t prduce Fig. 10: splm( ~iris[1:4], grups = Species, data = iris, xlab = "", panel = panel.superpse, 52
57 Setsa Versiclr Virginica Petal.Width Petal.Length Sepal.Width Sepal.Length Figure 10: The functin splm with the data iris (1). ) aut.key = list(clumns = 3) The main argument is this time a matrix (the fur first clumns f iris). The result is the set f pssible bivariate plts amng the clumns f the matrix, like the standard functin pairs. By default, splm adds the text Scatter Plt Matrix under the x-axis: t avid this, the ptin xlab="" was used. The ther ptins are similar t the previus example, except that clumns = 3 fr aut.key was specified s the legend is displayed in three clumns. Fig. 10 culd have been dne with pairs(), but this latter functin cannt make cnditinal graphs like n Fig. 11. The cde used is relatively simple: splm(~iris[1:3] Species, data = iris, pscales = 0, varnames = c("sepal\nlength", "Sepal\nWidth", "Petal\nLength")) The sub-graphs being relatively small, we added tw ptins t imprve the legibility f the figure: pscales = 0 remves the tick-marks n the axes (all sub-graphs are drawn n the same scales), and the names f the variables were re-defined t display them n tw lines ("\n" cdes fr a line break in a character string). The last example uses the methd f parallel crdinates fr the explratry analysis f multivariate data. The variables are arranged n an axis (e.g., the y-axis), and the bserved values are pltted n the ther axis (the variables are scaled similarly, e.g., by standardizing them). The different values f the same individual are jined by a line. With the data iris, Fig. 12 is btained with the fllwing cde: parallel(~iris[, 1:4] Species, data = iris, layut = c(3, 1)) 53
58 virginica Petal Length Sepal Width Sepal Length setsa versiclr Petal Length Petal Length Sepal Width Sepal Width Sepal Length Sepal Length Scatter Plt Matrix Figure 11: The functin splm with the data iris (2). Min Max setsa versiclr virginica Petal.Width Petal.Length Sepal.Width Sepal.Length Min Max Min Max Figure 12: The functin parallel with the data iris. 54
59 5 Statistical analyses with R Even mre than fr graphics, it is impssible here t g in the details f the pssibilities ffered by R with respect t statistical analyses. My gal here is t give sme landmarks with the aim t have an idea f the features f R t perfrm data analyses. The package stats cntains functins fr a wide range f basic statistical analyses: classical tests, linear mdels (including least-squares regressin, generalized linear mdels, and analysis f variance), distributins, summary statistics, hierarchical clustering, time-series analysis, nnlinear least squares, and multivariate analysis. Other statistical methds are available in a large number f packages. Sme f them are distributed with a base installatin f R and are labelled recmmanded, and many ther packages are cntributed and must be installed by the user. We will start with a simple example which requires n ther package than stats in rder t intrduce the general apprach t data analysis in R. Then, we will detail sme ntins, frmulae and generic functins, which are useful whatever the type f analysis perfrmed. We will cnclude with an verview n packages. 5.1 A simple example f analysis f variance The functin fr the analysis f variance in stats is av. In rder t try it, let us take a data set distributed with R: InsectSprays. Six insecticides were tested in field cnditins, the bserved respnse was the number f insects. Each insecticide was tested 12 times, thus there are 72 bservatins. We will nt cnsider here the graphical explratin f the data, but will fcus n a simple analysis f variance f the respnse with respect t the insecticide. After lading the data in memry with the functin data, the analysis is perfrmed after a square-rt transfrmatin f the respnse: > data(insectsprays) > av.spray <- av(sqrt(cunt) ~ spray, data = InsectSprays) The main (and mandatry) argument f av is a frmula which specifies the respnse n the left-hand side f the tilde symbl ~ and the predictr n the right-hand side. The ptin data = InsectSprays specifies that the variables must be fund in the data frame InsectSprays. This syntax is equivalent t: > av.spray <- av(sqrt(insectsprays$cunt) ~ InsectSprays$spray) r still (if we knw the clumn numbers f the variables): 55
60 > av.spray <- av(sqrt(insectsprays[, 1]) ~ InsectSprays[, 2]) The first syntax is t be preferred since it is clearer. The results are nt displayed since they are assigned t an bject called av.spray. We will then used sme functins t extract the results, fr example print t display a brief summary f the analysis (mstly the estimated parameters) and summary t display mre details (included the statistical tests): > av.spray Call: av(frmula = sqrt(cunt) ~ spray, data = InsectSprays) Terms: spray Residuals Sum f Squares Deg. f Freedm 5 66 Residual standard errr: Estimated effects may be unbalanced > summary(av.spray) Df Sum Sq Mean Sq F value Pr(>F) spray < 2.2e-16 *** Residuals Signif. cdes: 0 *** ** 0.01 * We may remind that typing the name f the bject as a cmmand is similar t the cmmand print(av.spray). A graphical representatin f the results can be dne with plt() r termplt(). Befre typing plt(av.spray) we will divide the graphics int fur parts s that the fur diagnstics plts will be dne n the same graph. The cmmands are: > par <- par() > par(mfcl = c(2, 2)) > plt(av.spray) > par(par) > termplt(av.spray, se=true, partial.resid=true, rug=true) and the resulting graphics are n Figs. 13 and Frmulae Frmulae are a key element in statistical analyses with R: the ntatin used is the same fr (almst) all functins. A frmula is typically f the frm y ~ mdel where y is the analysed respnse and mdel is a set f terms fr which sme parameters are t be estimated. These terms are separated with arithmetic symbls but they have here a particular meaning. 56
61 Residuals Residuals vs Fitted Fitted values Standardized residuals Scale Lcatin plt Fitted values Standardized residuals Nrmal Q Q plt Ck s distance Ck s distance plt Theretical Quantiles Obs. number Figure 13: Graphical representatin f the results frm the functin av with plt(). a+b X a:b a*b ply(a, n) ^n b %in% a additive effects f a and f b if X is a matrix, this specifies an additive effect f each f its clumns, i.e. X[,1]+X[,2]+...+X[,ncl(X)]; sme f the clumns may be selected with numeric indices (e.g., X[,2:4]) interactive effect between a and b additive and interactive effects (identical t a+b+a:b) plynmials f a up t degree n includes all interactins up t level n, i.e. (a+b+c)^2 is identical t a+b+c+a:b+a:c+b:c the effects f b are nested in a (identical t a+a:b, r a/b) -b remves the effect f b, fr example: (a+b+c)^2-a:b is identical t a+b+c+a:c+b:c -1 y~x-1 is a regressin thrugh the rigin (id. fr y~x+0 r 0+y~x) 1 y~1 fits a mdel with n effects (nly the intercept) ffset(...) adds an effect t the mdel withut estimating any parameter (e.g., ffset(3*x)) We see that the arithmetic peratrs f R have in a frmula a different meaning than they have in expressins. Fr example, the frmula y~x1+x2 defines the mdel y = β 1 x 1 + β 2 x 2 + α, and nt (if the peratr + wuld have its usual meaning) y = β(x 1 + x 2 ) + α. T include arithmetic peratins in a frmula, we can use the functin I: the frmula y~i(x1+x2) defines the mdel y = β(x 1 + x 2 ) + α. Similarly, t define the mdel y = β 1 x + β 2 x 2 + α, we will use the frmula y ~ ply(x, 2) (and nt y ~ x + x^2). Hwever, it is 57
62 Partial fr spray spray Figure 14: Graphical representatin f the results frm the functin av with termplt(). pssible t include a functin in a frmula in rder t transfrm a variable as seen abve with the insect sprays analysis f variance. Fr analyses f variance, av() accepts a particular syntax t define randm effects. Fr instance, y ~ a + Errr(b) means the additive effects f the fixed term a and the randm ne b. 5.3 Generic functins We remember that R s functins act with respect t the attributes f the bjects pssibly passed as arguments. The class is an attribute deserving sme attentin here. It is very cmmn that the R statistical functins return an bject f class with the same name (e.g., av returns an bject f class "av", lm returns ne f class "lm"). The functins that we can use subsequently t extract the results will act specifically with respect t the class f the bject. These functins are called generic. Fr instance, the functin which is mst ften used t extract results frm analyses is summary which displays detailed results. Whether the bject given as argument is f class "lm" (linear mdel) r "av" (analysis f variance), it sunds bvius that the infrmatin t display will nt be the same. The advantage f generic functins is that the syntax is the same in all cases. An bject cntaining the results f an analysis is generally a list, and the way it is displayed is determined by its class. We have already seen this ntin that the actin f a functin depends n the kind f bject given as argument. It is a general feature f R 17. The fllwing table gives the main generic func- 17 There are mre than 100 generic functins in R. 58
63 tins which can be used t extract infrmatin frm bjects resulting frm an analysis. The typical usage f these functins is: > md <- lm(y ~ x) > df.residual(md) [1] 8 print summary df.residual cef residuals deviance fitted lglik AIC returns a brief summary returns a detailed summary returns the number f residual degrees f freedm returns the estimated cefficients (smetimes with their standard-errrs) returns the residuals returns the deviance returns the fitted values cmputes the lgarithm f the likelihd and the number f parameters cmputes the Akaike infrmatin criterin r AIC (depends n lglik()) A functin like av r lm returns a list with its different elements crrespnding t the results f the analysis. If we take ur example f an analysis f variance with the data InsectSprays, we can lk at the structure f the bject returned by av: > str(av.spray, max.level = -1) List f 13 - attr(*, "class")= chr [1:2] "av" "lm" Anther way t lk at this structure is t display the names f the bject: > names(av.spray) [1] "cefficients" "residuals" "effects" [4] "rank" "fitted.values" "assign" [7] "qr" "df.residual" "cntrasts" [10] "xlevels" "call" "terms" [13] "mdel" The elements can then be extracted as we have already seen: > av.spray$cefficients (Intercept) sprayb sprayc sprayd spraye sprayf summary() als creates a list which, in the case f av(), is simply a table f tests: 59
64 > str(summary(av.spray)) List f 1 $ :Classes anva and data.frame : 2 bs. f 5 variables:..$ Df : num [1:2] $ Sum Sq : num [1:2] $ Mean Sq: num [1:2] $ F value: num [1:2] 44.8 NA..$ Pr(>F) : num [1:2] 0 NA - attr(*, "class")= chr [1:2] "summary.av" "listf" > names(summary(av.spray)) NULL Generic functins d nt generally perfrm any actin n bjects: they call the apprpriate functin with respect t the class f the argument. A functin called by a generic is a methd in R s jargn. Schematically, a methd is cnstructed as generic.cls, where cls is the class f the bject. Fr instance, in the case f summary, we can display the crrespnding methds: > aprps("^summary") [1] "summary" "summary.av" [3] "summary.avlist" "summary.cnnectin" [5] "summary.data.frame" "summary.default" [7] "summary.factr" "summary.glm" [9] "summary.glm.null" "summary.infl" [11] "summary.lm" "summary.lm.null" [13] "summary.manva" "summary.matrix" [15] "summary.mlm" "summary.packagestatus" [17] "summary.posixct" "summary.posixlt" [19] "summary.table" We can see the difference fr this generic in the case f a linear regressin, cmpared t an analysis f variance, with a small simulated example: > x <- y <- rnrm(5) > lm.spray <- lm(y ~ x) > names(lm.spray) [1] "cefficients" "residuals" "effects" [4] "rank" "fitted.values" "assign" [7] "qr" "df.residual" "xlevels" [10] "call" "terms" "mdel" > names(summary(lm.spray)) [1] "call" "terms" "residuals" [4] "cefficients" "sigma" "df" [7] "r.squared" "adj.r.squared" "fstatistic" [10] "cv.unscaled" The fllwing table shws sme generic functins that d supplementary analyses frm an bject resulting frm an analysis, the main argument being 60
65 this latter bject, but in sme cases a further argument is necessary like fr predict r update. add1 drp1 step anva predict update tests successively all the terms that can be added t a mdel tests successively all the terms that can be remved frm a mdel selects a mdel with AIC (calls add1 and drp1) cmputes a table f analysis f variance r deviance fr ne r several mdels cmputes the predicted values fr new data frm a fitted mdel re-fits a mdel with a new frmula r new data There are als varius utilities functins that extract infrmatin frm a mdel bject r a frmula, such as alias which finds the linearly dependent terms in a linear mdel specified by a frmula. Finally, there are, f curse, graphical functins such as plt which displays varius diagnstics, r termplt (see the abve example), thugh this latter functin is nt generic but calls predict. 5.4 Packages The fllwing table lists the standard packages which are distributed with a base installatin f R. Sme f them are laded in memry when R starts; this can be displayed with the functin search: > search() [1] ".GlbalEnv" "package:methds" [3] "package:stats" "package:graphics" [5] "package:grdevices" "package:utils" [7] "package:datasets" "Autlads" [9] "package:base" The ther packages may be used after being laded: > library(grid) The list f the functins in a package can be displayed with: > library(help = grid) r by brwsing the help in html frmat. The infrmatin relative t each functin can be accessed as previusly seen (p. 7). 61
66 Package base datasets grdevices graphics grid methds splines stats stats4 tcltk tls utils Descriptin base R functins base R datasets graphics devices fr base and grid graphics base graphics grid graphics definitin f methds and classes fr R bjects and prgramming tls regressin spline functins and classes statistical functins statistical functins using S4 classes functins t interface R with Tcl/Tk graphical user interface elements tls fr package develpment and administratin R utility functins Many cntributed packages add t the list f statistical methds available in R. They are distributed separately, and must be installed and laded in R. A cmplete list f the cntributed packages, with descriptins, is n the CRAN Web site 18. Several f these packages are recmmanded since they cver statistical methds ften used in data analysis. The recmmended packages are ften distributed with a base installatin f R. They are briefly described in the fllwing table. Package Descriptin bt resampling and btstraping methds class classificatin methds cluster clustering methds freign functins fr reading data stred in varius frmats (S3, Stata, SAS, Minitab, SPSS, Epi Inf) KernSmth methds fr kernel smthing and density estimatin (including bivariate kernels) lattice Lattice (Trellis) graphics MASS cntains many functins, tls and data sets frm the libraries f Mdern Applied Statistics with S by Venables & Ripley mgcv generalized additive mdels nlme linear and nn-linear mixed-effects mdels nnet neural netwrks and multinmial lg-linear mdels rpart recursive partitining spatial spatial analyses ( kriging, spatial cvariance,... ) survival survival analyses
67 There are tw ther main repsitries f R packages: the Omegahat Prject fr Statistical Cmputing 19 which fcuses n web-based applicatins and interfaces between sftwares and languages, and the Bicnductr Prject 20 specialized in biinfrmatic applicatins (particularly fr the analysis f micrarray data). The prcedure t install a package depends n the perating system and whether R was installed frm the surces r pre-cmpiled binaries. In the latter situatin, it is recmmended t use the pre-cmpiled packages available n CRAN s site. Under Windws, the binary Rgui.exe has a menu Packages allwing t install packages via internet frm the CRAN Web site, r frm zipped files n the lcal disk. If R was cmpiled, a package can be installed frm its surces which are distributed as a.tar.gz file. Fr instance, if we want t install the package gee, we will first dwnlad the file gee tar.gz (the number indicates the versin f the package; generally nly ne versin is available n CRAN). We will then type frm the system (and nt in R) the cmmand: R CMD INSTALL gee_ tar.gz There are several useful functins t manage packages such as installed. packages, CRAN.packages, r dwnlad.packages. It is als useful t type regularly the cmmand: > update.packages() which checks the versins f the packages installed against thse available n CRAN (this cmmand can be called frm the menu Packages under Windws). The user can then update the packages with mre recent versins than thse installed n the cmputer
68 6 Prgramming with R in pratice Nw that we have dne an verview f R s functinalities, let us return t the language and prgramming. We will see a few simple ideas likely t be used in practice. 6.1 Lps and vectrizatin An advantage f R cmpared t sftwares with pull-dwn menus is the pssibility t prgram simply a series f analyses which will be executed successively. This is cmmn t any cmputer language, but R has sme particular features which make prgramming easier fr nn-specialists. Like ther languages, R has sme cntrl structures which are nt dissimilar t thse f the C language. Suppse we have a vectr x, and fr each element f x with the value b, we want t give the value 0 t anther variable y, therwise 1. We first create a vectr y f the same length than x: y <- numeric(length(x)) fr (i in 1:length(x)) if (x[i] == b) y[i] <- 0 else y[i] <- 1 Several instructins can be executed if they are placed within braces: fr (i in 1:length(x)) { y[i] < } if (x[i] == b) { y[i] < } Anther pssible situatin is t execute an instructin as lng as a cnditin is true: while (myfun > minimum) {... } Hwever, lps and cntrl structures can be avided in mst situatins thanks t a feature f R: vectrizatin. Vectrizatin makes lps implicit in expressin, and we have seen many cases. Let us cnsider the additin f tw vectrs: 64
69 > z <- x + y This additin culd be written with a lp, as this is dne in mst languages: > z <- numeric(length(x)) > fr (i in 1:length(z)) z[i] <- x[i] + y[i] In this case, it is necessary t create the vectr z befrehand because f the use f the indexing system. We realize that this explicit lp will wrk nly if x and y are f the same length: it must be changed if this is nt true, whereas the first expressin will wrk in all situatins. The cnditinal executins (if... else) can be avided with the use f the lgical indexing; cming back t the abve example: > y[x == b] <- 0 > y[x!= b] <- 1 In additin t being simpler, vectrized expressins are cmputatinally mre efficient, particularly with large quantities f data. There are als several functins f the type apply which avids writing lps. apply acts n the rws and/r clumns f a matrix, its syntax is apply(x, MARGIN, FUN,...), where X is a matrix, MARGIN indicates whether t cnsider the rws (1), the clumns (2), r bth (c(1, 2)), FUN is a functin (r an peratr, but in this case it must be specified within brackets) t apply, and... are pssible ptinal arguments fr FUN. A simple example fllws. > x <- rnrm(10, -5, 0.1) > y <- rnrm(10, 5, 2) > X <- cbind(x, y) # the clumns f X keep the names "x" and "y" > apply(x, 2, mean) x y > apply(x, 2, sd) x y lapply() acts n a list: its syntax is similar t apply and it returns a list. > frms <- list(y ~ x, y ~ ply(x, 2)) > lapply(frms, lm) [[1]] Call: FUN(frmula = X[[1]]) Cefficients: 65
70 (Intercept) x [[2]] Call: FUN(frmula = X[[2]]) Cefficients: (Intercept) ply(x, 2)1 ply(x, 2) sapply() is a flexible variant f lapply() which can take a vectr r a matrix as main argument, and returns its results in a mre user-friendly frm, generally as a table. 6.2 Writing a prgram in R Typically, an R prgram is written in a file saved in ASCII frmat and named with the extensin.r. A typical situatin where a prgram is useful is when ne wants t d the same tasks several times. In ur first example, we want t d the same plt fr three different species f birds, the data being in three distinct files. We will prceed step by step, and see different ways t prgram this very simple prblem. First, let us make ur prgram in the mst intuitive way by executing successively the needed cmmands, taking care t partitin the graphical device befrehand. layut(matrix(1:3, 3, 1)) data <- read.table("swal.dat") plt(data$v1, data$v2, type="l") title("swallw") data <- read.table("wren.dat") plt(data$v1, data$v2, type="l") title("wren") data <- read.table("dunn.dat") plt(data$v1, data$v2, type="l") title("dunnck") # partitin the graphics # read the data # add a title The character # is used t add cmments in a prgram: R then ges t the next line. The prblem f this first prgram is that it may becme quite lng if we want t add ther species. Mrever, sme cmmands are executed several times, thus they can be gruped tgether and executed after changing sme arguments. The strategy used here is t put these arguments in vectrs f mde character, and then use the indexing t access these different values. 66
71 layut(matrix(1:3, 3, 1)) # partitin the graphics species <- c("swallw", "wren", "dunnck") file <- c("swal.dat", "Wren.dat", "Dunn.dat") fr(i in 1:length(species)) { data <- read.table(file[i]) # read the data plt(data$v1, data$v2, type="l") title(species[i]) # add a title } Nte that there are n duble qutes arund file[i] in read.table() since this argument is f mde character. Our prgram is nw mre cmpact. It is easier t add ther species since the vectrs cntaining the species and file names are at the beginning f the prgram. The abve prgrams will wrk crrectly if the data files.dat are lcated in the wrking directry f R, therwise the user must either change the wrking directry, r specifiy the path in the prgram (fr example: file <- "/hme/paradis/data/swal.dat"). If the prgram is written in the file Mybirds.R, it will be called by typing: > surce("mybirds.r") Like fr any input frm a file, it is necessary t give the path t access the file if it is nt in the wrking directry. 6.3 Writing yur wn functins We have seen that mst f R s wrk is dne with functins which arguments are given within parentheses. Users can write their wn functins, and these will have exactly the same prperties than ther functins in R. Writing yur wn functins allws an efficient, flexible, and ratinal use f R. Let us cme back t ur example f reading sme data fllwed by pltting a graph. If we want t d this peratin in different situatins, it may be a gd idea t write a functin: myfun <- functin(s, F) { data <- read.table(f) plt(data$v1, data$v2, type="l") title(s) } T be executed, this functin must be laded in memry, and this can be dne in several ways. The lines f the functin can be typed directly n the keybard, like any ther cmmand, r cpied and pasted frm an editr. If the functin has been saved in a text file, it can be laded with surce() 67
72 like anther prgram. If the user wants sme functins t be laded each time when R starts, they can be saved in a wrkspace.rdata which will be laded in memry if it is in the wrking directry. Anther pssibility is t cnfigure the file.rprfile r Rprfile (see?startup fr details). Finally, it is pssible t create a package, but this will nt be discussed here (see the manual Writing R Extensins ). Once the functin is laded, we will be able with a single cmmand t read the data and plt the graph, fr instance with myfun("swallw", "Swal.dat"). Thus, we have nw a third versin f ur prgram: layut(matrix(1:3, 3, 1)) myfun("swallw", "Swal.dat") myfun("wren", "Wrenn.dat") myfun("dunnck", "Dunn.dat") We may als use sapply() leading t a furth versin f ur prgram: layut(matrix(1:3, 3, 1)) species <- c("swallw", "wren", "dunnck") file <- c("swal.dat", "Wren.dat", "Dunn.dat") sapply(species, myfun, file) In R, it is nt necessary t declare the variables used within a functin. When a functin is executed, R uses a rule called lexical scping t decide whether an bject is lcal t the functin, r glbal. T understand this mechanism, let us cnsider the very simple functin belw: > f <- functin() print(x) > x <- 1 > f() [1] 1 The name x is nt used t create an bject within f(), s R will seek in the enclsing envirnment if there is an bject called x, and will print its value (therwise, a message errr is displayed, and the executin is halted). If x is used as the name f an bject within ur functin, the value f x in the glbal envirnment is nt used. > x <- 1 > f2 <- functin() { x <- 2; print(x) } > f2() [1] 2 > x [1] 1 This time print() uses the bject x that is defined within its envirnment, that is the envirnment f f2. 68
73 The wrd enclsing abve is imprtant. In ur tw example functins, there are tw envirnments: the glbal ne, and the ne f the functin f r f2. If there are three r mre nested envirnments, the search fr the bjects is made prgressively frm a given envirnment t the enclsing ne, and s n, up t the glbal ne. There are tw ways t specify arguments t a functin: by their psitins r by their names (als called tagged arguments). Fr example, let us cnsider a functin with three arguments: f <- functin(arg1, arg2, arg3) {...} f() can be executed withut using the names arg1,..., if the crrespnding bjects are placed in the crrect psitin, fr instance: f(x, y, z). Hwever, the psitin has n imprtance if the names f the arguments are used, e.g. f(arg3 = z, arg2 = y, arg1 = x). Anther feature f R s functins is the pssibility t use default values in their definitin. Fr instance: f <- functin(arg1, arg2 = 5, arg3 = FALSE) {...} The cmmands f(x), f(x, 5, FALSE), and f(x, arg3 = FALSE) will have exactly the same result. The use f default values in a functin definitin is very useful, particularly when used with tagged arguments (i.e. t change nly ne default value such as f(x, arg3 = TRUE). T cnclude this sectin, let us see anther example which is nt purely statistical, but it illustrates the flexibility f R. Cnsider we wish t study the behaviur f a nn-linear mdel: Ricker s mdel defined by: N t+1 = N t exp [ r ( 1 N t K This mdel is widely used in ppulatin dynamics, particularly f fish. We want, using a functin, t simulate this mdel with respect t the grwth rate r and the initial number in the ppulatin N 0 (the carrying capacity K is ften taken equal t 1 and this value will be taken as default); the results will be displayed as a plt f numbers with respect t time. We will add an ptin t allw the user t display nly the numbers in the last few time steps (by default all results will be pltted). The functin belw can d this numerical analysis f Ricker s mdel. ricker <- functin(nzer, r, K=1, time=100, frm=0, t=time) { N <- numeric(time+1) N[1] <- nzer fr (i in 1:time) N[i+1] <- N[i]*exp(r*(1 - N[i]/K)) Time <- 0:time plt(time, N, type="l", xlim=c(frm, t)) } )] 69
74 Try it yurself with: > layut(matrix(1:3, 3, 1)) > ricker(0.1, 1); title("r = 1") > ricker(0.1, 2); title("r = 2") > ricker(0.1, 3); title("r = 3") 70
75 7 Literature n R Manuals. Several manuals are distributed with R in R HOME/dc/manual/: An Intrductin t R [R-intr.pdf], R Installatin and Administratin [R-admin.pdf], R Data Imprt/Exprt [R-data.pdf], Writing R Extensins [R-exts.pdf], R Language Definitin [R-lang.pdf]. The files may be in different frmats (pdf, html, texi,... ) depending n the type f installatin. FAQ. R is als distributed with an FAQ (Frequently Asked Questins) lcalized in the directry R HOME/dc/html/. A versin f the R-FAQ is regularly updated n CRAN s Web site: On-line resurces. The CRAN Web site hsts several dcuments, bibligraphic resurces, and links t ther sites. There are als a list f publicatins (bks and articles) abut R r statistical methds 21 and sme dcuments and tutrials written by R s users 22. Mailing lists. There are fur discussin lists n R; t subscribe, send a message, r read the archives see: The general discussin list r-help is an interesting surce f infrmatin fr the users f R (the three ther lists are dedicated t annucements f new versins, and fr develpers). Many users have sent t r-help functins r prgrams which can be fund in the archives. If a prblem is encuntered with R, it is thus imprtant t prceed in the fllwing rder befre sending a message t r-help : 1. read carefully the n-line help (pssibly using the search engine); 2. read the R-FAQ; 3. search the archives f r-help at the abve address, r by using ne f the search engines develped n sme Web sites 23 ; 4. read the psting guide 24 befre sending yur questin(s) The addresses f these sites are listed at
76 R News. The electrnic jurnal R News aims t fill the gap between the electrnic discussin lists and traditinal scientific publicatins. The first issue was published n January Citing R in a publicatin. Finally, if yu mentin R in a publicatin, yu must cite the fllwing reference: R Develpment Cre Team (2005). R: A language and envirnment fr statistical cmputing. R Fundatin fr Statistical Cmputing, Vienna, Austria. ISBN , URL:
An Introduction to Statistical Learning
Springer Texts in Statistics Gareth James Daniela Witten Trevr Hastie Rbert Tibshirani An Intrductin t Statistical Learning with Applicatins in R Springer Texts in Statistics 103 Series Editrs: G. Casella
Building Your Book for Kindle
Building Yur Bk fr Kindle We are excited yu ve decided t design, frmat, and prepare yur bk fr Kindle! We ll walk yu thrugh the necessary steps in creating a prfessinal digital file f yur bk fr quick uplad
How to use Moodle 2.7. Teacher s Manual for the world s most popular LMS. Jaswinder Singh
Teacher s Manual fr the wrld s mst ppular LMS Jaswinder Singh Hw t Use Mdle 2.7 2 Hw t use Mdle 2.7, 1 st Editin Teacher s Manual fr the wrld s mst ppular LMS Jaswinder Singh 3 This bk is dedicated t my
How To Use Sas 9.4.1.1
What's New in SAS 9.4 SAS Dcumentatin The crrect bibligraphic citatin fr this manual is as fllws: SAS Institute Inc. 2013. What's New in SAS 9.4. Cary, NC: SAS Institute Inc. What's New in SAS 9.4 Cpyright
Across a wide variety of fields, data are
Frm Data Mining t Knwledge Discvery in Databases Usama Fayyad, Gregry Piatetsky-Shapir, and Padhraic Smyth Data mining and knwledge discvery in databases have been attracting a significant amunt f research,
The Synchronization of Periodic Routing Messages
The Synchrnizatin f Peridic Ruting Messages Sally Flyd and Van Jacbsn, Lawrence Berkeley Labratry, One Cycltrn Rad, Berkeley CA 9470, flyd@eelblgv, van@eelblgv T appear in the April 994 IEEE/ACM Transactins
How to Write Program Objectives/Outcomes
Hw t Write Prgram Objectives/Outcmes Objectives Gals and Objectives are similar in that they describe the intended purpses and expected results f teaching activities and establish the fundatin fr assessment.
SECURITY GUIDANCE FOR CRITICAL AREAS OF FOCUS IN CLOUD COMPUTING V3.0
SECURITY GUIDANCE FOR CRITICAL AREAS OF FOCUS IN CLOUD COMPUTING V3.0 INTRODUCTION The guidance prvided herein is the third versin f the Clud Security Alliance dcument, Security Guidance fr Critical Areas
An Introduction to R. W. N. Venables, D. M. Smith and the R Core Team
An Introduction to R Notes on R: A Programming Environment for Data Analysis and Graphics Version 3.2.0 (2015-04-16) W. N. Venables, D. M. Smith and the R Core Team This manual is for R, version 3.2.0
A Beginner s Guide to Successfully Securing Grant Funding
A Beginner s Guide t Successfully Securing Grant Funding Intrductin There is a wide range f supprt mechanisms ut there in the funding wrld, including grants, lans, equity investments, award schemes and
Most Significant Change
Click4it Wiki - Tlkit Mst Significant Change Step by Step Step 1: Starting and raising interest A. It may help t use ne f the fllwing metaphrs t explain the MSC: Newspaper: Newspapers are structured int
Develop Agency SPF From SafetyAnalystWiki
Develp Agency SPF Frm SafetyAnalystWiki Cntents [hide] 1 Safety Perfrmance Functins 1.1 What SPFs Are Needed 1.2 Functinal Frm f SPFs 1.3 Data Needs fr Develpment f SPFs 1.4 Statistical Assumptins and
THE INTERNATIONAL <IR> FRAMEWORK
THE INTERNATIONAL FRAMEWORK ABOUT THE IIRC The Internatinal Integrated Reprting Cuncil (IIRC) is a glbal calitin f regulatrs, investrs, cmpanies, standard setters, the accunting prfessin and NGOs.
European Investment Bank. Guide to Procurement
GUIDE TO PROCUREMENT fr prjects financed by the EIB Updated versin f June 2011 TABLE OF CONTENTS Intrductin 1. General Aspects...4 1.1. The Bank s Plicy... 4 1.2. Eligibility f Cntractrs and Suppliers
No Unsafe Lift. Workbook
N Unsafe Lift Wrkbk Cver and Sectin Break image prvided curtesy f Arj Canada Inc. Table Of Cntents Purpse f this wrkbk... 2 Hw t use this wrkbk...3 SECTION ONE A Brief Review f the Literature...5 SECTION
MEASURING AND/OR ESTIMATING SOCIAL VALUE CREATION: Insights Into Eight Integrated Cost Approaches
MEASURING AND/OR ESTIMATING SOCIAL VALUE CREATION: Insights Int Eight Integrated Cst Appraches Prepared fr Bill & Melinda Gates Fundatin Impact Planning and Imprvement Prepared by Melinda T. Tuan P.O.
ns Rev. 0 (3.9.15) Reporting water MDL is allowable) Preparatory Method Analysis Method The MDL programs and by covered The LOD reporting?
NR149 LOD/ /LOQ Clarificatin Required frequency Annually an MDL study must be perfrmed fr each cmbinatin f the fllwing: Matrix (if the slid and aqueus matrix methds are identical, extraplatin frm the water
How to Convert your Paper into a Presentation
Hw t Cnvert yur Paper int a Presentatin During yur cllege career, yu may be asked t present yur academic wrk in the classrm, at cnferences, r at special events. Tw types f talks are cmmn in academia: presentatins
Chapter 2 Getting Data into R
Chapter 2 Getting Data into R In the following chapter we address entering data into R and organising it as scalars (single values), vectors, matrices, data frames, or lists. We also demonstrate importing
Electronic Communication
Applicatin fr Tree Wrks: Wrks t Trees Subject t a Tree Preservatin Order (TPO) and/r Ntificatin f Prpsed Wrks t Trees in Cnservatin Areas (CA) Twn and Cuntry Planning Act 1990 Electrnic Cmmunicatin If
Social Media Use by Governments
Please cite this paper as: Mickleit, A. (2014), Scial Media Use by Gvernments: A Plicy Primer t Discuss Trends, Identify Plicy Opprtunities and Guide Decisin Makers, OECD Wrking Papers n Public Gvernance,
Report for the Food Standards Agency. Nutrition and Public Health Intervention Research Unit London School of Hygiene & Tropical Medicine
Cmparisn f cmpsitin (nutrients and ther substances) f rganically and cnventinally prduced fdstuffs: a systematic review f the available literature Reprt fr the Fd Standards Agency Nutritin and Public Health
RISING TO THE CHALLENGE. Re-Envisioning Public Libraries
RISING TO THE CHALLENGE Re-Envisining Public Libraries RISING TO THE CHALLENGE Re-Envisining Public Libraries A reprt f the Aspen Institute Dialgue n Public Libraries by Amy K. Garmer Directr Aspen Institute
Essendant Online Terms of Use
Essendant Online Terms f Use Thank yu fr visiting this website. These Terms f Use gvern yur use f any website wned by Essendant C. r any f its subsidiaries (including Essendant Industrial LLC), n which
1 IS THERE A CONTRACT?
1 IS THERE A CONTRACT? MANIFESTATION OF MUTUAL ASSENT: There must be an bjective manifestatin f mutual assent t a K. Judged by what a reasnable persn wuld understand the parties actins t mean. - At stake
The Data Center Management Elephant
The Data Center Management Elephant By David Cle DATA CENTER SOLUTIONS Fr Mre Infrmatin: (866) 787-3271 [email protected] 2010 N Limits Sftware. All rights reserved. N part f this publicatin may be used,
