Bob Muenchen, Author of R for SAS and SPSS Users, Co-Author of R for Stata Users
muenchen.bob@gmail.com, http://r4stats.com

What is R?
R's Advantages
R's Disadvantages
Installing and Maintaining R
Ways of Running R
An Example Program
Where to Learn More

Copyright 2010, 2011, Robert A Muenchen. All rights reserved.

"The most powerful statistical computing language on the planet." -Norman Nie, Developer of SPSS
Language + package + environment for graphics and data analysis
Free and open source
Created by Ross Ihaka & Robert Gentleman, 1996, & extended by many more
An implementation of the S language by John Chambers and others
R has 4,950 add-ons, or nearly 100,000 procs

Source: http://r4stats.com/popularity

1. Data input & management (data step)
2. Analytics & graphics procedures (proc step)
3. Macro language
4. Matrix language
5. Output management systems (ODS/OMS)

    * SAS Approach;
    DATA A; SET A;
      logx = log(x);
    PROC REG;
      MODEL Y = logx;

R integrates these all seamlessly.

    # R Approach
    lm( Y ~ log(x) )
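
The contrast between the two approaches can be tried directly. This is a minimal sketch on simulated data; the variable values, the seed, and the names fit_inline and fit_twostep are illustrative assumptions and are not taken from the slides. It fits the model once with the transformation inside the formula and once with a separately created logx variable, mimicking the SAS data step.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
x <- 1:50
Y <- 2 + 3 * log(x) + rnorm(50, sd = 0.1)

# R approach: the transformation is written inside the model formula
fit_inline <- lm(Y ~ log(x))

# Two-step approach: create the transformed variable first, then model it
logx <- log(x)
fit_twostep <- lm(Y ~ logx)

# Both fits estimate the same intercept and slope
print(coef(fit_inline))
print(coef(fit_twostep))
```

Only the coefficient names differ between the two fits: log(x) in the first and logx in the second.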

Vast selection of analytics & graphics
New methods are available sooner
Many packages can run R (SAS, SPSS, Excel, ...)
Its object orientation "does the right thing"
Its language is powerful & fully integrated
Procedures you write are on an equal footing
It is the universal language of data analysis
It runs on any computer
Being open source, you can study and modify it
It is free

    * Using SAS;
    PROC TTEST DATA=classroom;
      CLASS gender;
      VAR score;

    # In R
    t.test(score ~ gender, data=classroom)
    with(classroom, t.test(posttest, pretest, paired=TRUE))

Language is somewhat harder to learn
Help files are sparse & complex
Must find R and its add-ons yourself
Graphical user interfaces not as polished
Most R functions hold data in main memory
  Rule-of-thumb: 10 million values per gigabyte
  SAS/SPSS: billions of records
Several efforts are underway to break R's memory limit, including Revolution Analytics' distribution
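
The t.test() calls shown earlier in this section can be run on a small made-up data set. This is a sketch only: the classroom values below are invented for illustration and do not come from the slides. The formula form accepts a data argument, while the two-vector form looks for its vectors in the workspace, so the sketch wraps it in with().

```r
# Hypothetical classroom data (invented values, for illustration only)
classroom <- data.frame(
  gender   = factor(c("f", "f", "f", "f", "m", "m", "m", "m")),
  pretest  = c(70, 65, 80, 75, 60, 72, 68, 77),
  posttest = c(78, 70, 85, 80, 66, 75, 70, 84)
)
classroom$score <- classroom$posttest

# Independent-samples test: formula interface with a data argument
independent <- t.test(score ~ gender, data = classroom)

# Paired test: two vectors, evaluated inside the data frame via with()
paired <- with(classroom, t.test(posttest, pretest, paired = TRUE))

print(independent)
print(paired)
```

Note that the logical constant must be written TRUE (or T) in R; lower-case true is not defined.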

Base R plus Recommended Packages like:
  Base SAS, SAS/STAT, SAS/GRAPH, SAS/IML Studio
  SPSS Stat. Base, SPSS Stat. Advanced, Regression
Tested via extensive validation programs
But add-on packages written by...
  Professor who invented the method?
  A student interpreting the method?

Email support is free, quick, 24-hours:
  www.r-project.org/mail.html
  Stackoverflow.com
  Quora.com
  Crossvalidated: stats.stackexchange.com/questions/tagged/r
Phone support available commercially

1. Go to cran.r-project.org, the Comprehensive R Archive Network
2. Download binaries for Base & run
3. Add-ons: install.packages("mypackage")
4. To update: update.packages()

Comprehensive R Archive Network
Crantastic.com
Inside-R.org
R4Stats.com
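
Steps 3 and 4 of the installation list can be combined into a small check that installs only what is missing. This is a sketch under stated assumptions: the helper name missing_packages is hypothetical, and the package names passed to it are placeholders. The install step is guarded by interactive() so the sketch does not try to reach CRAN when run as a script.

```r
# Hypothetical helper: which of the wanted packages are not yet installed?
missing_packages <- function(wanted) {
  setdiff(wanted, rownames(installed.packages()))
}

# "stats" and "utils" ship with R, so nothing should be missing here
to_install <- missing_packages(c("stats", "utils"))
print(to_install)

# Install only what is missing, and only in an interactive session
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Replace the package names with the add-ons you need; update.packages() then keeps everything current.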

Run code interactively
Submit code from Excel, SAS, SPSS, ...
Point-n-click using Graphical User Interfaces (GUIs)
Batch mode

    run ExportDataSetToR("mydata");
    submit/r;
      mydata$workshop <- factor(mydata$workshop)
      summary(mydata)
    endsubmit;

    GET FILE='mydata.sav'.
    BEGIN PROGRAM R.
    mydata <- spssdata.GetDataFromSPSS(
      variables = c("workshop gender q1 to q4"),
      missingValueToNA = TRUE,
      row.label = "id" )
    summary(mydata)
    END PROGRAM.

Revolution Analytics
A company focused on R development & support
Run by SPSS founder Norman Nie
Their enhanced distribution of R: Revolution R Enterprise
Free for colleges and universities, including for outside consulting

    mydata <- read.csv("mydata.csv")
    print(mydata)
    mydata$workshop <- factor(mydata$workshop)
    summary(mydata)
    plot( mydata$q1, mydata$q4 )
    mymodel <- lm( q4~q1+q2+q3, data=mydata )
    summary( mymodel )
    anova( mymodel )
    plot( mymodel )

    > mydata <- read.csv("mydata.csv")
    > print(mydata)
      workshop gender q1 q2 q3 q4
    1        1      f  1  1  5  1
    2        2      f  2  1  4  1
    3        1      f  2  2  4  3
    4        2   <NA>  3  1 NA  3
    5        1      m  4  5  2  4
    6        2      m  5  4  5  5
    7        1      m  5  3  4  4
    8        2      m  4  5  5  5

    > mydata$workshop <- factor(mydata$workshop)
    > summary(mydata)
     workshop  gender
     1:4       f   :3
     2:4       m   :4
               NA's:1

           q1             q2             q3              q4
     Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00
     1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50
     Median :3.50   Median :2.50   Median :4.000   Median :3.50
     Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25
     3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25
     Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00
                                   NA's   :1.000
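
The first steps of the example can be reproduced without the mydata.csv file. This sketch rebuilds the data from the values printed above and reads them with read.csv(text = ...). The exact layout of the original file is an assumption, including how its missing values are coded (written here as NA). In R 4.0 and later, read.csv() no longer converts strings to factors by default, so stringsAsFactors = TRUE is passed to get the gender counts shown in the summary.

```r
# CSV text reconstructed from the printed data frame
# (the layout of the real mydata.csv is an assumption)
csv_text <- "workshop,gender,q1,q2,q3,q4
1,f,1,1,5,1
2,f,2,1,4,1
1,f,2,2,4,3
2,NA,3,1,NA,3
1,m,4,5,2,4
2,m,5,4,5,5
1,m,5,3,4,4
2,m,4,5,5,5"

# Read from the string instead of a file; treat text columns as factors
mydata <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# workshop is read as a number, so convert it to a factor explicitly
mydata$workshop <- factor(mydata$workshop)

print(mydata)
summary(mydata)
```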

    > mymodel <- lm(q4 ~ q1+q2+q3, data=mydata)
    > summary(mymodel)

    Call:
    lm(formula = q4 ~ q1 + q2 + q3, data = mydata)

    Residuals:
          1       2       3       5       6       7       8
    -0.3113 -0.4261  0.9428 -0.1797  0.0765  0.0225 -0.1246

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -1.3243     1.2877  -1.028    0.379
    q1            0.4297     0.2623   1.638    0.200
    q2            0.6310     0.2503   2.521    0.086
    q3            0.3150     0.2557   1.232    0.306

    Multiple R-squared: 0.9299, Adjusted R-squared: 0.8598
    F-statistic: 13.27 on 3 and 3 DF, p-value: 0.03084
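
The residuals above are labeled 1, 2, 3, 5, 6, 7, 8 because lm() drops observation 4, whose q3 value is missing. The sketch below rebuilds the four numeric columns from the printed data (an inline data frame stands in for mydata.csv) and checks which rows the fit actually used.

```r
# Numeric columns copied from the printed data frame
mydata <- data.frame(
  q1 = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2 = c(1, 1, 2, 1, 5, 4, 3, 5),
  q3 = c(5, 4, 4, NA, 2, 5, 4, 5),
  q4 = c(1, 1, 3, 3, 4, 5, 4, 5)
)

# Same model as in the slides; rows with missing values are omitted by default
mymodel <- lm(q4 ~ q1 + q2 + q3, data = mydata)

print(nobs(mymodel))              # number of rows used in the fit
print(names(residuals(mymodel)))  # row labels of the rows that were kept
print(round(coef(mymodel), 4))    # compare with the Estimate column
```

With seven observations and four estimated coefficients, three residual degrees of freedom remain, which matches the "3 and 3 DF" in the F-statistic line.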

R for SAS and SPSS Users, Muenchen
R for Stata Users, Muenchen & Hilbe
R Through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics, Heiberger & Neuwirth
Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Williams

R is powerful, extensible, free
Download it from CRAN
Academics download Revolution R Enterprise for free at www.revolutionanalytics.com
You run it many ways & from many packages
Several graphical user interfaces are available
R's programming language is the way to access its full power

muenchen@utk.edu
Slides: r4stats.com/misc/webinar
Presentation: bit.ly/r-sas-spss