What is R?
R's Advantages
R's Disadvantages
Installing and Maintaining R
Ways of Running R
An Example Program
Where to Learn More




Bob Muenchen, author of R for SAS and SPSS Users and co-author of R for Stata Users
muenchen.bob@gmail.com, http://r4stats.com
Copyright 2010, 2011, Robert A Muenchen. All rights reserved.

"The most powerful statistical computing language on the planet." (Norman Nie, developer of SPSS)

What is R?
A language plus package plus environment for graphics and data analysis
Free and open source
Created by Ross Ihaka and Robert Gentleman in 1996 and extended by many others
An implementation of the S language, which John Chambers and others developed at Bell Labs
R has 4,950 add-ons, or nearly 100,000 procs

[Figures: popularity of R compared with other analytics software. Source: r4stats.com/popularity]

SAS and SPSS split data analysis into separate parts:
1. Data input & management (data step)
2. Analytics & graphics procedures (proc step)
3. Macro language
4. Matrix language
5. Output management systems (ODS/OMS)

* SAS Approach;
DATA A; SET A;
  logx = log(x);
PROC REG;
  MODEL Y = logx;
RUN;

R integrates all of these seamlessly.

# R Approach
lm(Y ~ log(x), data = A)
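As a minimal sketch of that integration, assume a small simulated data set named A standing in for the SAS data set (the values are invented for illustration). The transformation, the model fit and the extraction of results all happen in one language, with no separate data step or output system:

```r
# Simulated stand-in for SAS data set A (invented data, for illustration only)
set.seed(42)
A <- data.frame(x = runif(50, min = 1, max = 10))
A$Y <- 2 + 3 * log(A$x) + rnorm(50, sd = 0.1)

# The log transformation happens inside the model formula: no data step needed
fit <- lm(Y ~ log(x), data = A)

# Output management is ordinary object access: pull out just the pieces you need
coef(fit)                # intercept and slope for log(x)
summary(fit)$r.squared   # goodness of fit
```

Because lm() returns an object rather than printed output, the same results can be passed on to other functions, which is the role ODS or OMS plays in SAS and SPSS.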

R's Advantages
Vast selection of analytics & graphics
New methods are available sooner
Many packages can run R code (SAS, SPSS, Excel)
Its object orientation does the right thing: functions such as summary() and plot() adapt to the type of object they are given
Its language is powerful & fully integrated
Procedures you write are on an equal footing with the built-in ones
It is the universal language of data analysis
It runs on Windows, Macintosh and Linux/UNIX
Because it is open source, you can study and modify it
It is free

* Using SAS;
PROC TTEST DATA=classroom;
  CLASS gender;
  VAR score;
RUN;

# In R
t.test(score ~ gender, data = classroom)
with(classroom, t.test(posttest, pretest, paired = TRUE))

R's Disadvantages
The language is somewhat harder to learn
Help files are sparse & complex
You must find R and its add-ons yourself
Graphical user interfaces are not as polished
Most R functions hold data in main memory
  Rule of thumb: 10 million values per gigabyte
  SAS/SPSS: billions of records
  Several efforts are underway to break R's memory limit, including Revolution Analytics' distribution
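A minimal sketch of both R calls, assuming a hypothetical classroom data frame with the variable names used above (all values are invented). The formula form takes a data argument; the two-vector form used for the paired test does not, so with() supplies the data frame instead:

```r
# Hypothetical classroom data; variable names follow the slide, values are invented
classroom <- data.frame(
  gender   = factor(rep(c("f", "m"), each = 5)),
  score    = c(82, 91, 77, 85, 88, 74, 80, 79, 86, 70),
  pretest  = c(60, 65, 70, 58, 72, 55, 68, 62, 66, 64),
  posttest = c(78, 85, 90, 72, 88, 70, 82, 75, 80, 79)
)

# Two-sample (Welch) t-test of score by gender, like PROC TTEST with CLASS gender
by_gender <- t.test(score ~ gender, data = classroom)
by_gender

# Paired t-test of posttest against pretest; with() makes the columns visible
paired <- with(classroom, t.test(posttest, pretest, paired = TRUE))
paired$estimate   # mean of the posttest - pretest differences
```

Note that R's logical constant is TRUE in capitals; paired = true would fail because R looks for an object named true.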

Base R plus the Recommended Packages are comparable to Base SAS, SAS/STAT, SAS/GRAPH and SAS/IML Studio, or to SPSS Statistics Base, SPSS Statistics Advanced and Regression: they are tested via extensive validation programs.
Add-on packages, however, may be written by the professor who invented the method, or by a student interpreting the method.

Support
Email support is free, quick and available 24 hours a day: www.r-project.org/mail.html
Stackoverflow.com
Quora.com
Cross Validated: stats.stackexchange.com/questions/tagged/r
Phone support is available commercially

Installing and Maintaining R
1. Go to cran.r-project.org, the Comprehensive R Archive Network (CRAN)
2. Download the binaries for Base and run the installer
3. Add-ons: install.packages("mypackage")
4. To update: update.packages()

Finding add-ons: Comprehensive R Archive Network, Crantastic.com, Inside-R.org, R4Stats.com
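Steps 3 and 4 are often wrapped so that a script installs an add-on only when it is missing. The helper below, ensure_installed(), is hypothetical (it is not part of base R), and the mirror URL is just one CRAN mirror; any other would do:

```r
# Hypothetical helper: install a CRAN package only if it is not already
# present, then attach it with library()
ensure_installed <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Step 4: bring every installed add-on up to date without prompting for each one
# update.packages(ask = FALSE, repos = "https://cloud.r-project.org")
```

For example, ensure_installed("psych") would download the psych package on first use and simply attach it on later runs.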


Ways of Running R
Run code interactively
Submit code from Excel, SAS or SPSS
Point and click using graphical user interfaces (GUIs)
Batch mode
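For batch mode, a minimal sketch of a script might look like the following. The file name myprog.R and the fallback to R's built-in mtcars data are assumptions made so the script runs even when no CSV file is supplied:

```r
# myprog.R: hypothetical batch script. From a shell, run it with
#   Rscript myprog.R mydata.csv
# (this sketch assumes Rscript; R CMD BATCH passes arguments differently)

# Arguments typed after the script name on the command line
args <- commandArgs(trailingOnly = TRUE)

# Read the CSV named on the command line; fall back to the built-in
# mtcars data so the script still runs when no file is given
mydata <- if (length(args) >= 1) read.csv(args[1]) else mtcars

# Explicit print() gives the same output whether the file is run with
# Rscript or with source(), which does not auto-print by default
print(summary(mydata))
```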

From SAS/IML Studio:

run ExportDataSetToR("mydata");
submit / R;
  mydata$workshop <- factor(mydata$workshop)
  summary(mydata)
endsubmit;

From SPSS:

GET FILE='mydata.sav'.
BEGIN PROGRAM R.
mydata <- spssdata.GetDataFromSPSS(
  variables = c("workshop gender q1 to q4"),
  missingValueToNA = TRUE,
  row.label = "id" )
summary(mydata)
END PROGRAM.


Revolution Analytics: a company focused on R development and support, run by SPSS founder Norman Nie. Its enhanced distribution of R, Revolution R Enterprise, is free for colleges and universities, including for outside consulting.


An Example Program

# stringsAsFactors = TRUE reproduces the factor summary of gender below in R 4.0.0 and later
mydata <- read.csv("mydata.csv", stringsAsFactors = TRUE)
print(mydata)
mydata$workshop <- factor(mydata$workshop)
summary(mydata)
plot(mydata$q1, mydata$q4)
mymodel <- lm(q4 ~ q1 + q2 + q3, data = mydata)
summary(mymodel)
anova(mymodel)
plot(mymodel)

Output:

> mydata <- read.csv("mydata.csv")
> print(mydata)
  workshop gender q1 q2 q3 q4
1        1      f  1  1  5  1
2        2      f  2  1  4  1
3        1      f  2  2  4  3
4        2   <NA>  3  1 NA  3
5        1      m  4  5  2  4
6        2      m  5  4  5  5
7        1      m  5  3  4  4
8        2      m  4  5  5  5

> mydata$workshop <-factor(mydata$workshop)
> summary(mydata)
 workshop  gender
 1:4      f   :3
 2:4      m   :4
          NA's:1

       q1             q2             q3               q4
 Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00
 1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50
 Median :3.50   Median :2.50   Median :4.000   Median :3.50
 Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25
 3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25
 Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00
                               NA's   :1.000
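The factor() step is an instance of object orientation "doing the right thing": the same summary() function reports different statistics depending on the type of object it receives. A minimal sketch using the workshop codes from the example data:

```r
# Workshop codes from the example data (1 and 2 are labels, not quantities)
workshop <- c(1, 2, 1, 2, 1, 2, 1, 2)

# Treated as numbers, summary() reports quartiles and a meaningless mean
summary(workshop)

# Treated as a factor, the same function reports counts per category,
# matching the "1:4  2:4" workshop column in the output
summary(factor(workshop))
```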

> mymodel <- lm(q4 ~ q1+q2+q3, data=mydata)
> summary(mymodel)

Call:
lm(formula = q4 ~ q1 + q2 + q3, data = mydata)

Residuals:
      1       2       3       5       6       7       8
-0.3113 -0.4261  0.9428 -0.1797  0.0765  0.0225 -0.1246

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.3243     1.2877  -1.028    0.379
q1            0.4297     0.2623   1.638    0.200
q2            0.6310     0.2503   2.521    0.086
q3            0.3150     0.2557   1.232    0.306

Multiple R-squared: 0.9299,  Adjusted R-squared: 0.8598
F-statistic: 13.27 on 3 and 3 DF,  p-value: 0.03084
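To reproduce this model without the mydata.csv file, the eight rows printed earlier can be typed in directly (this inline data frame is a stand-in for the CSV, not part of the original program). The sketch also shows why observation 4 is missing from the residuals, and that the printed summary is itself an object whose parts can be extracted:

```r
# The example data typed in directly; observation 4 is missing gender and q3
mydata <- data.frame(
  workshop = factor(c(1, 2, 1, 2, 1, 2, 1, 2)),
  gender   = factor(c("f", "f", "f", NA, "m", "m", "m", "m")),
  q1 = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2 = c(1, 1, 2, 1, 5, 4, 3, 5),
  q3 = c(5, 4, 4, NA, 2, 5, 4, 5),
  q4 = c(1, 1, 3, 3, 4, 5, 4, 5)
)

mymodel <- lm(q4 ~ q1 + q2 + q3, data = mydata)

# lm() drops the incomplete row by default (na.action = na.omit),
# so the model is fitted to 7 observations, leaving 3 residual df
nobs(mymodel)
df.residual(mymodel)

# Extract pieces of the summary instead of reading them off the printout
model_summary <- summary(mymodel)
round(coef(model_summary), 4)
model_summary$r.squared
```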

Where to Learn More
R for SAS and SPSS Users, Muenchen
R for Stata Users, Muenchen & Hilbe
R Through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics, Heiberger & Neuwirth
Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Williams

Summary
R is powerful, extensible and free
Download it from CRAN
Academics can download Revolution R Enterprise for free at www.revolutionanalytics.com
You can run it many ways and from many packages
Several graphical user interfaces are available
R's programming language is the way to access its full power

muenchen@utk.edu
Slides: r4stats.com/misc/webinar
Presentation: bit.ly/r-sas-spss