2. Regression and Correlation. Simple Linear Regression Software: R




Create a text file from the SAS data set

  data _null_;
    file 'C:\Documents and Settings\sphlab\Desktop\slr1.txt';
    set temp;
    put input day:date7. calls fhigh flow high low rain snow weekday year sunday subzero;
  run;

##### You need to delete the dot signs at the beginning of each line ########

1. Read in data from text file

  data <- read.table("c:/documents and Settings/liyuan/Desktop/640TA/slr2.txt", header=TRUE)
  attach(data)

2. Partial listing of output

  list(data)

  [[1]]
       day calls fhigh flow high low rain snow weekday year sunday subzero
   1 12069  2298    38   31   39  31    0    0       0    0      0       0
   2 12070  1709    41   27   41  30    0    0       0    0      1       0
   3 12071  2395    33   26   38  24    0    0       0    0      0       0
   4 12072  2486    29   19   36  21    0    0       1    0      0       0
   5 12073  1849    40   19   43  27    0    0       1    0      0       0
   6 12074  1842    44   30   43  29    0    0       1    0      0       0
   7 12075  2100    46   40   53  41    1    0       1    0      0       0
   8 12076  1752    47   35   46  40    0    0       0    0      0       0
   9 12077  1776    53   34   55  38    1    0       0    0      1       0
  10 12078  1812    38   32   43  31    0    0       1    0      0       0
  11 12079  1842    35   21   35  25    0    0       1    0      0       0
  12 12080  1674    39   27   44  31    1    1       1    0      0       0
  13 12081  1692    34   28   40  27    0    0       1    0      0       0

3. Plot of calls over time

  par(mfrow=c(2,2))
  plot(day, calls, xlim=c(12000,12500), ylim=c(1000,9000),
       xlab="Day", ylab="Calls",
       main="Calls to NY Auto Club 1993-1994", col="black")

\R_howto\simple linear regression ny auto club.doc Page 1 of 8
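The `day` values in the listing (12069, 12070, ...) look like SAS date numbers, that is, days counted from 1 January 1960. This is an assumption, suggested by the `date7.` format in the SAS step and by the 1993-1994 plot title. A minimal sketch under that assumption, using only the 13 listed rows in a hypothetical data frame `autoclub`, converts them to R dates so the time axis is readable:

```r
# First 13 rows from the partial listing above (day and calls only).
# `autoclub` is a hypothetical name, not one used in the handout.
autoclub <- data.frame(
  day   = 12069:12081,
  calls = c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
            1752, 1776, 1812, 1842, 1674, 1692)
)

# ASSUMPTION: `day` is a SAS date number (days since 1960-01-01).
autoclub$date <- as.Date(autoclub$day, origin = "1960-01-01")

# Under that assumption the first listed day is 16 January 1993.
format(autoclub$date[1])   # "1993-01-16"

# Same idea as the step 3 plot, but with a date axis.
plot(autoclub$date, autoclub$calls, type = "b",
     xlab = "Date", ylab = "Calls",
     main = "Calls to NY Auto Club (first 13 listed days)")
```

Working from a data frame column (`autoclub$calls`) rather than `attach(data)` avoids name clashes with other objects in the workspace.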

4. Tests of Assumption of Normality on Y=calls

  > mean(calls)
  [1] 4318.75
  > length(calls)
  [1] 28
  > sum(calls)
  [1] 120925
  > var(calls)
  [1] 7249901
  > sum(calls^2)                      ## uncorrected ss ##
  [1] 717992159
  > sum((calls-mean(calls))^2)        ## corrected ss ##
  [1] 195747315
  > 100*sd(calls)/mean(calls)         ## coefficient of variation ##
  [1] 62.34591
  > sd(calls)/sqrt(length(calls))     ## standard error of the mean ##
  [1] 508.8468
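The descriptive statistics above are tied together by simple identities: the corrected sum of squares equals the uncorrected sum of squares minus n times the squared mean, and the variance is the corrected sum of squares divided by n - 1. The sketch below re-derives the printed values from just three numbers copied from the output (n, the sum, and the uncorrected sum of squares); the variable names are illustrative only.

```r
# Summary values copied from the output above.
n     <- 28
total <- 120925        # sum(calls)
uss   <- 717992159     # sum(calls^2), the uncorrected sum of squares

ybar <- total / n              # mean
css  <- uss - n * ybar^2       # corrected SS = uncorrected SS - n * mean^2
s2   <- css / (n - 1)          # sample variance
s    <- sqrt(s2)               # sample standard deviation
cv   <- 100 * s / ybar         # coefficient of variation (percent)
sem  <- s / sqrt(n)            # standard error of the mean

# Compare these with the values printed in the handout.
print(c(mean = ybar, css = css, var = s2, cv = cv, sem = sem), digits = 7)
```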

########## the package fBasics should be installed and loaded first for the following functions #######

  > library(fBasics)
  > skewness(calls)
  [1] 0.4307614
  attr(,"method")
  [1] "moment"
  > kurtosis(calls)
  [1] -1.497417
  attr(,"method")
  [1] "excess"

###### the package nortest should be installed and loaded first for cvm.test and ad.test; shapiro.test is in stats, which is loaded by default #######

  > library(nortest)
  > shapiro.test(calls)

          Shapiro-Wilk normality test

  data:  calls
  W = 0.829, p-value = 0.0003628

  > cvm.test(calls)

          Cramer-von Mises normality test

  data:  calls
  W = 0.3112, p-value = 0.0002141

  > ad.test(calls)

          Anderson-Darling normality test

  data:  calls
  A = 1.8673, p-value = 6.68e-05

5. Graphical Assessments of Normality of Y=calls

Histogram with overlay normal

  hist(calls, col='lightblue', main='histogram of calls', breaks=5,
       include.lowest=TRUE, right=TRUE, freq=FALSE)
  points(calls, dnorm(calls, mean=mean(calls), sd=sqrt(var(calls))), col='red')
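The skewness and excess kurtosis reported in step 4 are moment-based summaries. As a rough sketch of what such functions compute, the base R code below uses one common definition built from central moments with an n denominator. The function names `moment_skewness` and `moment_excess_kurtosis` are hypothetical, and packages differ in the variance denominator they use, so results from this sketch may differ slightly from package output.

```r
# Moment-based skewness: third central moment over (second central moment)^1.5.
# ASSUMPTION: central moments use an n denominator; packages may differ.
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment over (second central moment)^2, minus 3,
# so that a normal distribution scores 0.
moment_excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Small check on symmetric data.
x <- c(1, 2, 3, 4, 5)
moment_skewness(x)          # 0, because the data are symmetric
moment_excess_kurtosis(x)   # -1.3, flatter than a normal distribution

# The 13 calls values from the partial listing in step 2.
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)
moment_skewness(calls13)
moment_excess_kurtosis(calls13)
```

A negative excess kurtosis, like the -1.497 printed above for the full data, indicates lighter tails than a normal distribution.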

Quantile Quantile Plot

  ## datax=TRUE puts the data (calls) on the x-axis ##
  qqnorm(calls, datax=TRUE, main="Simple Normal QQplot for Y=calls",
         xlab="Calls", ylab="Normal quantiles")
  qqline(calls, datax=TRUE)
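To make explicit what the quantile-quantile plot displays, the sketch below builds the plotted coordinates by hand: each observation is paired with the standard normal quantile at its plotting position. It uses only the 13 `calls` values from the partial listing in step 2 (stored in a hypothetical vector `calls13`), so the picture will not match the full 28-day plot.

```r
# The 13 calls values from the partial listing in step 2.
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)
n <- length(calls13)

# Standard normal quantiles at the plotting positions ppoints(n),
# matched to each observation by its rank (ties broken by position).
theoretical <- qnorm(ppoints(n))[order(order(calls13))]

# These are the same theoretical quantiles that qqnorm() computes.
built_in <- qqnorm(calls13, plot.it = FALSE)
all.equal(theoretical, built_in$x)

# Data on the x-axis, as with datax=TRUE.
plot(calls13, theoretical,
     xlab = "Calls", ylab = "Normal quantiles",
     main = "Normal QQ plot built by hand (13 listed days)")
```

Points that fall close to a straight line suggest approximate normality; systematic curvature suggests skewness or heavy or light tails.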

  ## same plot with default axis labels ##
  qqnorm(calls, datax=TRUE, main="Simple Normal QQplot for Y=calls")
  qqline(calls, datax=TRUE)

[Figure: Simple Normal QQplot for Y=calls; x-axis: Sample Quantiles, y-axis: Theoretical Quantiles]

6. Scatterplot of Y=Calls vs X=low

  calls0 <- calls[year==0]
  calls1 <- calls[year==1]
  low0 <- low[year==0]
  low1 <- low[year==1]
  plot(low0, calls0, main="calls to NY Auto Club 1993-1994",
       xlim=c(-10,50), ylim=c(1000,9000),
       xlab="low", ylab="calls", col="green")
  points(low1, calls1, col="red")
  ## legend colours listed in the same order as the labels ##
  legend(35, 9000, c("1993:green","1994:red"), col=c("green","red"), pch=1)

7. Least Squares Estimation and Analysis of Variance Table

  lm1 <- lm(calls~low)
  summary(lm1)
  coef(lm1)
  anova(lm1)

  Call:
  lm(formula = calls ~ low)

  Residuals:
      Min      1Q  Median      3Q     Max
  -3112.1 -1467.6  -214.0  1143.9  3587.9

  Parameter Estimates

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  7475.85     704.63  10.610 6.10e-11 ***
  low          -145.15      27.79  -5.223 1.86e-05 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: 1917 on 26 degrees of freedom
  Multiple R-Squared: 0.5121, Adjusted R-squared: 0.4933
  F-statistic: 27.28 on 1 and 26 DF, p-value: 1.865e-05

  Analysis of Variance

  Response: calls
            Df    Sum Sq   Mean Sq F value    Pr(>F)
  low        1 100233719 100233719  27.285 1.865e-05 ***
  Residuals 26  95513596   3673600
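Two quick checks may help in reading this output. First, the summary statistics (F, R-squared, residual standard error) follow directly from the sums of squares in the analysis of variance table. Second, for a straight-line fit the least squares estimates have a closed form: slope = Sxy/Sxx and intercept = ybar - slope * xbar. The sketch below illustrates both; the second part uses only the 13 rows from the partial listing in step 2 (hypothetical vectors `calls13` and `low13`), so its estimates will not equal those printed above for all 28 days.

```r
# Part 1: re-derive summary statistics from the printed ANOVA table.
ss_reg <- 100233719    # Sum Sq for low
ss_res <- 95513596     # Sum Sq for Residuals
df_res <- 26

ms_res <- ss_res / df_res               # residual mean square
f_stat <- ss_reg / ms_res               # F statistic on 1 and 26 df
r2     <- ss_reg / (ss_reg + ss_res)    # R-squared
rse    <- sqrt(ms_res)                  # residual standard error
p_val  <- pf(f_stat, 1, df_res, lower.tail = FALSE)

# Part 2: closed-form least squares on the 13 listed rows.
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)
low13   <- c(31, 30, 24, 21, 27, 29, 41, 40, 38, 31, 25, 31, 27)

sxy <- sum((low13 - mean(low13)) * (calls13 - mean(calls13)))
sxx <- sum((low13 - mean(low13))^2)
b1  <- sxy / sxx                        # slope
b0  <- mean(calls13) - b1 * mean(low13) # intercept

# lm() should agree with the hand calculation.
fit13 <- lm(calls13 ~ low13)
all.equal(unname(coef(fit13)), c(b0, b1))
```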

8. Overlay of straight line fit onto scatterplot of Y=calls vs X=low

  abline(lm1)

9. Residuals Analysis: Assessment of Normality of Residuals

  qqnorm(lm1$residuals, main="Normality of Residuals Y=CALLS v X=LOW")

10. Residuals Analysis: Detection of Outliers Using Cook's Distance

  plot(lm1, which=4, main="Cook's Distance Values for Straight Line Y=Calls v X=Low")

11. Residuals Analysis: Jackknife (Studentized) Residuals versus Predicted Values

  Diag <- ls.diag(lm1)
  plot(lm1$fitted, Diag$stud.res, ylim=c(-2.0,2.5),
       xlab="Predicted Value", ylab="Studentized Residual",
       main="Jackknife Residuals versus Predicted")
  abline(h=0, lty=3)

[Figure: Jackknife Residuals versus Predicted; x-axis: Predicted Value, y-axis: Studentized Residual]
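As an alternative sketch for the outlier diagnostics above, the `stats` functions `cooks.distance()`, `rstandard()` and `rstudent()` return Cook's distances and standardized and studentized residuals directly from an `lm` fit. The example uses only the 13 rows from the partial listing in step 2 (hypothetical data frame `autoclub`), and the 4/n cutoff is a common rule of thumb assumed here for illustration, not a threshold taken from the handout.

```r
# The 13 listed rows (calls and low only); `autoclub` is a hypothetical name.
autoclub <- data.frame(
  calls = c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
            1752, 1776, 1812, 1842, 1674, 1692),
  low   = c(31, 30, 24, 21, 27, 29, 41, 40, 38, 31, 25, 31, 27)
)
fit <- lm(calls ~ low, data = autoclub)

cd    <- cooks.distance(fit)   # influence of each observation on the fit
r_int <- rstandard(fit)        # internally studentized (standardized) residuals
r_ext <- rstudent(fit)         # externally studentized (jackknife) residuals

# Jackknife residuals can be obtained from standardized ones without refitting.
n <- nrow(autoclub)
p <- length(coef(fit))
r_check <- r_int * sqrt((n - p - 1) / (n - p - r_int^2))
all.equal(r_ext, r_check)

# ASSUMPTION: flag observations with Cook's distance above 4/n (rule of thumb).
flagged <- which(cd > 4 / n)

plot(fitted(fit), r_ext,
     xlab = "Predicted Value", ylab = "Studentized Residual",
     main = "Jackknife Residuals versus Predicted (13 listed days)")
abline(h = 0, lty = 3)
```

Observations listed in `flagged` are candidates for a closer look rather than automatic deletions.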