2. Regression and Correlation
Simple Linear Regression
Software: R

Create txt file from SAS data set

data _null_;
  file 'C:\Documents and Settings\sphlab\Desktop\slr1.txt';
  set temp;
  put input day:date7. calls fhigh flow high low rain snow weekday year sunday subzero;
run;

##### You need to delete the dot signs at the beginning of each line #####

1. Read in data from text file

data<-read.table("c:/documents and Settings/liyuan/Desktop/640TA/slr2.txt",header=TRUE)
attach(data)

2. Partial listing of output

list(data)

[[1]]
     day calls fhigh flow high low rain snow weekday year sunday subzero
1  12069  2298    38   31   39  31    0    0       0    0      0       0
2  12070  1709    41   27   41  30    0    0       0    0      1       0
3  12071  2395    33   26   38  24    0    0       0    0      0       0
4  12072  2486    29   19   36  21    0    0       1    0      0       0
5  12073  1849    40   19   43  27    0    0       1    0      0       0
6  12074  1842    44   30   43  29    0    0       1    0      0       0
7  12075  2100    46   40   53  41    1    0       1    0      0       0
8  12076  1752    47   35   46  40    0    0       0    0      0       0
9  12077  1776    53   34   55  38    1    0       0    0      1       0
10 12078  1812    38   32   43  31    0    0       1    0      0       0
11 12079  1842    35   21   35  25    0    0       1    0      0       0
12 12080  1674    39   27   44  31    1    1       1    0      0       0
13 12081  1692    34   28   40  27    0    0       1    0      0       0

3. Plot of calls over time

par(mfrow=c(2,2))
plot(day,calls, xlim=c(12000,12500), ylim=c(1000,9000),
     xlab="Day", ylab="Calls", main="Calls to NY Auto Club 1993-1994", col="black")

\R_howto\simple linear regression ny auto club.doc Page 1 of 8
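The day column holds raw SAS date values. SAS counts dates in days since 1 January 1960, so the values can be converted to calendar dates in R before plotting. The following is a minimal sketch, assuming the first three rows of the listing typed in by hand; the names nyac and date are illustrative and do not appear in the handout.

```r
# First three rows of the partial listing, typed in by hand for illustration.
nyac <- data.frame(day   = c(12069, 12070, 12071),
                   calls = c(2298, 1709, 2395))

# SAS stores dates as the number of days since 1960-01-01.
nyac$date <- as.Date(nyac$day, origin = "1960-01-01")
print(nyac)

# Same idea as the plot in step 3, but with calendar dates on the x-axis.
plot(nyac$date, nyac$calls, xlab = "Date", ylab = "Calls",
     main = "Calls to NY Auto Club 1993-1994")
```

With this origin, day 12069 maps to 16 January 1993, which agrees with the 1993-1994 range in the plot title.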
4. Tests of Assumption of Normality on Y=calls

> mean(calls)
[1] 4318.75
> length(calls)
[1] 28
> sum(calls)
[1] 120925
> var(calls)
[1] 7249901
> sum(calls^2)                     ## uncorrected SS ##
[1] 717992159
> sum((calls-mean(calls))^2)       ## corrected SS ##
[1] 195747315
> 100*sd(calls)/mean(calls)        ## coefficient of variation ##
[1] 62.34591
> sd(calls)/sqrt(length(calls))    ## standard error of the mean ##
[1] 508.8468
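The summary statistics above are linked by simple identities, which gives a quick consistency check on the output. The sketch below uses only the printed values of n, the mean, and the uncorrected sum of squares; the variable names are illustrative.

```r
# Values reported in the output above.
n        <- 28
ybar     <- 4318.75
uncorrSS <- 717992159

# Corrected SS = uncorrected SS - n * mean^2
corrSS <- uncorrSS - n * ybar^2
corrSS

# Sample variance = corrected SS / (n - 1); compare with var(calls).
s2 <- corrSS / (n - 1)
s2

# Standard error of the mean and coefficient of variation.
se_mean <- sqrt(s2) / sqrt(n)
cv      <- 100 * sqrt(s2) / ybar
c(se_mean = se_mean, cv = cv)
```

The recomputed values should agree with the printed corrected SS, variance, standard error, and coefficient of variation up to rounding.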
########## The package fBasics should be installed and loaded first for the following functions ##########

> skewness(calls)
[1] 0.4307614
attr(,"method")
[1] "moment"
> kurtosis(calls)
[1] -1.497417
attr(,"method")
[1] "excess"

###### The package nortest should be installed and loaded first for cvm.test and ad.test; shapiro.test is in stats, which ships with R ######

> shapiro.test(calls)

        Shapiro-Wilk normality test

data:  calls
W = 0.829, p-value = 0.0003628

> cvm.test(calls)

        Cramer-von Mises normality test

data:  calls
W = 0.3112, p-value = 0.0002141

> ad.test(calls)

        Anderson-Darling normality test

data:  calls
A = 1.8673, p-value = 6.68e-05

5. Graphical Assessments of Normality of Y=calls

Histogram with overlay normal

hist(calls,col='lightblue', main='histogram of calls', breaks=5, include.lowest = TRUE, right = TRUE,freq=FALSE)
points(calls,dnorm(calls,mean=mean(calls),sd=sqrt(var(calls))),col='red',lty=6)
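The points() call above marks the normal density only at the observed values. As an alternative, the fitted density can be drawn as a continuous line with curve(). This is a sketch under stated assumptions: it uses only the 13 values of calls shown in the partial listing (the full data set has 28), and the name calls13 is illustrative.

```r
# First 13 values of calls from the partial listing (not the full data set).
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)

# Density-scaled histogram, so the bar areas sum to 1.
h <- hist(calls13, col = "lightblue", freq = FALSE,
          main = "Histogram of calls (first 13 days)")

# Overlay the normal density with the sample mean and sd as a smooth curve.
curve(dnorm(x, mean = mean(calls13), sd = sd(calls13)),
      add = TRUE, col = "red", lty = 6)
```

Because freq = FALSE puts the histogram on the density scale, the curve and the bars are directly comparable.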
Quantile Quantile Plot

qqnorm(calls,datax=TRUE, main="Simple Normal QQplot for Y=calls", ylab="Calls", xlab="Normal quantiles")
qqline(calls,datax=TRUE)
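To see what qqnorm() plots, the normal quantiles can be computed directly: the i-th smallest observation is paired with the normal quantile at the i-th plotting position from ppoints(). The sketch below again assumes only the 13 listed values of calls; calls13 and theo are illustrative names.

```r
# First 13 values of calls from the partial listing (not the full data set).
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)
n <- length(calls13)

# Theoretical normal quantiles at the plotting positions ppoints(n).
theo <- qnorm(ppoints(n))

# qqnorm() returns the same quantiles (in data order) without plotting.
qq <- qqnorm(calls13, plot.it = FALSE)
all.equal(sort(qq$x), theo)

# The same plot built by hand, with the data on the horizontal axis,
# which is the orientation datax=TRUE produces.
plot(sort(calls13), theo, xlab = "Calls", ylab = "Normal quantiles",
     main = "Normal QQ plot by hand")
```

Note that with datax=TRUE, qqnorm() applies ylab to the data axis and xlab to the theoretical axis, even though the data axis is drawn horizontally.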
qqnorm(calls,datax=TRUE, main="Simple Normal QQplot for Y=calls")
qqline(calls,datax=TRUE)

[Figure: "Simple Normal QQplot for Y=calls"; x-axis: Sample Quantiles (2000 to 8000), y-axis: Theoretical Quantiles (-2 to 2)]

6. Scatterplot of Y=Calls vs X=low

calls0<-calls[year==0]
calls1<-calls[year==1]
low0<-low[year==0]
low1<-low[year==1]
plot(low0,calls0, main="calls to NY Auto Club 1993-1994",xlim=c(-10,50),ylim=c(1000,9000), xlab="low", ylab="calls", col="green")
points(low1,calls1, col="red")
legend(35,9000, c("1993:green","1994:red"), col=c("green","red"), pch=1)
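Splitting the data into calls0/calls1 and low0/low1 can be avoided by indexing a colour vector with the 0/1 group indicator. The sketch below is an illustration only: every row in the partial listing has year equal to 0, so the weekday indicator stands in as the grouping variable, and all names ending in 13 are hypothetical.

```r
# Rows from the partial listing. All listed rows have year == 0, so weekday
# is used as a stand-in grouping variable (an assumption for this example).
low13     <- c(31, 30, 24, 21, 27, 29, 41, 40, 38, 31, 25, 31, 27)
calls13   <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
               1752, 1776, 1812, 1842, 1674, 1692)
weekday13 <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1)

grp_cols <- c("green", "red")         # colour for group 0, then group 1
pt_cols  <- grp_cols[weekday13 + 1]   # one colour per observation

plot(low13, calls13, col = pt_cols, xlab = "low", ylab = "calls")

# Reusing grp_cols keeps legend labels and colours in the same order.
legend("topright", legend = c("weekday = 0", "weekday = 1"),
       col = grp_cols, pch = 1)
```

Passing the same colour vector to both plot() and legend() avoids a mismatch between the legend labels and the plotted colours.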
7. Least Squares Estimation and Analysis of Variance Table

lm1<-lm(calls~low)
summary(lm1)
coef(lm1)
anova(lm1)

Call:
lm(formula = calls ~ low)

Residuals:
    Min      1Q  Median      3Q     Max
-3112.1 -1467.6  -214.0  1143.9  3587.9

Parameter Estimates

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7475.85     704.63  10.610 6.10e-11 ***
low          -145.15      27.79  -5.223 1.86e-05 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1917 on 26 degrees of freedom
Multiple R-Squared: 0.5121,     Adjusted R-squared: 0.4933
F-statistic: 27.28 on 1 and 26 DF,  p-value: 1.865e-05

Analysis of Variance

Response: calls
          Df    Sum Sq   Mean Sq F value    Pr(>F)
low        1 100233719 100233719  27.285 1.865e-05 ***
Residuals 26  95513596   3673600
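The summary statistics in the regression output can be recovered from the sums of squares in the analysis of variance table. The following sketch works only from the printed table; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom from the ANOVA table above.
ss_model <- 100233719; df_model <- 1
ss_resid <- 95513596;  df_resid <- 26
ss_total <- ss_model + ss_resid     # equals the corrected SS of calls

# R-squared: share of the total variation explained by low.
r2 <- ss_model / ss_total

# Adjusted R-squared compares mean squares instead of sums of squares.
adj_r2 <- 1 - (ss_resid / df_resid) / (ss_total / (df_model + df_resid))

# F statistic, its p-value, and the residual standard error.
f_stat <- (ss_model / df_model) / (ss_resid / df_resid)
p_val  <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)
rse    <- sqrt(ss_resid / df_resid)

round(c(r2 = r2, adj_r2 = adj_r2, f_stat = f_stat, rse = rse), 4)
p_val
```

The total sum of squares here matches the corrected SS reported in step 4, and with a single predictor the F statistic is the square of the t value for low.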
8. Overlay of straight line fit onto scatterplot of Y=calls vs X=low

abline(lm1)

9. Residuals Analysis: Assessment of Normality of Residuals

qqnorm(lm1$residuals, main="normality of Residuals Y=CALLS v X=LOW")

10. Residuals Analysis: Detection of Outliers Using Cook's Distance
plot(lm1,which=4, main="Cook's Distance Values for Straight Line Y=Calls v X=Low")

11. Residuals Analysis: Jackknife (Studentized) Residuals versus Predicted Values

Diag<-ls.diag(lm1)
plot(lm1$fitted.values,Diag$stud.res,ylim=c(-2.0,2.5),xlab="predicted Value",ylab="Studentized Residual",main="Jackknife Residuals versus Predicted")
abline(h=0,lty=3)

[Figure: "Jackknife Residuals versus Predicted"; x-axis: Predicted Value (2000 to 8000), y-axis: Studentized Residual (-2 to 2)]
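The same diagnostics are also available from the extractor functions cooks.distance(), rstudent(), and hatvalues() in the stats package, which take the lm object directly. The sketch below is an illustration under stated assumptions: it fits the line to the 13 listed rows only, so its numbers will differ from the full 28-day fit, and the names ending in 13 are hypothetical.

```r
# Rows from the partial listing (13 of the 28 days).
low13   <- c(31, 30, 24, 21, 27, 29, 41, 40, 38, 31, 25, 31, 27)
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)
fit13 <- lm(calls13 ~ low13)

h  <- hatvalues(fit13)        # leverage of each observation
cd <- cooks.distance(fit13)   # Cook's distance
rs <- rstudent(fit13)         # externally studentized (jackknife) residuals

# Cook's distance rebuilt from the internally standardized residuals and
# the leverages, with p = number of estimated coefficients.
p <- length(coef(fit13))
cd_by_hand <- rstandard(fit13)^2 * h / (1 - h) / p
all.equal(cd, cd_by_hand)

# Jackknife residuals against fitted values, as in the plot above.
plot(fitted(fit13), rs, xlab = "Predicted Value",
     ylab = "Studentized Residual")
abline(h = 0, lty = 3)
```

Observations with a large Cook's distance or a studentized residual far from zero are the ones worth inspecting in the full data set.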