7. Analysis of Variance (ANOVA)

7.1 An overview of ANOVA

What is ANOVA?
ANOVA refers to statistical models and associated procedures in which the observed variance is partitioned into components due to different explanatory variables. ANOVA was first developed by R. A. Fisher in the 1920s and 1930s; it is therefore also known as Fisher's analysis of variance, or Fisher's ANOVA.

What does ANOVA do?
It provides a statistical test of whether the means of several groups are all equal. In its simplest form, when only two groups are involved, ANOVA is equivalent to Student's t-test.
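The two-group equivalence can be checked numerically: the one-way ANOVA F statistic equals the square of Student's pooled-variance t statistic. The sketch below is a minimal Python illustration on hypothetical data. The function names `one_way_f` and `pooled_t` are illustrative and do not come from the slides.

```python
from statistics import fmean

def one_way_f(groups):
    """One-way ANOVA F statistic, MSA / MSE, for a list of groups."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand = fmean(all_obs)
    means = [fmean(g) for g in groups]
    ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssa / (k - 1)) / (sse / (n - k))

def pooled_t(a, b):
    """Student's two-sample t statistic with a pooled variance estimate."""
    ma, mb = fmean(a), fmean(b)
    ss_a = sum((y - ma) ** 2 for y in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    sp2 = (ss_a + ss_b) / (len(a) + len(b) - 2)   # pooled variance
    return (ma - mb) / (sp2 * (1 / len(a) + 1 / len(b))) ** 0.5

# Hypothetical measurements for two groups
a = [5.1, 4.8, 6.0, 5.5, 5.9]
b = [6.2, 6.8, 5.9, 7.1]

F = one_way_f([a, b])
t = pooled_t(a, b)
print(F, t ** 2)   # equal up to floating-point rounding
```

The sign of t shows which group has the larger mean. Squaring discards that direction, which is why F only tests for some difference.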

Types of ANOVA
- One-way ANOVA: involves only a single factor in the experiment.
- Two-way/multi-way ANOVA: two or more factors are relevant.
- Factorial ANOVA: there is replication at each combination of levels in a two-way/multi-way ANOVA.
- Mixed-design ANOVA: a factorial mixed design, in which one factor is a between-subjects variable and the other is a within-subjects variable.
- Multivariate analysis of variance (MANOVA): more than one dependent variable is involved in the analysis.

Basic assumptions
- Independence: cases are independent.
- Normality: data are normally distributed in each of the groups.
- Homogeneity of variances: the variance of the data is the same in all the groups (homoscedasticity).

Together, these form the common assumption that, for fixed-effects models, the errors are independently, identically, and normally distributed.

LOGIC OF ANOVA (1)
The fundamental technique of ANOVA is to partition the total sum of squares into components related to the effects involved in the model:

$$SSY = SSA + SSE, \qquad df_Y = df_A + df_E$$
$$MSA = SSA/df_A, \qquad MSE = SSE/df_E$$

LOGIC OF ANOVA (2)
MSE is the pooled variance obtained by combining the individual group variances, and thus it provides an estimate of the population variance $\sigma^2$. MSA is also an estimate of $\sigma^2$ in the absence of true group effects, but it includes a term related to differences between group means when there are group effects. Thus, a test for a significant difference between the group means can be performed by comparing the two variance estimates, that is,

$$F = MSA/MSE$$

LOGIC OF ANOVA (3)
Under the null hypothesis of identical means, the value of the F statistic is ideally close to 1, but it is expected to vary around that value. Statistically, assuming that all group means are equal, F follows an F distribution with (k-1, n-k) degrees of freedom, where k is the number of groups and n is the total number of observations.

FOLLOW-UP TESTS
If ANOVA finds a statistically significant effect, one or more appropriate follow-up tests are carried out. These assess which groups differ from which other groups, or test various other focused hypotheses. For example, Tukey's test, the most commonly used, compares every group mean with every other group mean. It also controls the Type I error rate across all of these comparisons.
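The claim that F varies around 1 under the null hypothesis can be checked by simulation. The sketch below draws every group from the same normal distribution, so the null hypothesis holds by construction. The group sizes (8, 9, 5) are hypothetical and were chosen only to give k = 3 and n = 22. The simulated mean of F is then compared with the theoretical mean of an F(k-1, n-k) distribution, df2/(df2 - 2). That value is slightly above 1, which is why "ideally 1" should be read as "approximately 1".

```python
import random
from statistics import fmean

def one_way_f(groups):
    """One-way ANOVA F statistic, MSA / MSE."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand = fmean(all_obs)
    means = [fmean(g) for g in groups]
    ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssa / (k - 1)) / (sse / (n - k))

random.seed(1)
sizes = [8, 9, 5]                  # hypothetical group sizes: k = 3, n = 22
k, n = len(sizes), sum(sizes)
df1, df2 = k - 1, n - k            # (2, 19)

# All groups come from N(0, 1), so H0 (equal means) is true
f_values = [
    one_way_f([[random.gauss(0.0, 1.0) for _ in range(size)] for size in sizes])
    for _ in range(4000)
]
sim_mean = fmean(f_values)
theory_mean = df2 / (df2 - 2)      # mean of F(df1, df2) when df2 > 2
print(round(sim_mean, 3), round(theory_mean, 3))
```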

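7.2 One-way ANOVA

The data model
$$y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$$
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)$$
Here $y_{ij}$ is observation j in group i, $\bar{y}_i$ is the mean of group i, and $\bar{y}$ is the grand mean.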

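Decomposition of the total sum of squares
$$\sum_i \sum_j (y_{ij} - \bar{y})^2 = \sum_i n_i (\bar{y}_i - \bar{y})^2 + \sum_i \sum_j (y_{ij} - \bar{y}_i)^2$$
$$SSY = SSA + SSE$$
where group i contains $n_i$ observations.

Degrees of freedom
With k groups and n observations in total,
$$n - 1 = (k - 1) + (n - k)$$
$$df_Y = df_A + df_E$$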
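The decomposition can be verified numerically. The sketch below uses hypothetical data with three groups of unequal size. It computes SSY, SSA and SSE directly from their definitions and checks that the sums of squares and the degrees of freedom both add up.

```python
from statistics import fmean

# Hypothetical data: k = 3 groups of unequal size (n = 12)
groups = [[12.0, 15.0, 11.0, 14.0],
          [18.0, 20.0, 17.0],
          [13.0, 16.0, 15.0, 12.0, 14.0]]

all_obs = [y for g in groups for y in g]
n, k = len(all_obs), len(groups)
grand = fmean(all_obs)
means = [fmean(g) for g in groups]

ssy = sum((y - grand) ** 2 for y in all_obs)                          # total
ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))   # between groups
sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)     # within groups

print(ssy, ssa + sse)              # equal up to floating-point rounding
print(n - 1, (k - 1) + (n - k))    # 11 11
```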

Mean squares and F statistic
$$MSA = \frac{SSA}{df_A} = \frac{\sum_i n_i (\bar{y}_i - \bar{y})^2}{k - 1}$$
$$MSE = \frac{SSE}{df_E} = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_i)^2}{n - k}$$
$$F = \frac{MSA}{MSE}$$

Example
The red cell folate data, described by Altman (1991, p. 208), are available as red.cell.folate in the ISwR package. They consist of 22 observations of a numeric variable, folate, and a factor, ventilation, with three levels: N2O+O2,24h; N2O+O2,op; and O2,24h.

    > attach(red.cell.folate)
    > str(red.cell.folate)
    'data.frame':   22 obs. of  2 variables:
     $ folate     : num  243 251 275 291 347 354 380 392 206 210 ...
     $ ventilation: Factor w/ 3 levels "N2O+O2,24h","N2O+O2,op",..: 1 1 1 1 1 1 1 1 2 2 ...

ANOVA using anova and lm

    > anova(lm(folate~ventilation))
    Analysis of Variance Table

    Response: folate
                Df Sum Sq Mean Sq F value  Pr(>F)  
    ventilation  2  15516    7758  3.7113 0.04359 *
    Residuals   19  39716    2090                  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation of regression coefficients
For a factor variable, the regression coefficients do not have their usual meaning as slopes, as they would in a regression with a numeric explanatory variable. Under R's default treatment contrasts, the intercept is the mean of the first (reference) level. Each remaining coefficient is the difference between that level's mean and the reference mean.

    > summary(lm(folate~ventilation))
    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
    (Intercept)            316.62      16.16  19.588 4.65e-14 ***
    ventilationN2O+O2,op   -60.18      22.22  -2.709   0.0139 *  
    ventilationO2,24h      -38.62      26.06  -1.482   0.1548    
    ---
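The ANOVA table and the coefficient table can both be checked with a little arithmetic. The sketch below takes numbers copied from the R output. Because R prints the sums of squares rounded, the recomputed F agrees with 3.7113 only to about three decimals. The group means are reconstructed under the assumption of R's default treatment contrasts, in which the intercept is the reference-level mean.

```python
# Values copied from the anova() output (sums of squares are rounded there)
ss_a, df_a = 15516, 2      # ventilation
ss_e, df_e = 39716, 19     # residuals

ms_a = ss_a / df_a         # 7758.0
ms_e = ss_e / df_e         # about 2090.3
F = ms_a / ms_e            # about 3.711
print(ms_a, round(ms_e, 1), round(F, 3))

# Treatment contrasts: intercept = reference-group mean,
# other coefficients = differences from the reference mean
coef = {"(Intercept)": 316.62, "N2O+O2,op": -60.18, "O2,24h": -38.62}
group_means = {
    "N2O+O2,24h": coef["(Intercept)"],
    "N2O+O2,op": coef["(Intercept)"] + coef["N2O+O2,op"],
    "O2,24h": coef["(Intercept)"] + coef["O2,24h"],
}
print(group_means)
```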

Multiple testing problem
Consider k independent tests, T1, T2, ..., Tk, each with significance probability Pr(Ti) = α. The probability that at least one of them comes out significant is

$$\Pr(T_1 \cup T_2 \cup \cdots \cup T_k) \le \Pr(T_1) + \Pr(T_2) + \cdots + \Pr(T_k) = k\alpha.$$

For example, with α = 0.05, the chance of at least one positive result in 10 tests can be as high as 50%. Thus, unadjusted p-values overstate the evidence when many tests are performed.

Bonferroni correction
The Bonferroni correction addresses the problem of multiple comparisons by dividing the significance level by the number of tests or, equivalently, by multiplying the p-values by the number of tests. Let α be the desired significance level for the entire series of tests, and let each individual test be carried out at level β, so that Pr(T1) = Pr(T2) = ... = Pr(Tk) = β. Then

$$\Pr(T_1 \cup \cdots \cup T_k) \le k\beta,$$

so choosing β = α/k keeps the overall significance level at most α.
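The arithmetic behind the 10-test example is short. The sketch below compares three quantities. The first is the union bound kα used in the text. The second is the exact family-wise error rate 1 - (1 - α)^k, which holds only if the tests are independent. The third is the family-wise rate obtained when each test is run at the Bonferroni level β = α/k.

```python
alpha, k = 0.05, 10

union_bound = k * alpha                    # upper bound from the text: 0.5
exact_independent = 1 - (1 - alpha) ** k  # exact value for independent tests
beta = alpha / k                           # Bonferroni per-test level: 0.005
familywise_bonf = 1 - (1 - beta) ** k      # family-wise rate at level beta

print(union_bound, round(exact_independent, 4), beta, round(familywise_bonf, 4))
```

The exact rate for independent tests (about 0.40) is below the 0.5 bound. After the correction, the family-wise rate (about 0.049) stays below α.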

Multiple comparisons
The function pairwise.t.test carries out all possible two-group comparisons while adjusting for multiple comparisons, e.g., via the Bonferroni correction:

    > pairwise.t.test(folate, ventilation, p.adj="bonferroni")

            Pairwise comparisons using t tests with pooled SD

    data:  folate and ventilation

              N2O+O2,24h N2O+O2,op
    N2O+O2,op 0.042      -        
    O2,24h    0.464      1.000    

    P value adjustment method: bonferroni

Interpretation of results by plots
[Plot: folate (axis range 200 to 350) by ventilation group: N2O+O2,24h, N2O+O2,op, O2,24h]
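The Bonferroni adjustment multiplies each p-value by the number of comparisons and caps the result at 1. With a pooled SD, pairwise.t.test uses the same residual variance and degrees of freedom as the lm fit. Its unadjusted p-values for comparisons against the reference group should therefore be the ones in the coefficient table (0.0139 and 0.1548). The sketch below assumes this and applies the adjustment for the 3 pairwise comparisons among three groups. It reproduces the adjusted values 0.042 and 0.464. The helper name `bonferroni` is illustrative.

```python
def bonferroni(p_values, m=None):
    """Bonferroni-adjusted p-values: min(1, m * p), m = number of comparisons."""
    m = len(p_values) if m is None else m
    return [min(1.0, m * p) for p in p_values]

# Unadjusted p-values for the two comparisons with the reference level,
# taken from the lm() coefficient table; 3 pairwise comparisons in total
raw = [0.0139, 0.1548]
adjusted = bonferroni(raw, m=3)
print([round(p, 3) for p in adjusted])   # [0.042, 0.464]
```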

Testing homogeneity of variance (1)

    > bartlett.test(folate~ventilation)

            Bartlett test of homogeneity of variances

    data:  folate by ventilation
    Bartlett's K-squared = 2.0951, df = 2, p-value = 0.3508

    > fligner.test(folate~ventilation)

            Fligner-Killeen test of homogeneity of variances

    data:  folate by ventilation
    Fligner-Killeen:med chi-squared = 5.5244, df = 2, p-value = 0.06315

Levene's test (1)
Levene's test is less sensitive to non-normality than Bartlett's test, which makes it more appropriate for testing homogeneity of variance. The procedure is:
- Compute the absolute values of the residuals from the original linear regression analysis.
- Fit a linear model by regressing these absolute residuals on the same set of explanatory variables.
- Significant group effects indicate a violation of the homoscedasticity assumption.
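For a one-way layout, regressing the absolute residuals on the group factor is the same as running a one-way ANOVA on |y - group mean|. The sketch below follows that mean-based procedure on hypothetical data in which the third group is much more spread out. The helper names are illustrative. Some implementations, such as leveneTest in the car package, centre on the group median by default instead of the mean. For reference, the 5% critical value of F(2, 12) is about 3.89.

```python
from statistics import fmean

def one_way_f(groups):
    """One-way ANOVA F statistic, MSA / MSE."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand = fmean(all_obs)
    means = [fmean(g) for g in groups]
    ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssa / (k - 1)) / (sse / (n - k))

def levene_f(groups):
    """Levene-type F: one-way ANOVA on absolute residuals |y - group mean|."""
    abs_res = []
    for g in groups:
        m = fmean(g)                       # fitted value of the one-way model
        abs_res.append([abs(y - m) for y in g])
    return one_way_f(abs_res)

# Hypothetical data: group 3 has a much larger spread than groups 1 and 2
groups = [[10.1, 9.8, 10.3, 9.9, 10.0],
          [12.0, 12.4, 11.8, 12.1, 11.9],
          [5.0, 15.0, 8.0, 18.0, 11.0]]
print(round(levene_f(groups), 2))   # well above 3.89, so reject equal variances
```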

Levene's test (2)

    > g<-lm(folate~ventilation)
    > summary(lm(abs(g$res)~ventilation))
    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
    (Intercept)            51.625      6.673   7.737 2.74e-07 ***
    ventilationN2O+O2,op  -21.353      9.171  -2.328   0.0311 *  
    ventilationO2,24h     -25.625     10.759  -2.382   0.0278 *  

Diagnostics of normality
[Figure: normal Q-Q plot of the residuals; x-axis: Theoretical Quantiles (-2 to 2), y-axis: Sample Quantiles (-50 to 50)]
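A normal Q-Q plot pairs the sorted sample with standard normal quantiles taken at evenly spaced probabilities. In R, the plot comes from qqnorm(). The sketch below reproduces the coordinates under the assumption that qqnorm uses the same plotting positions as R's ppoints(): (i - a)/(n + 1 - 2a), with a = 3/8 for n ≤ 10 and a = 1/2 otherwise. The residuals are hypothetical.

```python
from statistics import NormalDist

def qq_points(sample):
    """(theoretical, sample) quantile pairs for a normal Q-Q plot."""
    n = len(sample)
    a = 3 / 8 if n <= 10 else 0.5                 # ppoints()-style offset
    z = NormalDist()                              # standard normal
    probs = [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]
    theoretical = [z.inv_cdf(p) for p in probs]
    return list(zip(theoretical, sorted(sample)))

# Hypothetical residuals; points close to a straight line suggest normality
res = [-12.0, 3.5, 8.1, -4.2, 0.7, 15.3, -9.8, 2.2, -1.1, 6.0, -7.4, -1.3]
for t, s in qq_points(res):
    print(f"{t:6.2f} {s:6.1f}")
```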

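7.3 Two-way ANOVA

The data model
$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$
$$y_{ij} = \bar{y} + (\bar{y}_{i\cdot} - \bar{y}) + (\bar{y}_{\cdot j} - \bar{y}) + (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})$$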

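Decomposition of the total sum of squares
$$SSY = \sum_i \sum_j (y_{ij} - \bar{y})^2 = n \sum_i (\bar{y}_{i\cdot} - \bar{y})^2 + m \sum_j (\bar{y}_{\cdot j} - \bar{y})^2 + \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2 = SSA + SSB + SSE$$
Here factor A has m levels (index i), factor B has n levels (index j), and there is one observation per cell.

Mean squares & F statistics
$$MSA = \frac{SSA}{df_A} = \frac{n \sum_i (\bar{y}_{i\cdot} - \bar{y})^2}{m - 1}, \qquad F = MSA/MSE$$
$$MSB = \frac{SSB}{df_B} = \frac{m \sum_j (\bar{y}_{\cdot j} - \bar{y})^2}{n - 1}, \qquad F = MSB/MSE$$
$$MSE = \frac{SSE}{df_E} = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2}{(m - 1)(n - 1)}$$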

Example: data

    > heart.rate <- data.frame(
    + hr = c(96,110,89,95,128,100,72,79,100,
    +        92,106,86,78,124,98,68,75,106,
    +        86,108,85,78,118,100,67,74,104,
    +        92,114,83,83,118,94,71,74,102),
    + subj=gl(9,1,36),
    + time=gl(4,9,36,labels=c(0,30,60,120)))
    > str(heart.rate)
    'data.frame':   36 obs. of  3 variables:
     $ hr  : num  96 110 89 95 128 100 72 79 100 92 ...
     $ subj: Factor w/ 9 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 1 ...
     $ time: Factor w/ 4 levels "0","30","60",..: 1 1 1 1 1 1 1 1 1 2 ...

Two-way ANOVA

    > anova(lm(hr~subj + time, data=heart.rate))
    Analysis of Variance Table

    Response: hr
              Df Sum Sq Mean Sq F value    Pr(>F)    
    subj       8 8966.6  1120.8 90.6391 4.863e-16 ***
    time       3  151.0    50.3  4.0696   0.01802 *  
    Residuals 24  296.8    12.4                      
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
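The two-way table can be reproduced by hand from the sum-of-squares formulas, with subjects as factor A (m = 9) and time as factor B (n = 4). The sketch below rebuilds the layout from the heart.rate vector. Following gl(9, 1, 36) and gl(4, 9, 36), observation t belongs to subject t mod 9 and time t div 9. It recovers the sums of squares and F values shown in the R output.

```python
from statistics import fmean

# heart.rate data from the R example, in the same order as the hr vector
hr = [96, 110, 89, 95, 128, 100, 72, 79, 100,
      92, 106, 86, 78, 124, 98, 68, 75, 106,
      86, 108, 85, 78, 118, 100, 67, 74, 104,
      92, 114, 83, 83, 118, 94, 71, 74, 102]
m, n = 9, 4   # m subjects (factor A), n time points (factor B)

# y[i][j]: subject i at time j (observation index j * m + i)
y = [[hr[j * m + i] for j in range(n)] for i in range(m)]

grand = fmean(hr)
subj_means = [fmean(row) for row in y]
time_means = [fmean([y[i][j] for i in range(m)]) for j in range(n)]

ss_subj = n * sum((a - grand) ** 2 for a in subj_means)   # SSA
ss_time = m * sum((b - grand) ** 2 for b in time_means)   # SSB
ss_err = sum((y[i][j] - subj_means[i] - time_means[j] + grand) ** 2
             for i in range(m) for j in range(n))         # SSE

df_subj, df_time, df_err = m - 1, n - 1, (m - 1) * (n - 1)
ms_err = ss_err / df_err
f_subj = (ss_subj / df_subj) / ms_err
f_time = (ss_time / df_time) / ms_err
print(round(ss_subj, 1), round(ss_time, 1), round(ss_err, 1))  # 8966.6 151.0 296.8
print(round(f_subj, 4), round(f_time, 4))                        # 90.6391 4.0696
```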

7.4 ANOVA in regression analysis

Sum of squares
$$SSY = \sum_i (y_i - \bar{y})^2, \qquad SSM = \sum_i (\hat{y}_i - \bar{y})^2, \qquad SSR = \sum_i (y_i - \hat{y}_i)^2$$
$$SSY = SSM + SSR$$
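For a least-squares line with an intercept, the total sum of squares splits exactly into a model part and a residual part. The F statistic is SSM/1 divided by SSR/(n - 2). The sketch below fits a simple regression to hypothetical (x, y) data and checks the identity. It also recomputes the F value in the thuesen ANOVA table from that table's own sums of squares.

```python
from statistics import fmean

# Hypothetical (x, y) data
x = [4.2, 5.1, 6.0, 7.3, 8.8, 9.5, 10.4, 12.1]
y = [1.05, 1.12, 1.30, 1.21, 1.45, 1.38, 1.60, 1.66]

xbar, ybar = fmean(x), fmean(y)
slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
         / sum((a - xbar) ** 2 for a in x))
intercept = ybar - slope * xbar
fitted = [intercept + slope * a for a in x]

ssy = sum((b - ybar) ** 2 for b in y)                 # total
ssm = sum((f - ybar) ** 2 for f in fitted)            # explained by the model
ssr = sum((b - f) ** 2 for b, f in zip(y, fitted))    # residual

n = len(y)
F = (ssm / 1) / (ssr / (n - 2))   # 1 and n - 2 degrees of freedom
print(ssy, ssm + ssr, F)

# Same arithmetic on the thuesen table: SSM = 0.20727, SSR = 0.98610 on 21 df
f_thuesen = 0.20727 / (0.98610 / 21)
print(round(f_thuesen, 3))        # about 4.414
```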

Example

    > attach(thuesen)
    > lm.thuesen <- lm(short.velocity~blood.glucose)
    > anova(lm.thuesen)
    Analysis of Variance Table

    Response: short.velocity
                  Df  Sum Sq Mean Sq F value Pr(>F)  
    blood.glucose  1 0.20727 0.20727   4.414 0.0479 *
    Residuals     21 0.98610 0.04696                 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7.5 ANOVA for model selection

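Models & null hypothesis
Full model: $y = X\beta + \varepsilon$
Reduced model: $y = \mathbf{1}\mu + \varepsilon$
Null hypothesis: $H_0: \beta_1 = \cdots = \beta_{k-1} = 0$

Sum of squares
$$SSY = (y - \bar{y})'(y - \bar{y})$$
$$SSR = \hat{\varepsilon}'\hat{\varepsilon} = (y - X\hat{\beta})'(y - X\hat{\beta})$$
$$SSM = SSY - SSR$$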
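These sums of squares lead to the usual overall F test. The sketch below writes the standard result in this section's notation. It assumes that X has n rows and k columns, the first of which is the intercept column, so that SSR has n - k degrees of freedom:

$$F = \frac{SSM/(k-1)}{SSR/(n-k)} = \frac{(SSY - SSR)/(k-1)}{SSR/(n-k)} \sim F_{k-1,\,n-k} \quad \text{under } H_0$$

More generally, consider a reduced model nested in a full model. Let $SSR_r$ and $SSR_f$ be their residual sums of squares and $df_r$ and $df_f$ their residual degrees of freedom. Then

$$F = \frac{(SSR_r - SSR_f)/(df_r - df_f)}{SSR_f/df_f} \sim F_{df_r - df_f,\; df_f} \quad \text{under } H_0$$

For the gala model with four predictors, k = 5 and n = 30, giving (4, 25) degrees of freedom.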

ANOVA table
[Table not reproduced in this text]

Full model vs. reduced model

    > gfit4<-lm(Species~Elevation+Nearest+Scruz+Adjacent,data=gala)
    > y<-as.vector(gala$Species)
    > SYY<-sum((y-mean(y))^2)
    > SYY
    [1] 381081.4
    > RSS<-sum(gfit4$res^2)
    > RSS
    [1] 93469.08
    > F<-((SYY-RSS)/4)/(RSS/25)
    > F
    [1] 19.23178
    > 1-pf(F,4,25)
    [1] 2.44953e-07

Comparing two models

    > gfit2<-lm(Species~Elevation+Nearest,data=gala)
    > anova(gfit4,gfit2)
    Analysis of Variance Table

    Model 1: Species ~ Elevation + Nearest + Scruz + Adjacent
    Model 2: Species ~ Elevation + Nearest
      Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
    1     25  93469                                  
    2     27 173241 -2    -79771 10.668 0.0004469 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
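Both F statistics in the gala example are instances of one nested-model formula. The overall test compares the full model with the intercept-only model, whose residual sum of squares is SYY on n - 1 = 29 degrees of freedom. The second test compares gfit4 (25 residual df) with gfit2 (27 residual df). The sketch below plugs in the printed values. The small discrepancy in the last digit comes from R's rounding of the RSS values in the table. The helper name `nested_f` is illustrative.

```python
def nested_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic comparing a reduced model with a full model that contains it."""
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    denominator = rss_full / df_full
    return numerator / denominator

# Overall test: intercept-only model (RSS = SYY, 29 df) vs. gfit4 (25 df)
f_overall = nested_f(381081.4, 29, 93469.08, 25)   # about 19.23
# gfit2 (reduced, 27 df) vs. gfit4 (full, 25 df), values from anova(gfit4, gfit2)
f_nested = nested_f(173241, 27, 93469, 25)          # about 10.67
print(round(f_overall, 3), round(f_nested, 3))
```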