7. Analysis of Variance (ANOVA)

7.1 An overview of ANOVA

What is ANOVA?

ANOVA refers to statistical models and associated procedures in which the observed variance is partitioned into components due to different explanatory variables. ANOVA was first developed by R. A. Fisher in the 1920s and 1930s, so it is also known as Fisher's analysis of variance, or Fisher's ANOVA.

What does ANOVA do?

It provides a statistical test of whether the means of several groups are all equal. In its simplest form, when only two groups are involved, ANOVA is equivalent to Student's t-test.

Types of ANOVA

One-way ANOVA: involves only a single factor in the experiment.
Two-way/multi-way ANOVA: two or more factors are relevant.
Factorial ANOVA: there is replication at each combination of levels in a two-way/multi-way ANOVA.
Mixed-design ANOVA: a factorial mixed design, in which one factor is a between-subjects variable and the other is a within-subjects variable.
Multivariate analysis of variance (MANOVA): more than one dependent variable is involved in the analysis.

Basic assumptions

Independence: cases are independent.
Normality: data are normally distributed in each of the groups.
Homogeneity of variances: the variance of the data is the same in all the groups (homoscedasticity).

Together these form the common assumption that the errors are independently, identically, and normally distributed for fixed-effect models.

Logic of ANOVA (1)

The fundamental technique of ANOVA is to partition the total sum of squares into components related to the effects involved in the model:

$$SSY = SSA + SSE, \qquad df_Y = df_A + df_E$$

$$MSA = SSA / df_A, \qquad MSE = SSE / df_E$$

Logic of ANOVA (2)

MSE is the pooled variance obtained by combining the individual group variances, so it provides an estimate of the population variance $\sigma^2$. MSA is also an estimate of $\sigma^2$ in the absence of true group effects, but it includes a term related to differences between group means when there are group effects. Thus, a test for a significant difference between the group means can be performed by comparing the two variance estimates:

$$F = MSA / MSE$$

Logic of ANOVA (3)

Under the null hypothesis of identical means, the value of the F statistic is ideally 1, but some variation around that value is expected. Assuming that all group means are equal, the statistic follows an F distribution with (k-1, n-k) degrees of freedom, where k is the number of groups and n is the total number of observations.

Follow-up tests

If a statistically significant effect is found in ANOVA, one or more follow-up tests of an appropriate kind are carried out to assess which groups differ from which other groups, or to test various other focused hypotheses. For example, Tukey's test compares every group mean with every other group mean and incorporates a method to control Type I errors.
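As an illustration of a follow-up test, the sketch below runs Tukey's procedure with base R's `TukeyHSD()` after a one-way `aov()` fit. The data frame `toy` and its values are hypothetical and serve only to show the mechanics; they are not taken from the slides.

```r
# Hypothetical toy data: three groups (A, B, C) of five observations each
toy <- data.frame(
  y = c(10, 12, 11, 13, 12,
        14, 15, 13, 16, 15,
        20, 19, 21, 22, 20),
  group = gl(3, 5, labels = c("A", "B", "C"))
)

# Overall one-way ANOVA (F test for equality of the three means)
fit <- aov(y ~ group, data = toy)
print(summary(fit))

# Tukey's honest significant differences: every pairwise difference in
# group means, with confidence intervals and adjusted p-values
tukey <- TukeyHSD(fit, "group", conf.level = 0.95)
print(tukey)
```

Each row of `tukey$group` is one pairwise comparison (for example `B-A`), and the `p adj` column holds the p-value adjusted for the whole family of comparisons.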

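7.2 One-way ANOVA

The data model

$$y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$$

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad \text{where } \varepsilon_{ij} \sim N(0, \sigma^2)$$

Here i indexes the groups and j indexes the observations within a group.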

Decomposition of the total sum of squares

$$\sum_i \sum_j (y_{ij} - \bar{y})^2 = \sum_i n_i (\bar{y}_i - \bar{y})^2 + \sum_i \sum_j (y_{ij} - \bar{y}_i)^2$$

$$SSY = SSA + SSE$$

Degrees of freedom

$$n - 1 = (k - 1) + (n - k)$$

$$df_Y = df_A + df_E$$
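The decomposition can be checked numerically. The following is a minimal sketch on hypothetical, unbalanced toy data (the values and the names `toy`, `group_mean`, `F_stat` are assumptions for illustration). It computes SSY, SSA and SSE by hand, forms F = MSA/MSE with (k-1, n-k) degrees of freedom, and compares the result with `anova(lm(...))`.

```r
# Hypothetical toy data: three groups of unequal size (4, 5 and 3)
toy <- data.frame(
  y = c(10, 12, 11, 13,
        14, 15, 13, 16, 15,
        20, 19, 21),
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3)))
)

n <- nrow(toy)            # total number of observations
k <- nlevels(toy$group)   # number of groups

grand_mean <- mean(toy$y)
group_mean <- ave(toy$y, toy$group)   # group mean repeated for each observation

SSY <- sum((toy$y - grand_mean)^2)        # total sum of squares
SSA <- sum((group_mean - grand_mean)^2)   # equals sum_i n_i * (ybar_i - ybar)^2
SSE <- sum((toy$y - group_mean)^2)        # within-group sum of squares

MSA <- SSA / (k - 1)
MSE <- SSE / (n - k)
F_stat <- MSA / MSE
p_value <- pf(F_stat, k - 1, n - k, lower.tail = FALSE)

# Reference: the same quantities from R's ANOVA table
tab <- anova(lm(y ~ group, data = toy))
print(tab)
print(c(SSY = SSY, SSA = SSA, SSE = SSE, F = F_stat, p = p_value))
```

Summing the squared deviation of each observation's group mean from the grand mean weights every group by its size automatically, which is why no explicit n_i appears in the code.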

Mean squares and F statistic

$$MSA = \frac{SSA}{df_A} = \frac{\sum_i n_i (\bar{y}_i - \bar{y})^2}{k - 1}$$

$$MSE = \frac{SSE}{df_E} = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_i)^2}{n - k}$$

$$F = \frac{MSA}{MSE}$$

Example

The red cell folate data, described by Altman (1991, p208): 22 observations, a numeric variable folate and a factor ventilation. Three levels of ventilation: N2O+O2,24h, N2O+O2,op, and O2,24h.

    > library(ISwR)   # package that provides the red.cell.folate data
    > attach(red.cell.folate)
    > str(red.cell.folate)
    'data.frame':   22 obs. of  2 variables:
     $ folate     : num  243 251 275 291 347 354 380 392 206 210 ...
     $ ventilation: Factor w/ 3 levels "N2O+O2,24h","N2O+O2,op",..: 1 1 1 1 1 1 1 1 2 2 ...

ANOVA using anova and lm

    > anova(lm(folate~ventilation))
    Analysis of Variance Table

    Response: folate
                Df Sum Sq Mean Sq F value  Pr(>F)
    ventilation  2  15516    7758  3.7113 0.04359 *
    Residuals   19  39716    2090
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation of regression coefficients

The regression coefficients for a factor variable do not have the usual meaning of a slope, as they would in a regression analysis with a numeric explanatory variable.

    > summary(lm(folate~ventilation))
    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            316.62      16.16  19.588 4.65e-14 ***
    ventilationN2O+O2,op   -60.18      22.22  -2.709   0.0139 *
    ventilationO2,24h      -38.62      26.06  -1.482   0.1548
    ---
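To make the meaning of factor coefficients concrete, here is a small sketch on hypothetical toy data (not the folate data). It assumes R's default treatment contrasts, under which the intercept is the mean of the first (reference) level and each remaining coefficient is the difference between that level's mean and the reference mean.

```r
# Hypothetical toy data: three groups of four observations each
toy <- data.frame(
  y = c(30, 32, 31, 35,
        24, 26, 25, 29,
        27, 28, 30, 31),
  group = gl(3, 4, labels = c("A", "B", "C"))
)

fit <- lm(y ~ group, data = toy)      # default treatment contrasts assumed
group_means <- tapply(toy$y, toy$group, mean)

print(coef(fit))
print(group_means)

# Rebuild the coefficients from the group means alone
by_hand <- c(group_means[["A"]],
             group_means[["B"]] - group_means[["A"]],
             group_means[["C"]] - group_means[["A"]])
print(by_hand)
```

Read this way, the estimate of 316.62 in the folate output would be the mean of the reference group N2O+O2,24h, and -60.18 the amount by which the N2O+O2,op mean falls below it.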

Multiple test problem

Consider k independent tests, T1, T2, ..., Tk, each with a significance probability Pr(Ti) = α. The probability that at least one of them comes out significant satisfies

$$\Pr(T_1 \cup T_2 \cup \dots \cup T_k) \le \Pr(T_1) + \Pr(T_2) + \dots + \Pr(T_k) = k\alpha.$$

Suppose α = 0.05; then the chance of having at least one positive result in 10 tests can be as high as 50%. Thus, unadjusted p-values tend to overstate significance.

Bonferroni correction

The Bonferroni correction addresses the problem of multiple comparisons by dividing the significance level by the number of tests or, equivalently, by multiplying the p-values by the number of tests.

Let $\Pr(T_1 \cup T_2 \cup \dots \cup T_k) = \alpha$, where α is the significance level for the entire series of tests, and let $\Pr(T_1) = \Pr(T_2) = \dots = \Pr(T_k) = \beta$. Then $\alpha \le k\beta$, or $\beta \ge \alpha / k$. Testing each hypothesis at level α/k therefore keeps the significance level of the whole series at or below α.
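The arithmetic can be sketched in a few lines of R. The bound and the independent-tests probability use α = 0.05 and k = 10 from the text. The two raw p-values are the ones printed in the coefficient table of the folate example (0.0139 and 0.1548); treating them as two of three pairwise comparisons is an assumption made here for illustration.

```r
alpha <- 0.05
k <- 10

# Upper bound on the chance of at least one significant result
bound <- min(1, k * alpha)

# Exact value when the k tests are independent
exact_indep <- 1 - (1 - alpha)^k

print(c(bound = bound, exact_indep = exact_indep))

# Bonferroni adjustment: multiply each p-value by the number of tests
# (capped at 1). n = 3 states that three comparisons are made in total.
p_raw <- c(0.0139, 0.1548)
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = 3)
print(p_bonf)
```

For independent tests the probability of at least one false positive is somewhat below the bound, so the Bonferroni correction is conservative.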

Multiple comparison

The function pairwise.t.test carries out all possible two-group comparisons while adjusting for multiple comparisons, e.g., via the Bonferroni correction.

    > pairwise.t.test(folate, ventilation, p.adj="bonferroni")

        Pairwise comparisons using t tests with pooled SD

    data:  folate and ventilation

              N2O+O2,24h N2O+O2,op
    N2O+O2,op 0.042      -
    O2,24h    0.464      1.000

    P value adjustment method: bonferroni

Interpretation of results by plots

[Figure: folate values (axis from about 200 to 350) plotted by ventilation group: N2O+O2,24h, N2O+O2,op, O2,24h]

Testing of homogeneity of variance (1)

    > bartlett.test(folate~ventilation)

        Bartlett test of homogeneity of variances

    data:  folate by ventilation
    Bartlett's K-squared = 2.0951, df = 2, p-value = 0.3508

    > fligner.test(folate~ventilation)

        Fligner-Killeen test of homogeneity of variances

    data:  folate by ventilation
    Fligner-Killeen:med chi-squared = 5.5244, df = 2, p-value = 0.06315

Levene's test (1)

Levene's test is insensitive to non-normality and therefore more appropriate for testing homogeneity of variance.

Compute the absolute values of the residuals from the original linear regression analysis.
Fit a linear model by regressing these absolute residuals on the same set of explanatory variables.
Significant group effects are indicative of a violation of the homoscedasticity assumption.

Levene's test (2)

    > g <- lm(folate~ventilation)
    > summary(lm(abs(residuals(g))~ventilation))
    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            51.625      6.673   7.737 2.74e-07 ***
    ventilationN2O+O2,op  -21.353      9.171  -2.328   0.0311 *
    ventilationO2,24h     -25.625     10.759  -2.382   0.0278 *

Diagnostics of normality

[Figure: Normal Q-Q Plot; x-axis: Theoretical Quantiles (about -2 to 2), y-axis: Sample Quantiles (about -50 to 50)]

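7.3 Two-way ANOVA

The data model

$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$

$$y_{ij} = \bar{y} + (\bar{y}_{i\cdot} - \bar{y}) + (\bar{y}_{\cdot j} - \bar{y}) + (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})$$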

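Decomposition of total sum of squares

$$SSY = \sum_i \sum_j (y_{ij} - \bar{y})^2 = n \sum_i (\bar{y}_{i\cdot} - \bar{y})^2 + m \sum_j (\bar{y}_{\cdot j} - \bar{y})^2 + \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2 = SSA + SSB + SSE$$

Here factor A has m levels (i = 1, ..., m) and factor B has n levels (j = 1, ..., n), with one observation per combination.

Mean squares & F statistic

$$MSA = \frac{SSA}{df_A} = \frac{n \sum_i (\bar{y}_{i\cdot} - \bar{y})^2}{m - 1}, \qquad F = MSA / MSE$$

$$MSB = \frac{SSB}{df_B} = \frac{m \sum_j (\bar{y}_{\cdot j} - \bar{y})^2}{n - 1}, \qquad F = MSB / MSE$$

$$MSE = \frac{SSE}{df_E} = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2}{(m - 1)(n - 1)}$$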

Example: data

    > heart.rate <- data.frame(
    +   hr = c(96,110,89,95,128,100,72,79,100,
    +          92,106,86,78,124,98,68,75,106,
    +          86,108,85,78,118,100,67,74,104,
    +          92,114,83,83,118,94,71,74,102),
    +   subj = gl(9,1,36),
    +   time = gl(4,9,36,labels=c(0,30,60,120)))
    > str(heart.rate)
    'data.frame':   36 obs. of  3 variables:
     $ hr  : num  96 110 89 95 128 100 72 79 100 92 ...
     $ subj: Factor w/ 9 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 1 ...
     $ time: Factor w/ 4 levels "0","30","60",..: 1 1 1 1 1 1 1 1 1 2 ...

Two-way ANOVA

    > anova(lm(hr~subj + time, data=heart.rate))
    Analysis of Variance Table

    Response: hr
              Df Sum Sq Mean Sq F value    Pr(>F)
    subj       8 8966.6  1120.8 90.6391 4.863e-16 ***
    time       3  151.0    50.3  4.0696   0.01802 *
    Residuals 24  296.8    12.4
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
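The sums of squares in this table can be reproduced by hand from the two-way decomposition. The sketch below rebuilds the heart rate data exactly as given in the example and then computes the subject, time and residual sums of squares from row and column means. The helper names (`subj_mean`, `SS_subj`, and so on) are introduced here for illustration.

```r
# Heart rate data as given in the example: 9 subjects measured at 4 times
heart.rate <- data.frame(
  hr = c(96, 110, 89, 95, 128, 100, 72, 79, 100,
         92, 106, 86, 78, 124, 98, 68, 75, 106,
         86, 108, 85, 78, 118, 100, 67, 74, 104,
         92, 114, 83, 83, 118, 94, 71, 74, 102),
  subj = gl(9, 1, 36),
  time = gl(4, 9, 36, labels = c(0, 30, 60, 120))
)

hr <- heart.rate$hr
grand_mean <- mean(hr)
subj_mean <- ave(hr, heart.rate$subj)   # subject mean, repeated per observation
time_mean <- ave(hr, heart.rate$time)   # time-point mean, repeated per observation

SSY     <- sum((hr - grand_mean)^2)
SS_subj <- sum((subj_mean - grand_mean)^2)
SS_time <- sum((time_mean - grand_mean)^2)
SSE     <- sum((hr - subj_mean - time_mean + grand_mean)^2)

# F statistics with (8, 24) and (3, 24) degrees of freedom
MSE <- SSE / ((9 - 1) * (4 - 1))
F_subj <- (SS_subj / (9 - 1)) / MSE
F_time <- (SS_time / (4 - 1)) / MSE

tab <- anova(lm(hr ~ subj + time, data = heart.rate))
print(tab)
print(c(SS_subj = SS_subj, SS_time = SS_time, SSE = SSE,
        F_subj = F_subj, F_time = F_time))
```

Because the design is balanced (every subject is observed once at every time point), the hand-computed sums of squares agree with the sequential sums of squares that `anova()` reports.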

7.4 ANOVA in regression analysis

Sum of squares

$$SSY = \sum_i (y_i - \bar{y})^2$$

$$SSM = \sum_i (\hat{y}_i - \bar{y})^2$$

$$SSR = \sum_i (y_i - \hat{y}_i)^2$$
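These three quantities satisfy SSY = SSM + SSR for a least-squares fit with an intercept. The sketch below checks this on hypothetical toy data (the vectors `x` and `y` are made up for illustration) and compares the pieces with the table from `anova()`.

```r
# Hypothetical toy data for a simple linear regression
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9)

fit <- lm(y ~ x)

SSY <- sum((y - mean(y))^2)             # total sum of squares
SSM <- sum((fitted(fit) - mean(y))^2)   # explained by the model
SSR <- sum(residuals(fit)^2)            # residual sum of squares

# F statistic with 1 and n - 2 degrees of freedom
n <- length(y)
F_stat <- (SSM / 1) / (SSR / (n - 2))

tab <- anova(fit)
print(tab)
print(c(SSY = SSY, SSM = SSM, SSR = SSR, F = F_stat))
```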

Example

    > attach(thuesen)
    > lm.thuesen <- lm(short.velocity~blood.glucose)
    > anova(lm.thuesen)
    Analysis of Variance Table

    Response: short.velocity
                  Df  Sum Sq Mean Sq F value Pr(>F)
    blood.glucose  1 0.20727 0.20727   4.414 0.0479 *
    Residuals     21 0.98610 0.04696
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7.5 ANOVA for model selection

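Models & null hypothesis

Full model: $y = X\beta + \varepsilon$
Reduced model: $y = \mathbf{1}\mu + \varepsilon$
Null hypothesis: $H_0: \beta_1 = \dots = \beta_{k-1} = 0$

Sum of squares

$$SSY = (y - \bar{y})'(y - \bar{y})$$

$$SSR = \hat{\varepsilon}'\hat{\varepsilon} = (y - X\hat{\beta})'(y - X\hat{\beta})$$

$$SSM = SSY - SSR$$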

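ANOVA table

    Source     df      Sum of squares   Mean square    F
    Model      k - 1   SSM              SSM/(k - 1)    [SSM/(k - 1)] / [SSR/(n - k)]
    Residual   n - k   SSR              SSR/(n - k)
    Total      n - 1   SSY

Full model vs. reduced model

    > library(faraway)   # package that provides the gala data
    > gfit4 <- lm(Species~Elevation+Nearest+Scruz+Adjacent, data=gala)
    > y <- as.vector(gala$Species)
    > SYY <- sum((y-mean(y))^2)
    > SYY
    [1] 381081.4
    > RSS <- sum(residuals(gfit4)^2)
    > RSS
    [1] 93469.08
    > F <- ((SYY-RSS)/4)/(RSS/25)
    > F
    [1] 19.23178
    > 1-pf(F,4,25)
    [1] 2.44953e-07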

Comparing two models

    > gfit2 <- lm(Species~Elevation+Nearest, data=gala)
    > anova(gfit4, gfit2)
    Analysis of Variance Table

    Model 1: Species ~ Elevation + Nearest + Scruz + Adjacent
    Model 2: Species ~ Elevation + Nearest
      Res.Df    RSS Df Sum of Sq      F    Pr(>F)
    1     25  93469
    2     27 173241 -2    -79771 10.668 0.0004469 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
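The F value in this comparison can be recovered from the residual sums of squares alone. The sketch below uses the figures printed above (RSS 93469.08 on 25 degrees of freedom for the larger model, and the rounded RSS 173241 on 27 degrees of freedom for the smaller one), so it should agree with the table only up to rounding. The variable names are introduced here for illustration.

```r
# Residual sums of squares and degrees of freedom as printed above
rss_full <- 93469.08   # Species ~ Elevation + Nearest + Scruz + Adjacent
df_full  <- 25
rss_red  <- 173241     # Species ~ Elevation + Nearest (rounded in the output)
df_red   <- 27

# Extra sum of squares explained by the two additional terms,
# per degree of freedom, relative to the full model's residual mean square
f_stat <- ((rss_red - rss_full) / (df_red - df_full)) / (rss_full / df_full)

# Upper-tail probability of the F distribution with (2, 25) degrees of freedom
p_value <- pf(f_stat, df_red - df_full, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

A small p-value here indicates that dropping Scruz and Adjacent significantly worsens the fit, so the larger model is preferred.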