How to calculate an ANOVA table




Calculations by Hand

We look at the following example: let us say we measure the height of some plants under the effect of different fertilizers.

    Treatment | Measures | Mean | Â_i
    X         | 1 2 2    | ...  | ...
    Y         | 5 6 5    | ...  | ...
    Z         | 2 1      | ...  | ...
    Overall mean:          ...

STEP 0: The model:

    Y_ij = µ + A_i + ε_ij          (0.1)
    Σ_i n_i A_i = 0                (0.2)

An observation y_ij is given by: the average height of the plants (µ), plus the effect of the fertilizer (A_i), plus an error term (ε_ij), i.e. every seed is different and therefore every plant will be different. All these values (µ, A_i, ε_ij) are UNKNOWN! Our GOAL is to test whether the hypothesis A_1 = A_2 = A_3 = 0 is plausible.¹

Remark 1: If we have a control group (for example, treatment X is without any fertilizer), then we assume that the values of X are in some way the best approximation for µ, and therefore we can choose A_1 = 0 in spite of condition (0.2).

STEP 1: Complete the first table. For the treatment means it is enough to calculate the mean of the values:

    Mean_X = (1 + 2 + 2)/3 = 1.667
    Mean_Y = (5 + 6 + 5)/3 = 5.333
    Mean_Z = (2 + 1)/2 = 1.5

¹ We DO NOT find the correct values for the A_i, and we WILL NOT find which factor (treatment) has an effect; we just check whether, in general, the treatments have an effect on the results.
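The treatment means are quick to verify in R; a minimal sketch (the vectors v and TR anticipate the data set built in the R section below):

    v  <- c(1, 2, 2, 5, 6, 5, 2, 1)          # all measurements in one vector
    TR <- factor(c(1, 1, 1, 2, 2, 2, 3, 3))  # treatment label of each measurement

    tapply(v, TR, mean)   # group means: 1.667 (X), 5.333 (Y), 1.5 (Z)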

The (estimated) overall mean ˆµ, which is an estimate of the exact, unknown overall mean µ, is calculated as follows²:

    ˆµ = (1 + 2 + 2 + 5 + 6 + 5 + 2 + 1)/8 = 3

The estimated effects Â_i are the differences between the estimated treatment means and the estimated overall mean, i.e.

    Â_i = Mean_i − ˆµ

So

    Â_1 = 1.667 − 3 = −1.333
    Â_2 = 5.333 − 3 = 2.333
    Â_3 = 1.5 − 3 = −1.5

Then:

    Treatment | Measures | Mean  | Â_i
    X         | 1 2 2    | 1.667 | −1.333
    Y         | 5 6 5    | 5.333 |  2.333
    Z         | 2 1      | 1.5   | −1.5
    Overall mean:          3

STEP 2: The ANOVA table.

    Cause of the variation | df  | SS  | MS  | F   | F_krit
    Treatment              | ... | ... | ... | ... | ...
    Residuals              | ... | ... | ... |     |
    Total                  | ... | ... |     |     |

For the column df (degrees of freedom) just remember the rule "minus one":

    We have 3 different treatments:   df_treat = 3 − 1 = 2
    We have 8 different measurements: df_tot   = 8 − 1 = 7
    df_treat + df_res = df_tot, so    df_res   = 7 − 2 = 5

For the column SS (sum of squares) we can proceed as follows:

² Note that the overall mean does not necessarily coincide with the mean of the group means ȳ_i!

SS_treat, the sum of squares between treatment groups:

    SS_treat = Σ_i Â_i² · (#measures in group i)
             = (−1.333)² · 3 + (2.333)² · 3 + (−1.5)² · 2 = 26.17

SS_res, the sum of squares within treatment groups:

    SS_res = Σ_i Σ_j (y_ij − ȳ_i)² = Σ_i SS_row_i
           = [(1 − 1.667)² + (2 − 1.667)² + (2 − 1.667)²] + [0.667] + [0.5]
           = 0.667 + 0.667 + 0.5 = 1.83

SS_tot, the total sum of squares:

    SS_tot = Σ_{i,j} (y_ij − ˆµ)² = (1 − 3)² + (2 − 3)² + ... + (1 − 3)² = 28

Remark 2: The total SS is always equal to the sum of the other SS!

    SS_tot = SS_treat + SS_res:   28 = 26.17 + 1.83

For the column MS (mean square) just remember the rule MS = SS/df, then:

    MS_treat = SS_treat/df_treat = 26.17/2 = 13.08
    MS_res   = SS_res/df_res     = 1.83/5  = 0.367

The F-value is just given by:

    F = MS_treat/MS_res = 13.08/0.367 = 35.68

The F-value tells us how far away we are from the hypothesis "we cannot distinguish between error and treatment", i.e. "treatment is not relevant according to our data". A big F-value implies that the effect of the treatment is relevant!

Remark 3: A small F-value does NOT imply that the hypothesis A_i = 0 for all i is true. (We just cannot conclude that it is false!)
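The whole table can also be reproduced step by step in R, without aov(); a minimal sketch on the same data (the variable names are our own):

    v  <- c(1, 2, 2, 5, 6, 5, 2, 1)
    TR <- factor(c(1, 1, 1, 2, 2, 2, 3, 3))

    mu.hat     <- mean(v)              # overall mean: 3
    group.mean <- tapply(v, TR, mean)  # 1.667  5.333  1.5
    n.i        <- table(TR)            # group sizes: 3 3 2

    SS.treat <- sum(n.i * (group.mean - mu.hat)^2)  # 26.17
    SS.res   <- sum((v - ave(v, TR))^2)             # 1.83 (ave() repeats each group mean)
    SS.tot   <- sum((v - mu.hat)^2)                 # 28 = SS.treat + SS.res

    MS.treat <- SS.treat/2       # df_treat = 2
    MS.res   <- SS.res/5         # df_res   = 5
    F.value  <- MS.treat/MS.res  # 35.68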

STEP 3: The decision. Similarly to a t-test, we calculate the critical value for the level α = 5% with degrees of freedom 2 and 5 (just read off the value from the appropriate table)³:

    α = 5%  →  F_krit = F_{2,5}(5%) = 5.79

We have calculated F = 35.68 > F_{2,5}(5%) = 5.79. Consequently we REJECT THE HYPOTHESIS A_1 = A_2 = A_3 = 0!!!

Similarly, we could obtain the same result by calculating the p-value:

    p = 0.11%  ↔  F_{2,5}(p) = 35.68

0.11% is less than 5%. Consequently we reject the hypothesis A_1 = A_2 = A_3 = 0!!!

Calculations with R

STEP 0: Insert the data

    v <- c(1,2,2,5,6,5,2,1)
    TR <- c(1,1,1,2,2,2,3,3)
    d <- data.frame(v,TR)
    d$TR <- as.factor(d$TR)

All the measurements have to be in the same vector (v in this case). For every factor (in this case just TR) we construct a vector, which can be interpreted as follows: the first three values of the vector v belong to treatment 1 (X), the last two components to treatment 3 (Z), and the others to treatment 2 (Y).

WE know that v and TR belong to the same data set; we have to tell this to the PC as well! Therefore: d <- data.frame(v,TR)!

WE know that TR in the data set d is a factor; the PC doesn't! Therefore: d$TR <- as.factor(d$TR)!

Check with str(d) that d$v is a vector of numbers (num) and d$TR is a factor (Factor):

³ Because F is obtained from MS_treat (2 degrees of freedom) and MS_res (5 degrees of freedom), we calculate F_krit = F_{2,5}(5%).

> str(d)
'data.frame':   8 obs. of  2 variables:
 $ v : num  1 2 2 5 6 5 2 1
 $ TR: Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3

STEP 1: Do the ANOVA table

    d.fit <- aov(v~TR,data=d)
    summary(d.fit)

This makes an ANOVA table of the data set d, analysing whether the factor TR has a significant effect on v. The function summary shows the ANOVA table:

> summary(d.fit)
            Df  Sum Sq Mean Sq F value   Pr(>F)
TR           2 26.1667 13.0833  35.682 0.001097 **
Residuals    5  1.8333  0.3667
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

STEP 2: Decision: exactly the same as for the table calculated by hand. With R we do not get the critical value for a given level, but we do get the p-value (Pr(>F)). Pr(>F) = 0.1097%; this means: if we choose a level α of 0.1%, we cannot reject the null hypothesis, but by choosing a level α = 0.11% or bigger we have to reject H_0. (Usually we choose α = 5%, so H_0 will be rejected!)
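Instead of reading F_krit off a table, both the critical value and the p-value can also be computed directly with R's F-distribution functions; a short sketch:

    # critical value F_krit for alpha = 5% with 2 and 5 degrees of freedom
    qf(0.95, df1 = 2, df2 = 5)                        # 5.786

    # p-value for the observed F-value
    pf(35.682, df1 = 2, df2 = 5, lower.tail = FALSE)  # 0.001097

Both values match the hand calculation and the summary(d.fit) output above.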