12: Analysis of Variance. Introduction




Introduction

In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider means from k independent groups, where k is 2 or greater. The technique is called analysis of variance, or ANOVA for short.

Illustrative data (anova.sav). We consider AGE of study participants from three different clinical centers. Ages (years) are:

Center 1 (n_1 = 22): 60, 66, 65, 55, 62, 70, 51, 72, 58, 61, 71, 41, 70, 57, 55, 63, 64, 76, 74, 54, 58, 73
Center 2 (n_2 = 18): 56, 65, 65, 63, 57, 47, 72, 56, 52, 75, 66, 62, 68, 75, 60, 73, 63, 64
Center 3 (n_3 = 23): 67, 56, 65, 61, 63, 59, 42, 53, 63, 65, 60, 57, 62, 70, 73, 63, 55, 52, 58, 68, 70, 72, 45

Data are entered as two separate variables, one for the dependent variable and one for the group ("independent") variable. In anova.sav these are stored as AGE and CENTER. The first five and last five observations are:

OBS   AGE   CENTER
1     60    1
2     66    1
3     65    1
4     55    1
5     62    1
...   ...   ...
59    58    3
60    68    3
61    70    3
62    72    3
63    45    3
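The same two-column arrangement can be reproduced outside SPSS. The sketch below is a Python illustration (not part of the handout's SPSS workflow); it simply rebuilds the AGE and CENTER columns from the ages listed above.

```python
# Hypothetical re-creation of the anova.sav layout: one column for the
# outcome (AGE) and one column for the group ("independent") variable (CENTER).
center1 = [60, 66, 65, 55, 62, 70, 51, 72, 58, 61, 71, 41, 70, 57, 55, 63, 64, 76, 74, 54, 58, 73]
center2 = [56, 65, 65, 63, 57, 47, 72, 56, 52, 75, 66, 62, 68, 75, 60, 73, 63, 64]
center3 = [67, 56, 65, 61, 63, 59, 42, 53, 63, 65, 60, 57, 62, 70, 73, 63, 55, 52, 58, 68, 70, 72, 45]

age = center1 + center2 + center3                                      # dependent variable
center = [1] * len(center1) + [2] * len(center2) + [3] * len(center3)  # group variable

print(len(age), "observations")   # 63 observations in total
```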

EDA

Summary statistics. Let N denote the total sample size (here, N = 63). Let x̄ (no subscript) represent the mean of all N subjects combined. This is called the grand mean. For the illustrative AGE data, x̄ = 62.127. Group means, standard deviations, and sample sizes are denoted with subscripts. For the illustrative data:

x̄_1 = 62.546, s_1 = 8.673, n_1 = 22
x̄_2 = 63.278, s_2 = 7.789, n_2 = 18
x̄_3 = 60.826, s_3 = 8.004, n_3 = 23

We note that the group means and standard deviations are all within a couple of years of each other.

Graphical analysis. A side-by-side boxplot is one of the best ways to compare group locations, spreads, and shapes. (Detailed instruction on how to draw and interpret boxplots was presented in Chapter 4.) A side-by-side boxplot for the illustrative data (Figure 1) shows distributions with similar shapes, locations, and spreads. It may be worth noting that group 3 has a low outside value.

SPSS: Click Analyze > Descriptive Statistics > Explore and place the dependent variable in the Dependent list and the group variable in the Factor list.

Other graphical techniques, such as side-by-side dot plots, stem-and-leaf plots, mean ± SD, mean ± SE, and confidence interval plots, may also be employed to compare the groups.

Figure 1. Side-by-side boxplots of AGE by center.
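For readers who want to verify these summaries outside SPSS, the following Python sketch (an illustration, not part of the handout; it assumes NumPy is installed) recomputes the grand mean and the group means and standard deviations:

```python
import numpy as np

groups = {
    1: [60, 66, 65, 55, 62, 70, 51, 72, 58, 61, 71, 41, 70, 57, 55, 63, 64, 76, 74, 54, 58, 73],
    2: [56, 65, 65, 63, 57, 47, 72, 56, 52, 75, 66, 62, 68, 75, 60, 73, 63, 64],
    3: [67, 56, 65, 61, 63, 59, 42, 53, 63, 65, 60, 57, 62, 70, 73, 63, 55, 52, 58, 68, 70, 72, 45],
}

all_ages = np.concatenate([np.array(x, dtype=float) for x in groups.values()])
print(f"grand mean = {all_ages.mean():.3f}")            # about 62.127

for i, x in groups.items():
    x = np.array(x, dtype=float)
    # ddof=1 gives the sample standard deviation used in the text
    print(f"center {i}: mean = {x.mean():.3f}, sd = {x.std(ddof=1):.3f}, n = {len(x)}")
```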

Hypothesis Test (ANOVA)

Null and Alternative Hypotheses

The name analysis of variance may mislead some students into thinking the technique is used to compare group variances. In fact, analysis of variance uses variance to cast inference on group means. The null and alternative hypotheses are:

H_0: μ_1 = μ_2 = ... = μ_k
H_1: H_0 is false ("at least one population mean differs")

where μ_i represents the population mean of group i.

Whether an observed difference between group means is surprising depends on the spread (variance) of the observations within groups. Widely different averages are more likely to arise by chance when the individual observations within groups vary greatly. We must therefore take the variance within groups into account when assessing differences between groups. Consider the dot plots below. Surely the difference demonstrated by the first three groups is more likely to be significant than the difference demonstrated by the second three groups. Thus, if the variance between groups exceeds what is expected in terms of the variance within groups, we will reject the null hypothesis.

Let σ²_W represent the variance within groups in the population. Let σ²_B denote the variance between groups in the population. The null and alternative hypotheses may now be restated as:

H_0: σ²_B ≤ σ²_W
H_1: σ²_B > σ²_W

The variance between groups may be thought of as a signal of group differences. The variance within groups may be thought of as background noise. When the signal exceeds the noise, we will reject the null hypothesis.
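The signal-versus-noise idea can also be seen numerically. The short Python sketch below (not from the handout; it assumes SciPy is available) runs one-way ANOVAs on two made-up data sets whose group means are identical; only the spread within groups differs, and only the low-spread version yields a small p value.

```python
from scipy import stats

# Same group means (10, 12, 14) in both scenarios; only the within-group spread differs.
tight = [[9, 10, 11], [11, 12, 13], [13, 14, 15]]   # small spread within groups
noisy = [[2, 10, 18], [4, 12, 20], [6, 14, 22]]     # large spread within groups

F1, p1 = stats.f_oneway(*tight)
F2, p2 = stats.f_oneway(*noisy)
print(f"tight groups: F = {F1:.2f}, p = {p1:.4f}")  # large F, small p
print(f"noisy groups: F = {F2:.2f}, p = {p2:.4f}")  # small F, large p
```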

Variance Between Groups

Let s²_B represent the sample variance between groups:

(1)    s²_B = SS_B / df_B

This statistic, also called the mean square between (MSB), is a measure of the variability of the group means around the grand mean (Fig. 3). The SS_B (sum of squares between) is

(2)    SS_B = Σ n_i (x̄_i − x̄)²   (summed over the k groups)

where n_i represents the size of group i, x̄_i represents the mean of group i, and x̄ represents the grand mean. For the illustrative data, SS_B = (22)(62.546 − 62.127)² + (18)(63.278 − 62.127)² + (23)(60.826 − 62.127)² = 66.6. This statistic has degrees of freedom

(3)    df_B = k − 1

where k represents the number of groups. For the illustrative data, df_B = 3 − 1 = 2. Thus, s²_B = 66.614 / 2 = 33.3.

Figure 3. Variability between groups.
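As an arithmetic check, SS_B, df_B, and s²_B can be recomputed from the group sizes and means alone. The Python sketch below is an illustration (not part of the handout):

```python
# Between-group sum of squares from the group sizes and means reported above.
ns    = [22, 18, 23]
means = [62.546, 63.278, 60.826]
grand = sum(n * m for n, m in zip(ns, means)) / sum(ns)   # grand mean, about 62.127

SS_B = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
df_B = len(ns) - 1
MS_B = SS_B / df_B
print(f"SS_B = {SS_B:.2f}, df_B = {df_B}, s^2_B = {MS_B:.2f}")   # about 66.6, 2, 33.3
```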

Variance Within Groups

The variance within groups (s²_W) quantifies the spread of values within the groups (Fig. 4). In the jargon of ANOVA, the variance within is also called the mean square within (MSW) and is calculated as

(4)    s²_W = SS_W / df_W

where the sum of squares within (SS_W) is

(5)    SS_W = Σ (n_i − 1) s_i²   (summed over the k groups)

and the degrees of freedom within is

(6)    df_W = N − k

For the illustrative data, SS_W = (22 − 1)(8.673²) + (18 − 1)(7.789²) + (23 − 1)(8.004²) = 4020.36 and df_W = 63 − 3 = 60. Thus, s²_W = 4020.36 / 60 = 67.006.

An alternative formula for the variance within is

(7)    s²_W = Σ(df_i · s_i²) / Σ df_i

where s_i² represents the variance in group i and df_i represents the degrees of freedom in that group (df_i = n_i − 1). This alternative formula shows the variance within to be a weighted average of the group variances, with weights determined by the group degrees of freedom.

Figure 4. Variability within groups.
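The within-group quantities can be checked the same way from the group sizes and standard deviations. The Python sketch below is an illustration (not part of the handout):

```python
# Within-group sum of squares as a df-weighted combination of the group variances.
ns  = [22, 18, 23]
sds = [8.673, 7.789, 8.004]

SS_W = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
df_W = sum(ns) - len(ns)                   # N - k = 63 - 3
MS_W = SS_W / df_W
print(f"SS_W = {SS_W:.2f}, df_W = {df_W}, s^2_W = {MS_W:.3f}")   # about 4020.4, 60, 67.0
```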

ANOVA Table and F Statistic

The statistics described thus far are arranged to form an ANOVA table as follows:

Source     Sum of Squares        Degrees of freedom    Mean Square
Between    SS_B                  df_B = k − 1          s²_B = SS_B / df_B
Within     SS_W                  df_W = N − k          s²_W = SS_W / df_W
Total      SS_T = SS_B + SS_W    df_T = df_B + df_W

The ANOVA table for the illustrative data is:

Source     Sum of Squares    Degrees of freedom    Mean Square
Between    66.614            2                     33.307
Within     4020.370          60                    67.006
Total      4086.984          62

The ratio of the variance between (s²_B) to the variance within (s²_W) is the ANOVA F statistic:

(8)    F_stat = s²_B / s²_W

Under the null hypothesis, this test statistic has an F sampling distribution with df_B and df_W degrees of freedom. The p value for the test is the area under this F distribution to the right of F_stat. For the illustrative example, F_stat = 33.307 / 67.006 = 0.50 with 2 and 60 degrees of freedom. Using the F table we note F_{2,60,.95} = 3.15. Therefore, p > .05 (Fig. 5). A more precise p value can be computed with WinPepi > WhatIs.exe or another probability calculator (p = .60).

SPSS: Click Analyze > Compare Means > One-Way ANOVA. Then select the dependent variable and the independent variable, and click OK.

Figure 5. The F distribution with 2 and 60 degrees of freedom.
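The F-table lookup and the more precise p value can also be reproduced with any F-distribution calculator. The Python sketch below (an illustration, not the WinPepi output; it assumes SciPy) uses the mean squares from the table above:

```python
from scipy import stats

MS_B, MS_W = 33.307, 67.006            # mean squares from the ANOVA table above
df_B, df_W = 2, 60

F = MS_B / MS_W
crit = stats.f.ppf(0.95, df_B, df_W)   # F-table critical value, about 3.15
p = stats.f.sf(F, df_B, df_W)          # right-tail area beyond F

print(f"F = {F:.3f}, F(2,60,.95) = {crit:.2f}, p = {p:.2f}")
# F is well below the critical value, so p > .05 (here p is about .61,
# close to the p = .60 reported in the text).
```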

OPTIONAL: WHEN k = 2, ANOVA = EQUAL VARIANCE t TEST

ANOVA is an extension of the equal variance independent t test. Whereas the independent t test compares two groups (k = 2) by testing H_0: μ_1 = μ_2, ANOVA tests k groups, where k represents any integer greater than 1. Moreover:

1. The variance within groups calculated by ANOVA is equal to the pooled estimate of variance used in the independent t test (s²_W = s²_p).
2. df_B in an ANOVA testing two groups = 2 − 1 = 1.
3. df_W in the ANOVA = N − 2 = n_1 + n_2 − 2 = the degrees of freedom in the independent t test.
4. The F_stat from the ANOVA = (t_stat)² from the independent t test.
5. When α = .05, F_{1,N−2,.95} = (t_{N−2,.975})².

OPTIONAL: SAMPLE SIZE REQUIREMENTS FOR ANOVA

Sample size requirements for an ANOVA can be determined by asking how big a sample is needed to detect a mean difference of Δ at a type I error rate of α with power 1 − β. It is also necessary to assume a value for the variance within groups (σ²_W). Computational solutions are available in Sokal & Rohlf (1996, pp. 63-64). Calculations have been scripted on the website http://department.obg.cuhk.edu.hk/ and can be accessed by clicking Statistical Tools > Statistical Tests > Sample Size > Comparing Means.

Illustrative example. Suppose we test H_0: μ_1 = μ_2 = μ_3. Prior studies estimate that the measurement has a standard deviation (within groups) of 8. To detect a mean difference of 5, the above website derives the following results:

Sample size per group:
               Type I error = 0.05    Type I error = 0.01    Type I error = 0.001
Power = 80%    41                     60                     87
Power = 90%    54                     76                     107
Power = 95%    67                     91                     125

The output provides sample sizes per group (n_i) at various power and α levels. For example, under the stated assumptions, we need n = 54 per group for 90% power at α = .05.
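As a rough cross-check of the table (not the website's exact method), the familiar two-group normal-approximation formula n ≈ 2(z_{1−α/2} + z_{1−β})² σ² / Δ² reproduces the per-group sample sizes closely. The Python sketch below evaluates it for σ = 8 and Δ = 5:

```python
from scipy import stats

sigma, delta = 8.0, 5.0   # within-group SD and mean difference to detect (from the text)

def n_per_group(alpha, power):
    """Two-group normal-approximation sample size; a rough check, not the website's method."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2

for power in (0.80, 0.90, 0.95):
    row = [round(n_per_group(a, power)) for a in (0.05, 0.01, 0.001)]
    print(f"power {power:.0%}: n per group = {row}")   # roughly matches the table above
```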