One-Way Analysis of Variance (ANOVA) Example Problem



Similar documents
One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

EXCEL Analysis TookPak [Statistical Analysis] 1. First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it:

One-Way Analysis of Variance

1.5 Oneway Analysis of Variance

12: Analysis of Variance. Introduction

Using Microsoft Excel to Analyze Data from the Disk Diffusion Assay

Regression step-by-step using Microsoft Excel

Using Microsoft Excel to Analyze Data

Section 13, Part 1 ANOVA. Analysis Of Variance

CHAPTER 13. Experimental Design and Analysis of Variance

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

One-Way ANOVA using SPSS SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Elementary Statistics Sample Exam #3

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Data Analysis Tools. Tools for Summarizing Data

Regression Analysis: A Complete Example

ANOVA ANOVA. Two-Way ANOVA. One-Way ANOVA. When to use ANOVA ANOVA. Analysis of Variance. Chapter 16. A procedure for comparing more than two groups

Chapter 5 Analysis of variance SPSS Analysis of variance

Recall this chart that showed how most of our course would be organized:

Difference of Means and ANOVA Problems

Randomized Block Analysis of Variance

The Analysis of Variance ANOVA

Statistics Review PSY379

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Chapter 7. One-way ANOVA

Comparing Multiple Proportions, Test of Independence and Goodness of Fit

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

The Friedman Test with MS Excel. In 3 Simple Steps. Kilem L. Gwet, Ph.D.

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Using Excel in Research. Hui Bian Office for Faculty Excellence

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

Introduction to Statistics with GraphPad Prism (5.01) Version 1.1

Guide to Microsoft Excel for calculations, statistics, and plotting data

An analysis method for a quantitative outcome and two categorical explanatory variables.

Analysis of Variance. MINITAB User s Guide 2 3-1

8 6 X 2 Test for a Variance or Standard Deviation

ANOVA. February 12, 2015

Final Exam Practice Problem Answers

Two-sample hypothesis testing, II /16/2004

individualdifferences

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

Solutions to Homework 10 Statistics 302 Professor Larget

Using Excel for inferential statistics

When to use Excel. When NOT to use Excel 9/24/2014

Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; and Dr. J.A. Dobelman

Analysis of Variance ANOVA

Chapter 14: Repeated Measures Analysis of Variance (ANOVA)

Statistics. One-two sided test, Parametric and non-parametric test statistics: one group, two groups, and more than two groups samples

Statistical Functions in Excel

1.1. Simple Regression in Excel (Excel 2010).

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Statistiek II. John Nerbonne. October 1, Dept of Information Science

NCSS Statistical Software

CHAPTER 11 CHI-SQUARE AND F DISTRIBUTIONS

CS 147: Computer Systems Performance Analysis

ABSORBENCY OF PAPER TOWELS

A Basic Guide to Analyzing Individual Scores Data with SPSS

Study Guide for the Final Exam

Two-Group Hypothesis Tests: Excel 2013 T-TEST Command

Multiple Linear Regression

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

2. What is the general linear model to be used to model linear trend? (Write out the model) = or

THE KRUSKAL WALLLIS TEST

Describing Populations Statistically: The Mean, Variance, and Standard Deviation

Unit 31: One-Way ANOVA

2 Sample t-test (unequal sample sizes and unequal variances)

NCSS Statistical Software

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

HYPOTHESIS TESTING WITH SPSS:

UNDERSTANDING THE TWO-WAY ANOVA

Simple Linear Regression Inference

13: Additional ANOVA Topics. Post hoc Comparisons

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Session 10. Laboratory Works

Chapter 2 Simple Comparative Experiments Solutions

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Using Excel s Analysis ToolPak Add-In

General Regression Formulae ) (N-2) (1 - r 2 YX

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST

Two Related Samples t Test

An introduction to using Microsoft Excel for quantitative data analysis

PSYC 381 Statistics Arlo Clark-Foos, Ph.D.

Introduction to Statistics with SPSS (15.0) Version 2.3 (public)

The ANOVA for 2x2 Independent Groups Factorial Design

Week TSX Index

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

GLM I An Introduction to Generalized Linear Models

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

Business Valuation Review

A Quick Guide to Using Excel 2007 s Regression Analysis Tool

CHAPTER 14 NONPARAMETRIC TESTS

Simple Predictive Analytics Curtis Seare

How To Run Statistical Tests in Excel

Math 108 Exam 3 Solutions Spring 00

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Transcription:

One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means by examining the variances of samples that are taken. ANOVA allows one to determine whether the differences between the samples are simply due to random error (sampling errors) or whether there are systematic treatment effects that causes the mean in one group to differ from the mean in another. Most of the time ANOVA is used to compare the equality of three or more means, however when the means from two samples are compared using ANOVA it is equivalent to using a t-test to compare the means of independent samples. ANOVA is based on comparing the variance (or variation) between the data samples to variation within each particular sample. If the between variation is much larger than the within variation, the means of different samples will not be equal. If the between and within variations are approximately the same size, then there will be no significant difference between sample means. Assumptions of ANOVA: (i) All populations involved follow a normal distribution. (ii) All populations have the same variance (or standard deviation). (iii) The samples are randomly selected and independent of one another. Since ANOVA assumes the populations involved follow a normal distribution, ANOVA falls into a category of hypothesis tests known as parametric tests. If the populations involved did not follow a normal distribution, an ANOVA test could not be used to examine the equality of the sample means. Instead, one would have to use a non-parametric test (or distribution-free test), which is a more general form of hypothesis testing that does not rely on distributional assumptions. Example Consider this example: Suppose the National Transportation Safety Board (NTSB) wants to examine the safety of compact cars, midsize cars, and full-size cars. It collects a sample of three for each of the treatments (cars types). Using the hypothetical data provided below, test whether the mean pressure applied to the driver s head during a crash test is equal for each types of car. Use α = 5%. Table ANOVA.1 Compact cars Midsize cars Full-size cars 643 469 484 655 47 456 70 55 40 X 666.67 473.67 447.33 S 31.18 49.17 41.68

(1.) State the null and alternative hypotheses The null hypothesis for an ANOVA always assumes the population means are equal. Hence, we may write the null hypothesis as: H 0 : µ 1 = µ = µ 3 - The mean head pressure is statistically equal across the three types of cars. Since the null hypothesis assumes all the means are equal, we could reect the null hypothesis if only mean is not equal. Thus, the alternative hypothesis is: H a : At least one mean pressure is not statistically equal. (.) Calculate the appropriate test statistic The test statistic in ANOVA is the ratio of the between and within variation in the data. It follows an F distribution. Total Sum of Squares the total variation in the data. It is the sum of the between and within variation. r c Total Sum of Squares (SST) = ( X i X ), where r is the number of rows in the table, c is i= 1 = 1 the number of columns, X is the grand mean, and X i is the i th observation in the th column. Using the data in Table ANOVA.1 we may find the grand mean: X i (643 + 655 + 70 + 469 + 47 + 55 + 484 + 456 + 40) X = = N 9 = 59. SST = (643 59.) + (655 59.) + (70 59.) + (469 59.) +... + (40 59.) = 96303.55 Between Sum of Squares (or Treatment Sum of Squares) variation in the data between the different samples (or treatments). Treatment Sum of Squares (SSTR) = r ( X X ), where th treatment and X is the mean of the th treatment. r is the number of rows in the Using the data in Table ANOVA.1, SSTR = [3 (666.67 59.) ] + [3 (473.67 59.) ] + [3 (447.33 59.) ] = 86049. 55

Within variation (or Error Sum of Squares) variation in the data from each individual treatment. Error Sum of Squares (SSE) = ( X i X From Table ANOVA.1, SSE= [(643 666.67) + (655 666.67) + (70 666.67) ] + [(469 473.67) + (47 473.67) + (55 473.67) ] + [(484 447.33) + (456 447.33) + (40 447.33) ] = 1054. Note that SST = SSTR + SSE (96303.55 = 86049.55 + 1054). ) Hence, you only need to compute any two of three sources of variation to conduct an ANOVA. Especially for the first few problems you work out, you should calculate all three for practice. The next step in an ANOVA is to compute the average sources of variation in the data using SST, SSTR, and SSE. Total Mean Squares (MST) number of observations) SST = average total variation in the data (N is the total N 1 MST = 96303.55 = 1037.94 (9 1) Mean Square Treatment (MSTR) = columns in the data table) SSTR average between variation (c is the number of c 1 MSTR = 86049.55 = 4304.78 (3 1) Mean Square Error (MSE) = SSE N c average within variation MSE = 1054 = 1709 (9 3) Note: MST MSTR + MSE

The test statistic may now be calculated. For a one-way ANOVA the test statistic is equal to the ratio of MSTR and MSE. This is the ratio of the average between variation to the average within variation. In addition, this ratio is known to follow an F distribution. Hence, MSTR MSE 4304.78 = = 1709 5.17 F =. The intuition here is relatively straightforward. If the average between variation rises relative to the average within variation, the F statistic will rise and so will our chance of reecting the null hypothesis. (3.) Obtain the Critical Value To find the critical value from an F distribution you must know the numerator (MSTR) and denominator (MSE) degrees of freedom, along with the significance level. F CV has df1 and df degrees of freedom, where df1 is the numerator degrees of freedom equal to c-1 and df is the denominator degrees of freedom equal to N-c. In our example, df1 = 3-1 = and df = 9-3 = 6. Hence we need to find α = 5%. Using the F tables in your text we determine that CV F, 6 = 5.14. CV F, 6 corresponding to (4.) Decision Rule You reect the null hypothesis if: F (observed value) > F CV (critical value). In our example 5.17 > 5.14, so we reect the null hypothesis. (5.) Interpretation Since we reected the null hypothesis, we are 95% confident (1-α ) that the mean head pressure is not statistically equal for compact, midsize, and full size cars. However, since only one mean must be different to reect the null, we do not yet know which mean(s) is/are different. In short, an ANOVA test will test us that at least one mean is different, but an additional test must be conducted to determine which mean(s) is/are different. Determining Which Mean(s) Is/Are Different If you fail to reect the null hypothesis in an ANOVA then you are done. You know, with some level of confidence, that the treatment means are statistically equal. However, if you reect the null then you must conduct a separate test to determine which mean(s) is/are different. There are several techniques for testing the differences between means, but the most common test is the Least Significant Difference Test.

MSE F1, Least Significant Difference (LSD) for a balanced sample: r the mean square error and r is the number of rows in each treatment. N c, where MSE is ()(1709)(5.99) In the example above, LSD = = 8.61 3 Thus, if the absolute value of the difference between any two treatment means is greater than 8.61, we may conclude that they are not statistically equal. Compact cars vs. Midsize cars: 666.67 473.67 = 193. Since 193 > 8.61 mean head pressure is statistically different between compact and midsize cars. Midsize cars vs. Full-size cars: 473.67 447.33 = 6.34. Since 6.34 < 8.61 mean head pressure is statistically equal between midsize and full-size cars. Compact vs. Full-size: Work this on your own. One-way ANOVA in Excel You may conduct a one-way ANOVA using Excel. (Preliminary step) First, make sure that the Analysis ToolPak is installed. Under Tools is the option Data Analysis present? If yes ToolPak is installed. If no select Add-ins. Check the boxes entitled Analysis ToolPak and Analysis ToolPak VBA and click OK. This will install the Data Analysis ToolPak. (1.) Under Tools select Data Analysis In the window that appears select ANOVA: One factor and click OK. (.) Using your mouse highlight the cells containing the data. (3.) Select Columns if each treatment is its own column or Row if each treatment is its own row. (4.) Set your level of significance. (The default is 5% or 0.05.) (5.) Click OK and the ANOVA output will appear on a new worksheet. ANOVA Results from Excel: SUMMARY Groups Count Sum Average Variance Column 1 3 000 666.6667 97.3333

Column 3 141 473.6667 417.333 Column 3 3 134 447.3333 1737.333 ANOVA Source of Variation SS df MS F P-value F crit Between Groups 86049.55556 4304.78 5.17541 0.00107 5.14349 Within Groups 1054 6 1709 Total 96303.55556 8 The results under the heading SUMMARY simply provides you with summary statistics for each of your samples. The results of the ANOVA test are provided under the heading ANOVA. Comparing these figures with the example above, it should be simple to determine the meaning of the Excel output.