One-Way Analysis of Variance (ANOVA) with Tukey's HSD Post-Hoc Test

Prepared by Allison Horst for the Bren School of Environmental Science & Management

Introduction

When we compare two samples to determine whether they are significantly different, we can use parametric tests (t-tests) or non-parametric tests (Wilcoxon signed-rank, Mann-Whitney U). What if we are trying to determine whether significant differences exist when there are more than two groups (populations, samples, treatments, etc.) of interest? If we are comparing means for more than two groups, and the data can be assumed to satisfy parametric assumptions, an appropriate method for determining whether any significant differences exist between groups is Analysis of Variance, or ANOVA.

For ANOVA, the null and alternative hypotheses are as follows:

H0: means across groups do not differ
H1: means differ between at least two groups

If we're just trying to compare means between groups again, then why not just use multiple two-sample t-tests? Remember that with each t-test we have a 5% chance (if we use a 95% confidence level) of making a Type I Error. If we perform multiple two-sample t-tests, our opportunity for a Type I Error increases with each additional test. If you perform three independent t-tests, your total probability of committing at least one Type I Error is already about 14% (1 − 0.95³ ≈ 0.14), which is relatively high! Performing a single ANOVA test on all data groups simultaneously avoids this inflation of the Type I Error rate.

ANOVA, Conceptually

When using a t-test, we essentially try to determine whether the difference between the means is equal to zero (the null hypothesis) or not equal to zero (the alternative hypothesis). How do we expand this to include multiple groups that are influenced by one predictor variable (one-way ANOVA)?
While we are comparing means, we actually use an analysis of sample variability to determine whether differences between samples are significant, hence Analysis of Variance instead of Analysis of Means. In essence, ANOVA asks whether the variability between the means of the groups is large compared to the variability within each group. The basic question that ANOVA answers is: Do the sample means show differences from each other that are large relative to the differences among individual cases within each sample? In other words, how does the variability in sample means compare to the variability within each sample?
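The comparison described above can be sketched by computing the between-group and within-group variability by hand and taking their ratio. This is only an illustrative sketch: the data frame `scores` and its values are invented for the example and are not part of the enzyme dataset, and the variable names are assumptions.

```r
# Hypothetical data: three groups of five observations each (values invented
# for illustration; they are not from the enzyme dataset).
scores <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 5)),
  value = c(10, 12, 11, 13, 12,
            14, 15, 13, 16, 15,
            20, 19, 21, 22, 20)
)

k <- nlevels(scores$group)   # number of groups
N <- nrow(scores)            # total number of observations

grand_mean  <- mean(scores$value)
group_means <- tapply(scores$value, scores$group, mean)
group_sizes <- tapply(scores$value, scores$group, length)

# Between-group variability: how far each group mean sits from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# Within-group variability: how far each observation sits from its own group mean
ss_within <- sum((scores$value - group_means[as.character(scores$group)])^2)
ms_within <- ss_within / (N - k)

# The F-statistic is the ratio of the two, with (k - 1, N - k) degrees of freedom
f_manual <- ms_between / ms_within
p_manual <- pf(f_manual, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# Cross-check against R's built-in one-way ANOVA
f_aov <- summary(aov(value ~ group, data = scores))[[1]][["F value"]][1]
```

A large `f_manual` means the group means are spread out far more than the scatter inside each group would suggest; the hand calculation and `aov()` should agree.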

If the variability between group means is sufficiently greater than the variability within groups (where "sufficiently" depends on the significance level selected by the user), then the result will be to reject the null hypothesis. If the variability between group means is not sufficiently greater than the variability within groups, then the null hypothesis is retained. ANOVA measures this with the F-statistic, the ratio of between-group to within-group variability, which we saw earlier for determining whether variances are equal and which can be converted into a p-value using the F-distribution.

Performing one-way ANOVA with more than two samples does NOT tell you which pairs of samples differ significantly. It can only help you conclude whether there are no significant differences across all samples, or whether at least two treatments differ significantly. To determine which samples may be significantly different, you can follow ANOVA with a post-hoc test.

When a significant result arises during ANOVA (i.e., you reject the null hypothesis), you can perform a post-hoc (translates to "after this") test to determine which groups differ. There are several options for post-hoc statistical tests, including the Bonferroni approach, step-down procedures, and Dunnett's and Hsu's procedures (found easily online or in basic statistics texts). One post-hoc test that works particularly well following ANOVA, is widely accepted in statistical literature, is easily performed in R, and is somewhat conservative, is Tukey's Honestly Significant Difference (Tukey's HSD) test.

**Note: If you are interested in the mathematics behind either one-way ANOVA or Tukey's HSD and would like to see an example done by hand, contact Allison at ahorst@bren.ucsb.edu.

One-Way ANOVA with Post-Hoc Tukey's HSD in RStudio

Follow along with the example in this section to learn how to perform one-way ANOVA with post-hoc Tukey's HSD, using a dataset describing the effects of four different enzymes on a certain reaction rate.
Step 1. Organize your data

When working in RStudio, the easiest way to organize your data for one-way ANOVA is with one row per observation: the enzyme in one column and the measured reaction rate in the other. Note that the excerpt here only shows through Enzyme B; the actual dataset includes data for Enzymes A to D (see the .xlsx file titled Enzymatic Reaction Rates accompanying this file on the website to follow along in RStudio).

[Table: two columns, Enzyme and Reaction Rate; the individual values are not legible in this copy.]
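As a rough sketch of the layout described above, the following builds a small two-column data frame, writes it to a .csv file, and reads it back. The reaction rates here are invented placeholders (the real values are in the accompanying spreadsheet), and the temporary file path is an assumption made so the example is self-contained.

```r
# Hypothetical reaction rates, invented for illustration only; the real
# values are in the spreadsheet that accompanies the tutorial.
Enzymes <- data.frame(
  Enzyme = rep(c("A", "B", "C", "D"), each = 3),
  Rate   = c(21.5, 23.1, 22.4,
             24.0, 25.2, 23.7,
             31.8, 33.0, 32.1,
             39.5, 40.2, 38.9)
)

# Round-trip through a .csv file, as you would after exporting from Excel.
# A temporary directory is used here so the sketch does not touch your files.
csv_path <- file.path(tempdir(), "Enzymes.csv")
write.csv(Enzymes, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE turns the Enzyme column into a factor, which is
# what TukeyHSD() expects for the grouping variable.
Loaded <- read.csv(csv_path, stringsAsFactors = TRUE)

str(Loaded)            # one factor column, one numeric column
table(Loaded$Enzyme)   # number of observations per enzyme
```

Checking `str()` and `table()` right after loading is a quick way to confirm that each row is one observation and that every enzyme has the expected number of replicates.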

**REMEMBER: Simplify all datasets and column names before saving as a .csv file and loading into RStudio. For the example here, the column names were simplified to Enzyme and Rate, respectively, and the data were saved as a .csv file titled Enzymes.csv. You are encouraged to do the same to follow along with the example presented here.

There are several options for performing one-way ANOVA in RStudio. Here, I will use the aov() function. Explore the function (?aov). The function works as follows:

> TESTNAME <- aov(ValuesColumnName ~ SubsetColumn, data = DatasetName)

For the enzymatic reaction rate data, the command would look like this (if the data is loaded using the dataset name Enzymes):

> EnzymeANOVA <- aov(Rate ~ Enzyme, data = Enzymes)

Explore the results using the summary() function.

> summary(EnzymeANOVA)

The p-value in the ANOVA summary helps you decide whether to reject or retain the null hypothesis (see hypotheses above). If you decide to reject the null hypothesis (thereby concluding that there is a significant difference between at least two of the treatments in your ANOVA), then you will likely want to examine further which treatments may be different. That's where post-hoc tests come in. One common (and robust) test that is already programmed in R to work with the aov() function is Tukey's HSD. Tukey's HSD is performed using the TukeyHSD() function in RStudio as follows:

> PostHocTestName <- TukeyHSD(ANOVATestName)

**Note: You must have already performed one-way ANOVA using the aov() function and assigned that test to a variable name. That variable name is what is entered into the TukeyHSD() function to run the post-hoc test.

Perform Tukey's HSD with the EnzymeANOVA results:

> PostHoc <- TukeyHSD(EnzymeANOVA)
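The steps above can be strung together as in the sketch below. Because the original Enzymes.csv values are not reproduced here, the data are simulated: the group means, standard deviation, and seed are arbitrary assumptions, so the numbers it prints will not match the tutorial's output.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for Enzymes.csv: 9 rates per enzyme, with group means
# and spread chosen arbitrarily for illustration.
Enzymes <- data.frame(
  Enzyme = factor(rep(c("A", "B", "C", "D"), each = 9)),
  Rate   = rnorm(36, mean = rep(c(20, 22, 30, 38), each = 9), sd = 4)
)

# One-way ANOVA: does mean Rate differ across levels of Enzyme?
EnzymeANOVA <- aov(Rate ~ Enzyme, data = Enzymes)

# summary() returns a list whose first element is the ANOVA table
anova_table <- summary(EnzymeANOVA)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

# Only follow up with Tukey's HSD if the overall test rejects the null
alpha <- 0.05
if (p_value < alpha) {
  PostHoc <- TukeyHSD(EnzymeANOVA, conf.level = 1 - alpha)
  print(PostHoc)
} else {
  message("Null hypothesis retained; no post-hoc test needed.")
}
```

Pulling the p-value out of the summary table, rather than reading it off the screen, makes the decision to run the post-hoc test explicit and repeatable.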

Review the results from the post-hoc test. What do they tell you?

> PostHoc
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Rate ~ Enzyme, data = Enzymes)

$Enzyme
         diff       lwr       upr     p adj
B-A  2.077778 -3.267837  7.423392 0.7198683
C-A 10.244444  4.898830 15.590059 0.0000646
D-A 17.677778 12.332163 23.023392 0.0000000
C-B  8.166667  2.821052 13.512281 0.0012904
D-B 15.600000 10.254385 20.945615 0.0000000
D-C  7.433333  2.087719 12.778948 0.0035636

Each row compares one pair of enzymes: diff is the difference in means, lwr and upr are the bounds of the 95% family-wise confidence interval for that difference, and p adj is the adjusted p-value. The ONLY pair of treatments that is NOT significantly different following one-way ANOVA with post-hoc Tukey's analysis is Enzymes A and B (p adj = 0.72, and the confidence interval includes zero).

Communicating Results of One-Way ANOVA

Reporting ANOVA results in text is somewhat involved, as you may want to include:

- Whether the ANOVA is one-way or multi-way
- What the independent and dependent variables are, and the conditions studied
- The outcome of the hypothesis test
- The significance level (α)
- The between-groups (numerator) and within-groups (denominator) degrees of freedom (k − 1 and N − k, respectively, where k is the number of groups and N is the total number of observations)
- The calculated F-value
- The corresponding p-value

For example:

"One-way ANOVA was performed to compare the influence of species (that's the independent variable) on metabolic rate (there's the dependent variable) for dogs, mantis shrimp, and great white sharks (those are the "conditions"). There was a significant effect of species on metabolic rate (that's the result of the hypothesis test) at a significance level of α = 0.05 (that's the significance level) for the three species [F(2,21) = 16.02; p = 0.00006] (that's where you report the F-value, the degrees of freedom, and the p-value)."

Removing the notes in there, it would read as follows:

"One-way ANOVA was performed to compare the influence of species on metabolic rate for dogs, mantis shrimp, and great white sharks. There was a significant effect of species on metabolic rate at a significance level of α = 0.05 for the three species [F(2,21) = 16.02; p = 0.00006]."

Another, shorter approach (though slightly less informative) for a different dataset is:

"There was no significant difference in bacterial growth rate for cultures exposed to Low, Medium or High uranium concentrations (α = 0.05; one-way ANOVA, F(2,21) = 0.94; p = 0.34)."

Depending on how many samples you compare, you may also want to show the results visually using a chart with asterisks or like letters indicating significance. Often you will see like letters or symbols marking values that are not significantly different. From the Tukey's HSD test, determine which means are statistically the same, then add like letters to indicate values that do not differ significantly (usually above the column error bars). In Excel: select series > right click > Add Data Labels > edit data labels to contain the correct letters or symbols to indicate significance. For example:

[Figure 1: bar chart of reaction rate by enzyme (A to D), with like letters above the error bars.]

Figure 1. Effect of enzyme type on reaction rate. Reaction rates (moles/s) for the conversion of Compound X to Compound Y for Enzymes A, B, C, and D. Like letters above error bars indicate values that are not significantly different by one-way ANOVA (F(3,32) = 33.7; p < 0.001, α = 0.05) with post-hoc Tukey's HSD (α = 0.05). Error bars indicate ± 1 standard deviation.

The results can also be summarized in a table, using superscripts to indicate means that do not differ significantly. For example:

Table 1. Enzyme effect on reaction rate. Reaction rates (moles/s) for the conversion of X to Y in the presence of enzyme A, B, C, or D (M ± SD). Like letters indicate values that are not significantly different by one-way ANOVA (F(3,32) = 33.7; p < 0.001, α = 0.05) with post-hoc Tukey's HSD (α = 0.05).

[Table 1: columns Enzyme and Reaction Rate (mean ± SD with superscript letters); the individual values are not legible in this copy.]
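The quantities that go into a written report, figure caption, or summary table can be pulled directly from the fitted objects, as in the sketch below. The data are simulated stand-ins (arbitrary means, standard deviation, and seed), so the numbers will differ from those in Figure 1 and Table 1; the object names `report`, `summary_table`, and `not_different` are assumptions made for illustration.

```r
set.seed(42)  # reproducible simulated data

# Simulated stand-in for the enzyme data (values are not the tutorial's)
Enzymes <- data.frame(
  Enzyme = factor(rep(c("A", "B", "C", "D"), each = 9)),
  Rate   = rnorm(36, mean = rep(c(20, 22, 30, 38), each = 9), sd = 4)
)

fit <- aov(Rate ~ Enzyme, data = Enzymes)
tab <- summary(fit)[[1]]

# F(k - 1, N - k) = F-value; p = p-value, formatted for a results sentence
report <- sprintf("F(%d,%d) = %.2f; p = %.3g",
                  as.integer(tab[["Df"]][1]),
                  as.integer(tab[["Df"]][2]),
                  tab[["F value"]][1],
                  tab[["Pr(>F)"]][1])

# Mean and standard deviation per enzyme, for an M +/- SD table or error bars
means <- tapply(Enzymes$Rate, Enzymes$Enzyme, mean)
sds   <- tapply(Enzymes$Rate, Enzymes$Enzyme, sd)
summary_table <- data.frame(
  Enzyme = names(means),
  Mean   = round(as.numeric(means), 2),
  SD     = round(as.numeric(sds), 2)
)

# Pairs that Tukey's HSD does NOT separate at alpha = 0.05; these are the
# pairs that would share a letter in a figure or table
tukey <- TukeyHSD(fit)$Enzyme
not_different <- rownames(tukey)[tukey[, "p adj"] >= 0.05]

print(report)
print(summary_table)
print(not_different)
```

Assigning the like letters themselves is still done by hand here: any two enzymes listed together in `not_different` should share a letter.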