Simple Analysis of Variance (ANOVA)

Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general, ANOVA procedures are generalizations of the t-test, and it can be shown that, if one is only interested in the difference between two groups on one categorical independent (i.e., grouping) variable, the independent-samples t-test is a special case of ANOVA.

A one-way ANOVA refers to having only one grouping variable, or factor, which is the independent variable. It is possible to have more than one grouping variable, but we will start with the simplest case. If one has only two levels of the grouping variable then one can simply conduct an independent-samples t-test, but if one has more than two levels of the grouping variable then one needs to conduct an ANOVA.

Since we have more than two groups in ANOVA we need to figure out a way to describe the difference between all the means. One way to do this is to compute the variance between the sample means, because a large variance implies that the sample means differ a lot, whereas a small variance implies that the sample means are not that different. This gives us a single numeric value for the difference between all the sample means.

The statistic used in ANOVA partitions the variance into two components: (1) the between-treatment[1] variability and (2) the within-treatment variability. Whenever means from different samples are compared there are three sources that can cause differences to be observed between the sample means:

1. Differences due to Treatment
2. Individual Differences
3. Differences due to Experimental Error

These are the three sources of variability that can cause one to observe differences between treatment groups, and so together they are referred to as the between-treatment variability. Only two of these sources of variability can be observed within a treatment group, specifically individual differences and experimental error, and these are referred to as the within-treatment variability.[2]

The statistic used in ANOVA, the F statistic, uses a ratio of between-treatment variability and within-treatment variability to test whether or not there is a difference among treatments. Specifically:

$$F = \frac{\text{between-treatment variability}}{\text{within-treatment variability}} = \frac{\text{treatment effect} + \text{individual differences} + \text{experimental error}}{\text{individual differences} + \text{experimental error}}$$

[1] Note that groups do not always represent treatments. Oftentimes ANOVA is used to determine differences in intact groups, such as those that differ by ethnicity or gender.
[2] It should be noted that your book, and many statistical software packages, refer to the within-treatment variability as the error variability.
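To make this concrete, here is a minimal Python sketch (the group values are hypothetical, made up for illustration) of using the variance among the sample means as a single number summarizing how much the groups differ:

```python
import numpy as np

# Hypothetical data: three small groups of observations.
groups = [np.array([3.0, 4.0, 5.0, 4.0]),   # group 1
          np.array([6.0, 7.0, 5.0, 6.0]),   # group 2
          np.array([2.0, 3.0, 2.0, 3.0])]   # group 3

means = [g.mean() for g in groups]

# A large variance among the means suggests the groups differ a lot;
# a small variance suggests they do not.
variance_of_means = np.var(means, ddof=1)
print(means, variance_of_means)
```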

If the treatment effect is small then the ratio will be close to one. Therefore, an F-statistic close to one would be expected if the null hypothesis were true and there were no treatment differences. If the treatment effect is large then the ratio will be much greater than one, because the between-treatment variability will be much larger than the within-treatment variability.

The hypotheses tested in ANOVA are:

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_K$$
$$H_1: \text{at least one mean is different from the rest}$$

where K = the total number of groups or sample means being compared.

In the population, group $j$ has mean $\mu_j$ and variance $\sigma_j^2$. In the sample, group $j$ has mean $\bar{X}_j$ and variance $s_j^2$. The sample size for each group is $n_j$, and the total number of observations is $N = n_1 + n_2 + n_3 + \cdots + n_K$. The grand mean of all observations is $\bar{X}$.

The assumptions underlying the test are the same as the assumptions underlying the t-test for independent samples. Specifically:

1. Each group $j$ in the population is normally distributed with mean $\mu_j$.
2. The variance in each group is the same, so that $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2 = \sigma^2$, otherwise known as the homogeneity of variance assumption.
3. Each observation is independent of every other observation.

The computations underlying a simple one-way ANOVA are pretty straightforward if you remember that a variance is composed of two parts: (1) the sum of squared deviations from the mean (SS) and (2) the degrees of freedom (df), which can be thought of as the number of potentially different values that are used to compute the SS, minus 1. Therefore the total variance, across all groups, is computed using

$$SS_{\text{total}} = \sum (X - \bar{X})^2 \quad \text{and} \quad df_{\text{total}} = N - 1.$$

We partition this variance into two parts, the within-treatment variance and the between-treatment variance. Note that the total SS is simply the sum of the within-treatment SS and the between-treatment SS, and the df for the total variance is simply the sum of the df associated with the within-treatment variance and the between-treatment variance.

The within-treatment or within-group variance is computed using

$$SS_{\text{within}} = SS_{\text{error}} = \sum (X - \bar{X}_j)^2,$$

which represents the sum of the squared deviations from each group mean, and

$$df_{\text{within}} = df_{\text{error}} = (n_1 - 1) + (n_2 - 1) + (n_3 - 1) + \cdots + (n_K - 1) = (\text{total number of observations}) - (\text{number of groups}) = N - K.$$

The ratio of $SS_{\text{within}}$ and $df_{\text{within}}$ is known as the Mean Square within groups ($MS_{\text{within}}$) or Mean Square Error ($MS_{\text{error}}$).

The between-treatment variability is computed using

$$SS_{\text{between}} = SS_{\text{treatment}} = \sum n_j (\bar{X}_j - \bar{X})^2,$$

which represents the sum of the squared deviations of all group means from the grand (overall) mean, and $df_{\text{between}} = df_{\text{treatment}} = K - 1$, or the number of groups minus one.
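The partition described above is easy to verify numerically. A short Python sketch, again with hypothetical data, computes each SS and df and checks that the total SS equals the sum of the two parts:

```python
import numpy as np

# Hypothetical data for three groups.
groups = [np.array([3.0, 4.0, 5.0, 4.0]),
          np.array([6.0, 7.0, 5.0, 6.0]),
          np.array([2.0, 3.0, 2.0, 3.0])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
N, K = all_obs.size, len(groups)

ss_total = ((all_obs - grand_mean) ** 2).sum()                           # df = N - 1
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)             # df = N - K
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)  # df = K - 1

# The partition: SS_total = SS_between + SS_within
assert np.isclose(ss_total, ss_between + ss_within)

ms_between = ss_between / (K - 1)
ms_within = ss_within / (N - K)
print(ms_between / ms_within)   # the F ratio discussed below
```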

The ratio of $SS_{\text{between}}$ and $df_{\text{between}}$ is known as the Mean Square between groups ($MS_{\text{between}}$) or Mean Square Treatment ($MS_{\text{treatment}}$).

The F-statistic is calculated by computing the ratio of the Mean Square between groups ($MS_{\text{between}}$ or $MS_{\text{treatment}}$) and the Mean Square within groups ($MS_{\text{within}}$ or $MS_{\text{error}}$). Specifically:

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$

This ratio follows a sampling distribution known as the F distribution, which is a family of distributions based on the df of the numerator and the df of the denominator.

Example

A psychologist is interested in determining the extent to which physical attractiveness may influence a person's judgment of other personal characteristics, such as intelligence or ability. So he selects three groups of subjects and asks them to pretend to be a company personnel manager, and he gives them all a stack of identical job applications, each of which includes a picture of the applicant. One group of subjects is given only pictures of very attractive people, another group is given only pictures of average-looking people, and a third group is given only pictures of unattractive people. Subjects are asked to rate the quality of each applicant on a scale of 0 (which represents very poor qualities) to 10 (which represents excellent qualities). The following data are obtained:

Attractive Average Unattractive
5 4 4 6 5 3 4 3 1 3 5 6 6 6 7 3 1 4 3 8 5 4 6 4 3 3 5 8 7 8 1

What should he conclude? Well, we first need to calculate the grand mean and the means for each of the three groups:

$$\bar{X} = \frac{5 + 4 + 4 + 6 + 5 + 3 + 4 + \cdots + 1}{34} = 4.32$$

$$\bar{X}_1 = \frac{5 + 4 + 4 + 3 + \cdots + 5}{11} = 4.55$$

$$\bar{X}_2 = \frac{6 + 5 + 3 + 6 + \cdots + 8}{12} = 5.92$$

$$\bar{X}_3 = \frac{4 + 3 + 1 + \cdots + 1}{11} = 2.36$$
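For readers following along in software, here is a sketch of how an example like this could be run with scipy. The ratings below are hypothetical stand-ins (chosen only to reproduce the three group means and group sizes; they are not the exact table above):

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in ratings, matching the group sizes (11, 12, 11)
# and group means (4.55, 5.92, 2.36) of the worked example.
attractive   = np.array([5, 4, 4, 3, 6, 5, 4, 6, 3, 5, 5])
average      = np.array([6, 5, 3, 6, 7, 6, 8, 5, 6, 7, 4, 8])
unattractive = np.array([4, 3, 1, 2, 3, 1, 2, 4, 3, 2, 1])

print(attractive.mean(), average.mean(), unattractive.mean())

# One-way ANOVA: returns the F statistic and its p-value.
F, p = stats.f_oneway(attractive, average, unattractive)
print(F, p)
```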

Now we can calculate[3]

$$MS_{\text{within}} = \frac{\sum (X - \bar{X}_j)^2}{N - K} = \frac{(5 - 4.55)^2 + (4 - 4.55)^2 + \cdots + (6 - 5.92)^2 + \cdots + (1 - 2.36)^2}{34 - 3} = 1.94$$

and

$$MS_{\text{between}} = \frac{\sum n_j (\bar{X}_j - \bar{X})^2}{K - 1} = \frac{11(4.55 - 4.32)^2 + 12(5.92 - 4.32)^2 + 11(2.36 - 4.32)^2}{3 - 1} = \frac{0.58 + 30.72 + 42.26}{2} \approx \frac{73.25}{2} = 36.63$$

So the F-statistic = 36.63/1.94 = 18.88, but how likely is it that we would have obtained this value if the null hypothesis were true? With 2 and 31 df the critical F, at $\alpha = .05$, is approximately 3.3. So the psychologist can reject the null hypothesis and conclude that a person's judgment of the job qualifications of prospective applicants appears to be influenced by how attractive the prospective applicant is.

The ANOVA procedure is robust to violations of the assumptions, especially the assumption of normality. Violating the assumption of homogeneity of variance is especially problematic if the groups consist of different sample sizes. Levene's test, which we talked about before in terms of the t-test, can be used to test whether the homogeneity of variance assumption has been violated. If it has, then the Welch procedure can be used to adjust the df used in ANOVA, similar to what we talked about for the t-test.

If the normality assumption is violated then the data can be transformed (because this won't change the results of the statistical test, it will just re-scale things) to be more normally distributed. Common transformations include:

1. Taking the square root of each observation, which is beneficial if the data are very skewed.
2. Taking the log of each observation, which is beneficial if the data are very positively skewed.
3. Taking the reciprocal of each observation (i.e., 1/observation), which is beneficial if there are very large values in the positive tail of the distribution.

Another approach to dealing with a violation of the normality assumption is to use a trimmed sample, which removes a fixed percentage of the extreme values in each of the tails of the distribution, or a Winsorized sample, which replaces the values that are trimmed with the most extreme observations that are left in each tail. In the latter case the df need to be adjusted by the number of values that are replaced.

As we explore more complicated ANOVA models (models with more than one grouping variable) it will become important to be able to differentiate between fixed factors (or groups) and random factors.

[3] Note: Answers obtained by hand, from Excel, or from a statistical software package will most likely vary slightly due to rounding error.
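Levene's test, the critical-F lookup, and the transformations just listed each take a line or two in scipy; a sketch, with hypothetical groups g1, g2, g3:

```python
import numpy as np
from scipy import stats

# Hypothetical groups for illustration.
g1 = np.array([5., 4., 6., 3., 5.])
g2 = np.array([6., 7., 5., 8., 6.])
g3 = np.array([2., 3., 1., 4., 2.])

# Critical F for a test at alpha = .05 with 2 and 31 df,
# as in the example above (the table value is about 3.3).
f_crit = stats.f.ppf(1 - 0.05, dfn=2, dfd=31)
print(f_crit)

# Levene's test for the homogeneity-of-variance assumption.
W, p = stats.levene(g1, g2, g3)
print(W, p)

# The three common transformations for non-normal data:
sqrt_g1 = np.sqrt(g1)     # 1. square root, for very skewed data
log_g1 = np.log(g1)       # 2. log, for very positively skewed data
recip_g1 = 1.0 / g1       # 3. reciprocal, for long positive tails
```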

A fixed factor is one in which the researcher is only interested in the particular levels of the grouping variable that are being studied. These levels are not assumed to be representative of, nor generalizable to, other levels of the group.

A random factor is one in which the researcher considers the various levels of the grouping variable to be a random sample from all possible levels. In this situation the results of the statistical test may be generalized to other levels of the group.

It should be noted that there is a direct relationship between the t-test for independent samples and the ANOVA when K = 2. Specifically, it can be shown mathematically that the F-statistic equals the t-statistic squared (i.e., $F = t^2$).

Power and Effect Size

Similar to the t-test, finding statistical significance does not tell us whether the differences are important from a practical perspective. Several measures of effect size have been proposed, all of which differ in terms of how biased they are.

$\eta^2$ (eta-squared), or the correlation ratio, is one of the oldest measures of effect size. It represents the percentage of total variability that can be accounted for by differences in the grouping variable, or the percentage by which the error variability (i.e., the within-treatment variability) is reduced by considering group membership. This is done by calculating the ratio of $SS_{\text{between}}$ and $SS_{\text{total}}$. Specifically:

$$\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}$$

For our previous example, $\eta^2 = \frac{73.25}{133.44} = .55$, meaning 55% of the variation in ratings can be accounted for by differences in the independent variable (i.e., the groups). This effect size measure is biased upwards, meaning it is larger than would be expected if it were to have been calculated from the population, rather than estimated from the sample.

An alternative effect size measure to $\eta^2$ is $\omega^2$ (omega-squared). It also measures the percentage of total variability that can be accounted for by between-group variability, but does so by using MS values as well as SS values, thereby making use of sample size information. Specifically, for a fixed effect[4] ANOVA:

$$\omega^2 = \frac{SS_{\text{between}} - (k - 1)MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}}$$

For our previous example, $\omega^2 = \frac{73.25 - (3 - 1)(1.94)}{133.44 + 1.94} = \frac{69.37}{135.38} = .51$. This measure of effect size has been found to be less biased than $\eta^2$. Note that it is smaller than what we obtained for $\eta^2$.

[4] Note that this measure of effect size is computed slightly differently for a random effects ANOVA model; the formula for a random effects ANOVA model is not presented here.
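Both the $F = t^2$ relationship and the two effect size formulas are easy to check numerically; a sketch with hypothetical data:

```python
import numpy as np
from scipy import stats

# Hypothetical data: with K = 2 groups, the one-way ANOVA F equals
# the squared independent-samples t.
a = np.array([5., 4., 6., 3., 5., 4.])
b = np.array([7., 6., 8., 6., 7., 9.])

t, _ = stats.ttest_ind(a, b)   # equal variances assumed (the default)
F, _ = stats.f_oneway(a, b)
print(F, t ** 2)               # the two values should match

# Effect sizes from the sums of squares (same formulas for K groups):
all_obs = np.concatenate([a, b])
grand = all_obs.mean()
N, K = all_obs.size, 2
ss_between = sum(g.size * (g.mean() - grand) ** 2 for g in (a, b))
ss_within = sum(((g - g.mean()) ** 2).sum() for g in (a, b))
ss_total = ss_between + ss_within
ms_within = ss_within / (N - K)

eta_sq = ss_between / ss_total
omega_sq = (ss_between - (K - 1) * ms_within) / (ss_total + ms_within)
print(eta_sq, omega_sq)        # omega-squared should be the smaller one
```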

Estimating power for ANOVA is a straightforward extension of how power was estimated for the t-test. We simply use different notation and different tables. Moreover, we assume equal sample sizes in each group, which is the optimal situation.

In an ANOVA context, $\phi'$ is comparable to $d$ in the independent t-test context, and separates out the effect size from the sample size. However, we need to incorporate the fact that we are using variance estimates in the ANOVA context. Specifically:

$$\phi' = \sqrt{\frac{\sum_j (\mu_j - \mu)^2 / K}{\sigma_e^2}}$$

So, if we were to assume that the population values correspond exactly to what we obtained in our example (unlikely as this may be), then

$$\phi' = \sqrt{\frac{\left[(4.55 - 4.32)^2 + (5.92 - 4.32)^2 + (2.36 - 4.32)^2\right]/3}{1.94}} = \sqrt{\frac{2.143}{1.94}} = 1.05$$

Furthermore, in an ANOVA context, $\phi$ is comparable to $\delta$ in the independent t-test context, in that it incorporates sample size to allow us to determine how large a sample we need to detect meaningful differences from a practical perspective. However, even though we may wind up with unequal sample sizes in our groups, we calculate power based on the assumption of equal sample sizes. Specifically:

$$\phi = \phi' \sqrt{n}, \quad \text{where } n = \text{the number of subjects in each group}$$

So, if we were to assume that we expected 12 subjects in each of our groups in our example, then:

$$\phi = \phi' \sqrt{n} = 1.05 \sqrt{12} = 3.64$$

In an ANOVA context we can use $\phi$ together with the non-centrality parameter for the F distribution, which shifts the mean of the F-distribution when the null hypothesis is false, with $K - 1$ and $N - K$ df for the numerator and denominator, respectively. For our example, we will use an estimate corresponding to $\phi = 3.0$, because the table in our book does not go any higher, and we will compare it to the non-centrality parameter with 2 df for the numerator and 30 df for the denominator (because our book does not have very fine gradations for df in the denominator). Using the table in the book we find that $\beta = .03$ if we want to conduct our test at $\alpha = .01$. Therefore, since Power $= 1 - \beta$, the power of the experiment we ran was approximately .97.
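If no power table is at hand, the same calculation can be done directly from the non-central F distribution; a sketch using the numbers from our example, where the non-centrality parameter is $\lambda = K\phi^2$:

```python
from scipy import stats

# Power for a one-way ANOVA via the noncentral F distribution.
K, n = 3, 12
df_num, df_den = K - 1, K * n - K   # 2 and 33 here; the example above had 31
alpha = 0.01
phi = 3.0                           # the table value used above

# Noncentrality parameter: lambda = n * sum((mu_j - mu)**2) / sigma_e**2,
# which works out to K * phi**2.
lam = K * phi ** 2

f_crit = stats.f.ppf(1 - alpha, df_num, df_den)

# Power = P(F > f_crit) when F follows the noncentral F distribution.
power = 1 - stats.ncf.cdf(f_crit, df_num, df_den, lam)
print(power)                        # should land near the .97 read from the table
```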