
From: Statnotes: Topics in Multivariate Analysis, by G. David Garson, http://faculty.chass.ncsu.edu/garson/pa765/anova.htm (accessed 20 October 2010)

Planned multiple comparison t-tests, also just called "multiple comparison tests". In one-way ANOVA for confirmatory research, difference-of-means tests may be planned in advance rather than chosen post hoc. An example is a researcher who plans to compare each treatment group mean with the mean of the control group. In that case one may apply a simple t-test, a Bonferroni-adjusted t-test, the Sidak test, or Dunnett's test. The last two are also variants of the t-test. The t-test itself tests the significance of the difference between the means of a single interval-level dependent variable for two groups formed by a categorical independent variable.

The difference between the planned multiple comparison tests discussed in this section and the post-hoc multiple comparison tests discussed in the next section is one of power, not purpose. Some sources, including SPSS, lump all of these tests together as "post hoc tests", as illustrated below. This figure shows the SPSS post hoc tests dialog that appears after the Post Hoc button is pressed in the GLM Univariate dialog. (A similar dialog appears when Analyze, Compare Means, One-Way ANOVA is chosen, invoking the SPSS ONEWAY procedure, which the GLM procedure has superseded.) The essential difference is that the planned multiple comparison tests in this section are based on the t-test, which generally has more power than the post-hoc tests listed in the next section.

Warning! The model, discussed above, will make a difference for multiple comparison tests. A factor (ex., race) may display different multiple comparison results depending on what other factors are in the model. Covariates cannot be in the model at all for these tests to be done. Interactions may be in the model, but multiple comparison tests are not available to test them.
Also note that all these t-tests are subject to the equality of variances assumption. The data should therefore pass Levene's test, discussed below; that is, Levene's test should not be significant. Finally, note that the significance level (.05 by default) may be set using the Options button of the main GLM dialog.
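Levene's test, as usually defined, is a one-way ANOVA F computed on each observation's absolute deviation from its own group mean. A large value suggests unequal variances. The sketch below computes only the statistic; the function names and data are illustrative. The p-value is left out because it needs the F distribution with k-1 and N-k degrees of freedom. SciPy users could cross-check with `scipy.stats.levene(..., center='mean')`. SciPy's default, `center='median'`, gives the Brown-Forsythe variant instead.

```python
from statistics import fmean


def oneway_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F: between-groups mean square over within-groups mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_w(groups: list[list[float]]) -> float:
    """Levene's W: the one-way ANOVA F applied to absolute deviations from each group mean."""
    abs_devs = [[abs(x - fmean(g)) for x in g] for g in groups]
    return oneway_f(abs_devs)


# Hypothetical data: the second group is more spread out than the first.
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
print(round(oneway_f(groups), 3))  # 2.4
print(round(levene_w(groups), 3))  # 0.8
```

Groups with identical spread give W = 0, because the absolute deviations then have identical group means.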

1. Simple t-test difference of means. The simple t-test is recommended when the researcher has a single planned comparison, that is, a comparison of means specified beforehand on the basis of a priori theory. In SPSS, for one-way ANOVA, select Analyze, Compare Means, One-Way ANOVA; click Post Hoc; and select the multiple comparison test you want. If the Bonferroni test is requested, SPSS prints a "Multiple Comparisons" table. It gives the mean difference in the dependent variable between every pair of groups (ex., differences in test scores for any two educational groups). The significance of each difference is also printed, and an asterisk marks differences significant at the .05 level or better. SPSS supports the Bonferroni test in its GLM and UNIANOVA procedures. A simple t-test, with or without Bonferroni adjustment, may also be obtained by selecting Statistics, Compare Means, One-Way ANOVA (Analyze in later versions of SPSS).

2. Bonferroni-adjusted t-test. Also called the Dunn test, the Bonferroni-adjusted t-test is used when there are planned multiple comparisons of means. As a general principle, comparisons of group means are sometimes selected on a post hoc basis simply because the differences are large. This inflates the chance of spurious differences, and the researcher must compensate by applying a more conservative test; otherwise, the likelihood of Type I errors will be substantial. The Bonferroni adjustment is perhaps the most common approach to making post-hoc significance tests more conservative. The Bonferroni method applies the simple t-test but then adjusts the significance level (the p-value) by multiplying it by the number of comparisons being made. For instance, a finding of .01 significance for 9 comparisons becomes .09. Equivalently, if the target alpha significance level is .05, the t-test must show a significance of alpha/9 (ex., .05/9 = .0056) or lower for a finding of significance to be made.
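The Bonferroni arithmetic in the example above can be written out directly. This is a minimal sketch, and the function names are illustrative. Multiplying the p-value by the number of comparisons m and comparing the result with alpha gives the same decision as comparing the raw p-value with alpha/m. The adjusted value is capped at 1 because it is a probability.

```python
def bonferroni_adjusted_p(p_value: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value: raw p times the number of comparisons, capped at 1."""
    return min(1.0, p_value * n_comparisons)


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level that keeps the familywise level at or below alpha."""
    return alpha / n_comparisons


def bonferroni_significant(p_value: float, n_comparisons: int, alpha: float = 0.05) -> bool:
    """Equivalent decision rule: the raw p must not exceed alpha / m."""
    return p_value <= bonferroni_alpha(alpha, n_comparisons)


# The example from the text: p = .01 with 9 planned comparisons.
print(round(bonferroni_adjusted_p(0.01, 9), 4))  # 0.09
print(round(bonferroni_alpha(0.05, 9), 4))       # 0.0056
print(bonferroni_significant(0.01, 9))           # False
```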
Bonferroni-adjusted multiple t-tests are usually employed only when there are few comparisons, because with many comparisons it quickly becomes practically impossible to show significance. If the independents formed 8 groups, there would be 8!/(6!2!) = 28 pairwise comparisons. At an unadjusted .05 significance level, one would expect 28 × .05 = 1.4 false positives (thinking you had a relationship when you did not), so at least one is likely. Note that this adjustment may be applied to F-tests as well as t-tests; that is, it can handle nonpairwise as well as pairwise comparisons. The Bonferroni-adjusted t-test imposes an extremely small alpha significance level as the number of comparisons becomes large. The method is therefore not recommended when the number of comparisons is large, because the power of the test becomes low. Klockars and Sax (1986: 38-39) recommend using a simple .05 alpha rate when there are few comparisons. They recommend the more stringent Bonferroni-adjusted multiple t-test when the number of planned comparisons is greater than the number of degrees of freedom for the between-groups mean square (which is k-1, where k is the number of groups). Researchers nonetheless try to limit the number of comparisons, in order to reduce the probability of Type II errors (accepting a false null hypothesis). This test is not recommended when the researcher wishes to perform all possible pairwise comparisons. By the Bonferroni test, the figure above shows that whites are significantly different from blacks but not from "other" races with respect to mean highest year of education completed (the dependent variable).

3. Sidak test. The Sidak test, also called the Dunn-Sidak test, is a variant of the Dunn (Bonferroni) approach that uses a t-test for pairwise multiple comparisons. The alpha significance level for multiple comparisons is adjusted to tighter (more accurate) bounds than for the Bonferroni test (Howell, 1997: 364). For m comparisons, the per-comparison level is 1 - (1 - alpha)^(1/m) rather than alpha/m. SPSS supports the Sidak test in its GLM and UNIANOVA procedures. In the figure above, the Sidak test shows the same pattern as the Bonferroni test.

4. Dunnett's test is a t-statistic used when the researcher wishes to compare each treatment group mean with the mean of the control group. For this purpose it has better power than alternative tests. Dunnett's test does not require a prior finding of significance in the overall F test "as it controls the familywise error rate independently" (Cardinal & Aitken, 2005: 89). This test, based on a 1955 article by Dunnett, is not to be confused with Dunnett's C or Dunnett's T3, discussed below. In the example illustrated above, Dunnett's test uses the last category ("other" race) as the reference (control) category. It shows that whites are not significantly different from "other" but blacks are.

Hsu's multiple comparison with the best (MCB) test. Hsu's MCB is an adaptation of Dunnett's method for the situation where the researcher wishes to compare the mean of each level with the best level, as in a treatment experiment where the best treatment is known. In such analyses the purpose is often to identify alternative treatments that are not significantly different from the best treatment but may cost less or have other desirable features. Hsu's MCB is supported by SAS JMP but not SPSS. Hsu's unconstrained multiple comparison with the best (UMCB) test is a variant that takes each treatment group in turn as a possible best treatment and compares all others to it.

Post-hoc multiple comparison tests, also just called "post-hoc tests," are used in exploratory research to assess which group means differ from which others. They are applied after the overall F test has demonstrated that at least one difference exists. If the F test establishes that there is an effect on the dependent variable, the researcher then proceeds to determine just which group means differ significantly from others. That is, post-hoc tests are used when the researcher is exploring differences, not limited to ones specified in advance on the basis of theory. These tests may also be used for confirmatory research, but the t-test-based tests in the previous section are generally preferred. In comparing group means on a post-hoc basis, one compares the means on the dependent variable for each of the k groups formed by the categories of the independent factor(s). The possible number of pairwise comparisons is k(k-1)/2. Multiple comparisons help specify the exact nature of the overall effect determined by the F test. However, note that post hoc tests do not control for the levels of other factors or for covariates (that is, interaction and control effects are not taken into account).
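The comparison counts and error rates above can be checked numerically: k(k-1)/2 comparisons, the 28 comparisons for 8 groups, and the Bonferroni versus Sidak per-comparison levels. This is a minimal sketch with illustrative function names. The uncorrected familywise error rate assumes independent comparisons. That is a simplification, since pairwise comparisons that share a group are correlated.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of pairwise comparisons among k group means: k(k-1)/2."""
    return comb(k, 2)


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Bonferroni (Dunn) per-comparison alpha."""
    return alpha / m


def sidak_alpha(alpha: float, m: int) -> float:
    """Sidak per-comparison alpha: exact familywise control for m independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def familywise_error(alpha_per_test: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** m


m = n_pairwise(8)                              # 8 groups, as in the text
print(m)                                       # 28
print(round(m * 0.05, 2))                      # 1.4 expected false positives, uncorrected
print(round(familywise_error(0.05, m), 2))     # 0.76
print(round(bonferroni_alpha(0.05, m), 5))     # 0.00179
print(round(sidak_alpha(0.05, m), 5))          # 0.00183
```

The Sidak level is slightly larger than the Bonferroni level, which is why it is slightly less conservative. Under independence it brings the familywise rate to exactly alpha.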
Findings of significance or nonsignificance between factor levels must be understood in the context of the full ANOVA F-test findings, not just the post hoc tests, which are subordinate to the overall F test. Note that the model cannot contain covariates when employing these tests.

Computation. The q-statistic, also called the q range statistic or the Studentized range statistic, is commonly used for post-hoc multiple comparisons, though some post hoc tests use the t statistic. The planned-comparison t-test is used for comparisons fixed in advance. Tests based on the q-statistic, in contrast, are commonly used for post-hoc comparisons, that is, when the researcher wishes to explore the data to uncover large differences without limiting the investigation by a priori theory.

Both the q and t statistics put the difference of means in the numerator. The t statistic uses the standard error of the difference between the means in the denominator, whereas q uses the standard error of the mean. Consequently, where the t test tests the difference between two means, the q-statistic tests the probability that the largest mean and the smallest mean among the k groups were sampled from the same population. The k groups are those formed by the categories of the independent(s). If the q-statistic computed for the two sample means is not as large as the criterion q value in a table of critical q values, the researcher cannot reject the null hypothesis that the groups do not differ at the given alpha significance level (usually .05). If the null hypothesis is not rejected for the largest versus the smallest group mean, it follows that all intermediate groups are also drawn from the same population. The q-statistic is therefore also a test of homogeneity for all k groups formed by the independent variable(s).

Output formats: pairwise vs. multiple range. Pairwise comparison output, similar to that for the Bonferroni and Sidak tests above, is produced for the LSD, Games-Howell, Tamhane's T2 and T3, Dunnett's C, and Dunnett's T3 tests. Homogeneous subsets (range test output) are provided for the S-N-K, Tukey's b, Duncan, R-E-G-W F, R-E-G-W Q, and Waller tests. Some tests produce both types: Tukey's honestly significant difference test, Hochberg's GT2, Gabriel's test, and Scheffé's test.

Warning! The model, discussed above, will make a difference for post hoc tests. A factor (ex., race) may display different multiple comparison results depending on what other factors are in the model. Covariates cannot be in the model at all for these tests to be done. Interactions may be in the model, but multiple comparison tests are not available to test them. Also note that all the post-hoc tests are subject to the equality of variances assumption, so the data should pass Levene's test, discussed below. The exceptions are Tamhane's T2, Dunnett's T3, Games-Howell, and Dunnett's C, all of which are tailored for data where equal variances cannot be assumed. Finally, note that the significance level (.05 by default) may be set using the Options button of the main GLM dialog.

Tests assuming equal variances

1. Least significant difference (LSD) test. This test, also called Fisher's LSD, the protected LSD, or the protected t test, is based on the t-statistic and thus can be considered a form of t-test. "Protected" means the LSD test should be applied only after the overall F test is shown to be significant. LSD compares all possible pairs of means after the F test rejects the null hypothesis that the groups do not differ; this is a requirement of the test. (Note that some computer packages wrongly report LSD t-test coefficients for comparisons even if the F test leads to acceptance of the null hypothesis.) It can handle both pairwise and nonpairwise comparisons and does not require equal sample sizes. LSD is the most liberal of the post-hoc tests: it is the most likely to reject the null hypothesis in favor of finding that groups do differ.
It controls the experimentwise Type I error rate at a selected alpha level (typically 5%), but only for the omnibus (overall) test of the null hypothesis. LSD allows higher Type I error rates for the partial null hypotheses involved in the individual comparisons. Toothaker (1993: 42) recommends against any use of LSD on the grounds that it has poor control of experimentwise alpha significance and that better alternatives exist, such as Shaffer-Ryan, discussed below. Others, such as Cardinal & Aitken (2005: 86), recommend its use only for factors with three levels. Nevertheless, the LSD test is the default in SPSS for pairwise comparisons in its GLM or UNIANOVA procedures. As illustrated below, the LSD test is interpreted in the same manner as the Bonferroni test above. For this example it yields the same substantive results: whites differ significantly from blacks but not from other races on mean highest school year completed.
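The protected LSD logic can be sketched as follows. The sketch assumes that equal-variance ANOVA summary statistics are already available: group means, group sizes, and the within-groups mean square (MSE). It also assumes the two-tailed critical t for the error degrees of freedom has been looked up separately. The group labels, means, and MSE are hypothetical. The value 2.042 is the usual table value of t for alpha = .05 (two-tailed) with 30 degrees of freedom. The key design point is the gate: if the omnibus F is not significant, no pairwise differences are reported.

```python
from itertools import combinations
from math import sqrt


def protected_lsd(means: dict[str, float], sizes: dict[str, int], mse: float,
                  t_crit: float, omnibus_f_significant: bool) -> list[tuple[str, str]]:
    """Return pairs whose mean difference exceeds Fisher's least significant difference."""
    if not omnibus_f_significant:
        # "Protected": without a significant overall F, no pair is declared different.
        return []
    significant = []
    for a, b in combinations(sorted(means), 2):
        # Standard error of the difference between two means, using the pooled MSE.
        se_diff = sqrt(mse * (1 / sizes[a] + 1 / sizes[b]))
        if abs(means[a] - means[b]) > t_crit * se_diff:
            significant.append((a, b))
    return significant


# Hypothetical summary: 3 groups of 11, so error df = 33 - 3 = 30.
means = {"A": 10.0, "B": 12.0, "C": 14.0}
sizes = {"A": 11, "B": 11, "C": 11}
print(protected_lsd(means, sizes, mse=4.0, t_crit=2.042, omnibus_f_significant=True))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(protected_lsd(means, sizes, mse=4.0, t_crit=2.042, omnibus_f_significant=False))
# []
```

With these numbers the least significant difference is about 1.74, so even the 2.0-point differences between adjacent groups are declared significant. This reflects the liberal character of LSD described above.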

The Fisher-Hayter test is a modification of the LSD test meant to control the liberal alpha significance level allowed by LSD. It is used when all pairwise comparisons are done post hoc, but its power may be low when fewer comparisons are made. See Toothaker (1993: 43-44). SPSS does not support the Fisher-Hayter test.

2. Tukey's test, a.k.a. the Tukey honestly significant difference (HSD) test. As illustrated below, the multiple comparisons table for the Tukey test displays all pairwise comparisons between groups, interpreted in the same way as for the Bonferroni test discussed above. The Tukey test is conservative when group sizes are unequal. It is often preferred when the number of groups is large precisely because it is a conservative pairwise comparison test. Researchers often prefer to be conservative when a large number of groups threatens to inflate Type I errors. HSD is among the most conservative of the post-hoc tests, in that it is among the most likely to accept the null hypothesis of no group differences. Some recommend it only when all pairwise comparisons are being tested. When all pairwise comparisons are being tested, the Tukey HSD test is more powerful than the Dunn test (Dunn may be more powerful for fewer than all comparisons). The Tukey HSD test is based on the q-statistic (the Studentized range distribution) and is limited to pairwise comparisons. Select "Tukey" on the SPSS Post Hoc dialog.
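The HSD test uses the q statistic, so its standard error is that of a single mean, sqrt(MSE/n), rather than that of a difference, sqrt(2·MSE/n). With equal n this makes q equal to sqrt(2) times the pooled t for the same pair. The sketch below assumes equal group sizes and a critical q looked up in a Studentized range table. The value 3.49 is approximately the tabled q for k = 3 means and 30 error degrees of freedom at alpha = .05; check it against your own table. The group summaries are hypothetical, and the function names are illustrative.

```python
from itertools import combinations
from math import isclose, sqrt


def tukey_hsd_pairs(means: dict[str, float], n: int, mse: float,
                    q_crit: float) -> list[tuple[str, str]]:
    """Pairs whose absolute mean difference exceeds HSD = q_crit * sqrt(MSE / n) (equal n)."""
    hsd = q_crit * sqrt(mse / n)
    return [(a, b) for a, b in combinations(sorted(means), 2)
            if abs(means[a] - means[b]) > hsd]


def q_and_t(mean_a: float, mean_b: float, n: int, mse: float) -> tuple[float, float]:
    """q divides by the standard error of a mean; t by the standard error of a difference."""
    diff = abs(mean_a - mean_b)
    q = diff / sqrt(mse / n)
    t = diff / sqrt(2 * mse / n)
    return q, t


# Hypothetical summary: 3 groups of 11 (error df = 30), within-groups MSE = 4.0.
means = {"A": 10.0, "B": 12.0, "C": 14.0}
print(tukey_hsd_pairs(means, n=11, mse=4.0, q_crit=3.49))  # [('A', 'C')]

q, t = q_and_t(means["A"], means["C"], n=11, mse=4.0)
print(isclose(q, sqrt(2) * t))  # True
```

With these illustrative numbers the HSD is about 2.10. A 2.0-point difference therefore falls short, even though it would exceed a Fisher LSD of about 1.74 (t = 2.042, 30 df). This illustrates why HSD is described as conservative.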

3. Tukey-b test, a.k.a. Tukey's wholly significant difference (WSD) test, also shown above. It is a less conservative version of Tukey's HSD test, also based on the q-statistic. The critical value of WSD (Tukey-b) is the mean of the corresponding values for Tukey's HSD test and the Newman-Keuls test, discussed below. In the illustration above, note that no "Sig" significance values are output in the range test table for Tukey-b. Instead, the table shows two significantly different homogeneous subsets on highest year of school completed. The first consists of blacks and the second of whites and other races.

4. S-N-K or Student-Newman-Keuls test, also called the Newman-Keuls test. It is a little-used post-hoc comparison test of the range type, also based on the q-statistic. It is used to evaluate partial null hypotheses (hypotheses that all but g of the k means come from the same population). It is recommended for one-way balanced ANOVA designs when there are only three means to be compared (Cardinal & Aitken, 2005: 87).

Let k be the number of groups formed by the categories of the independent variable(s), and arrange the means in rank order. First the full set of k means is tested, then the subsets of k-1 adjacent means, then k-2, and so on until pairs of means are tested. Testing stops as soon as a nonsignificant range is found on the way toward ever smaller sets. A range is nonsignificant if the q-statistic comparing the highest and lowest mean in the set (the "stretch") is not as great as the critical value of q for the number of means in the set.

Klockars and Sax (1986: 57) recommend the Student-Newman-Keuls test when the researcher wants to compare adjacent means, that is, pairs adjacent to each other when all means are presented in rank order. Toothaker (1993: 29) recommends Newman-Keuls only when the number of groups to be compared equals 3, assuming one wants to control the comparison error rate at the experimentwise alpha rate (ex., .05). He states, however, that the Ryan, Shaffer-Ryan, or Fisher-Hayter tests are preferable (Toothaker, 1993: 46). The example below shows the same homogeneous groups as the Tukey-b test above.

Duncan test. A range test somewhat similar to the S-N-K test and also not commonly used, because of its poor control of Type I error (Cardinal & Aitken, 2005: 88). Illustrated further below.

5. Ryan test (REGWQ). This is the Ryan-Einot-Gabriel-Welsch multiple range test based on the range, the usual Ryan test. It is a modified Student-Newman-Keuls test, adjusted so that critical values decrease as stretch size (the range from highest to lowest mean in the set being considered) decreases. As a result, Ryan controls the experimentwise alpha rate at the desired level (ex., .05) even when the number of groups exceeds 3. The cost is that it is less powerful (more chance of Type II errors) than Newman-Keuls. The Ryan test is thus more conservative than the S-N-K test or the Duncan multiple range test. It is recommended for one-way balanced ANOVA designs and is not recommended for unbalanced designs. As with Newman-Keuls, Ryan is a step-down procedure: one will not get to smaller stretch comparisons if the null hypothesis is accepted for larger stretches of which they are a subset. Toothaker (1993: 56) calls Ryan the "best choice" among tests supported by major statistical packages. His reason is that it maintains good alpha control (ex., better than Newman-Keuls) while having at least 75% of the power of the most powerful tests (ex., better than Tukey HSD). Cardinal and Aitken (2005: 87) consider the Ryan test a "good compromise" between the liberal Student-Newman-Keuls test and the conservative Tukey HSD test. For the same data, it comes to the same conclusion, as illustrated below.

6. Ryan test (REGWF). This is the Ryan test based on the F statistic rather than the range.
It is a bit more powerful than REGWQ, though less common and more computationally intensive. Also a conservative test, it tends to come to the same substantive conclusions as the ordinary Ryan test. REGWF is supported by SPSS but not SAS.

The Shaffer-Ryan test modifies the Ryan test. It is also a protected or step-down test, requiring that the overall F test reject the null hypothesis first, but it uses slightly different critical values. To date, Shaffer-Ryan is not supported by SAS or SPSS, but Toothaker (1993: 55) recommends it as "one of the best multiple comparison tests in terms of power."

7. The Scheffé test is a widely used range test that works by first requiring that the overall F test of the null hypothesis be rejected. If the null hypothesis is not rejected overall, then it is not rejected for any comparison null hypothesis. If the overall null hypothesis is rejected, F values are computed simultaneously for all possible comparisons. Each must exceed a critical value that is larger than the one used for the overall F test. Let F be the critical value of F used for the overall test (with k-1 and N-k degrees of freedom). For the Scheffé test, the new, higher critical value, F', is (k-1)F. The Scheffé test can be used to analyze any linear combination of group means. Output, illustrated below, is similar to that of the other range tests discussed above, and for this example it comes to the same conclusions.
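The Scheffé rule can be applied to any contrast, that is, any set of weights c_i on the group means that sum to zero. Under the usual equal-variance assumptions, the contrast F is (Σ c_i·mean_i)² divided by MSE·Σ(c_i²/n_i). It is compared with F' = (k-1)F. The sketch below uses hypothetical group summaries and an illustrative function name. The value 3.32 is approximately the tabled critical F for 2 and 30 degrees of freedom at alpha = .05.

```python
from math import isclose


def scheffe_test(means: list[float], sizes: list[int], mse: float,
                 weights: list[float], f_crit: float) -> tuple[float, float, bool]:
    """Return (contrast F, Scheffé critical value F', significant?) for one contrast."""
    if not isclose(sum(weights), 0.0, abs_tol=1e-12):
        raise ValueError("contrast weights must sum to zero")
    k = len(means)
    psi = sum(c * m for c, m in zip(weights, means))          # estimated contrast value
    var_psi = mse * sum(c * c / n for c, n in zip(weights, sizes))
    f_contrast = psi ** 2 / var_psi
    f_prime = (k - 1) * f_crit                                 # F' = (k - 1)F
    return f_contrast, f_prime, f_contrast > f_prime


means, sizes, mse, f_crit = [10.0, 12.0, 14.0], [11, 11, 11], 4.0, 3.32

# Complex contrast: group 1 versus the average of groups 2 and 3.
print(scheffe_test(means, sizes, mse, [1.0, -0.5, -0.5], f_crit)[2])  # True
# Pairwise contrast: group 2 versus group 3.
print(scheffe_test(means, sizes, mse, [0.0, 1.0, -1.0], f_crit)[2])   # False
```

Here the complex contrast gives F = 16.5 against F' = 6.64. The pairwise 2.0-point difference gives only F = 5.5. This illustrates why the text reserves Scheffé mainly for complex comparisons.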

The Scheffé test has the advantage of maintaining an experimentwise .05 significance level in the face of multiple comparisons. It does so, however, at the cost of a loss in statistical power: more Type II errors may be made, that is, thinking you do not have a relationship when you do. The Scheffé test is thus a very conservative one (more conservative than Dunn or Tukey, for ex.). It is not appropriate for planned comparisons and is restricted to post hoc comparisons. Even for post hoc comparisons, the test is used for complex comparisons. It is not recommended for pairwise comparisons because of "an unacceptably high level of Type II errors" (Brown and Melamed, 1990: 35). Toothaker (1993: 28) recommends the Scheffé test only for complex comparisons, or when the number of comparisons is large. Because of its low power, the Scheffé test is not preferred for particular comparisons, but it can be used when one wishes to make all or a large number of comparisons. Tukey's HSD is preferred for making all pairwise comparisons among group means. Scheffé is preferred for making all or a large number of other linear combinations of group means.

8. Hochberg GT2 test. A range test considered similar to Tukey's HSD. It is quite robust against violation of homogeneity of variances, except when cell sizes are extremely unbalanced. It is generally less powerful than Tukey's HSD when factor cell sizes are not equal.

9. Gabriel test. A range test based on the Studentized maximum modulus. The Gabriel test is similar to but more powerful than the Hochberg GT2 test when cell sizes are unequal, but it tends to display a liberal bias as cell sizes vary greatly.

10. Waller-Duncan test. A range test based on a Bayesian approach, which makes it different from the other tests in this section. When factor cells are not equal in size, it uses the harmonic mean of the sample sizes. The kratio is specified by the researcher in advance in lieu of specifying an alpha significance level (ex., .05). The kratio is known as the Type 1/Type 2 error seriousness ratio. The default value is 100, which loosely corresponds to a .05 alpha level; kratio = 500 loosely corresponds to alpha = .01.

Tests not assuming equal variances

If the model is a one-way ANOVA with only one factor and no covariates and no interactions, four additional tests are available that do not require the usual ANOVA assumption of homogeneity of variances.

1. Tamhane's T2 test. Tamhane's T2 is a conservative test. It is considered more appropriate than Tukey's HSD when cell sizes are unequal and/or when homogeneity of variances is violated.
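Tamhane's T2 is commonly described as a set of pairwise Welch (unequal-variance) t-tests with a Sidak-adjusted alpha. That description comes from general references rather than from the text above, so treat this sketch as an assumption-labeled illustration. It computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom from group summaries. The critical t for those df at the adjusted alpha still has to come from a t table or a statistics library. The summaries and function names are hypothetical.

```python
from math import sqrt


def welch_t(mean_a: float, var_a: float, n_a: int,
            mean_b: float, var_b: float, n_b: int) -> tuple[float, float]:
    """Welch t statistic and Welch-Satterthwaite degrees of freedom (no pooled variance)."""
    se2_a, se2_b = var_a / n_a, var_b / n_b        # squared standard errors of each mean
    t = (mean_a - mean_b) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison alpha under the Sidak adjustment for m comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical summaries: unequal sizes and clearly unequal variances.
t, df = welch_t(10.0, 4.0, 5, 14.0, 16.0, 10)
print(round(t, 3), round(df, 2))        # -2.582 12.96
print(round(sidak_alpha(0.05, 3), 4))   # 0.017 (3 pairwise comparisons among k = 3 groups)
```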

2. Games-Howell test. The Games-Howell test is a modified HSD test that is appropriate when the homogeneity of variances assumption is violated. It is designed for unequal variances and unequal sample sizes and is based on the q-statistic distribution. Games-Howell is slightly less conservative than Tamhane's T2 and can be liberal when sample sizes are small. It is therefore recommended only when group sample sizes are greater than 5. Because Games-Howell is only slightly liberal and is more powerful than Dunnett's C or T3, it is recommended over these tests. Toothaker (1993: 66) recommends Games-Howell for the situation of unequal (or equal) sample sizes and unequal or unknown variances.

3. Dunnett's T3 test and Dunnett's C test. These tests might be used in lieu of Games-Howell when it is essential to maintain strict control over the alpha significance level across multiple tests (ex., exactly .05 or better). Their purpose is similar to that of Bonferroni adjustments.

4. The Tukey-Kramer test. This test controls experimentwise alpha. It is described in Toothaker (1993: 60), who also gives an appendix with critical values. It requires equal population variances. Toothaker (p. 66) recommends this test for the situation of equal variances but unequal sample sizes. In SPSS, if you ask for the Tukey test and sample sizes are unequal, you will get the Tukey-Kramer test, using the harmonic mean. There is no separate Tukey-Kramer option in SPSS.

5. The Miller-Winer test. Not recommended unless equal population variances are assured. Not supported by SPSS.
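The Games-Howell and Tukey-Kramer tests both compare a pair of means on the q (Studentized range) scale, but they use different standard errors. Games-Howell uses each group's own variance. Tukey-Kramer uses the pooled MSE with the pair's two sample sizes, which is the same as using the harmonic mean of those two sizes. This is a minimal sketch assuming group summaries are available; the values and function names are hypothetical. The critical q for k groups and the relevant df must still be looked up; for Games-Howell these are the Welch-Satterthwaite df.

```python
from math import isclose, sqrt


def games_howell_q(mean_a: float, var_a: float, n_a: int,
                   mean_b: float, var_b: float, n_b: int) -> tuple[float, float]:
    """Games-Howell q statistic and its Welch-Satterthwaite degrees of freedom."""
    se2_a, se2_b = var_a / n_a, var_b / n_b
    q = abs(mean_a - mean_b) / sqrt((se2_a + se2_b) / 2)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return q, df


def tukey_kramer_se(mse: float, n_a: int, n_b: int) -> float:
    """Tukey-Kramer standard error for one pair, on the q scale (pooled MSE)."""
    return sqrt(mse / 2 * (1 / n_a + 1 / n_b))


q, df = games_howell_q(10.0, 4.0, 5, 14.0, 16.0, 10)
print(round(q, 3), round(df, 2))   # 3.651 12.96

# Tukey-Kramer equals sqrt(MSE / n_h), with n_h the harmonic mean of the two group sizes.
n_h = 2 / (1 / 5 + 1 / 10)
print(isclose(tukey_kramer_se(4.0, 5, 10), sqrt(4.0 / n_h)))  # True
```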