Keppel, G. & Wickens, T. D. Design and Analysis Chapter 6: Simultaneous Comparisons and the Control of Type I Errors

You should design your research with specific questions in mind, which you then test with specific analyses. However, your design will often lend itself to additional analyses, which may well allow you to learn more about the operation of the variables in question. These additional analyses come with the burden that you'll have a greater chance of making a Type I error among the additional analyses. This chapter covers means of controlling Type I error among these simultaneous comparisons.

6.1 Research Questions and Type I Error

The family of tests is a set of tests you intend to compute to address a set of research questions. The familywise Type I error rate (α_FW) is the probability of making at least one Type I error in the family of tests when all H_0 are true. When you consider the huge set of possible post hoc tests one might compute, then you are considering the experimentwise error rate (α_EW). Needless to say, it will typically be the case that α_FW < α_EW.

With a per-comparison error rate (α), you can compute the familywise Type I error rate for a number of comparisons (c) as:

$$\alpha_{FW} = 1 - (1 - \alpha)^{c} \qquad (6.1)$$

Thus, using K&W51 as an example, if you intended to compute three comparisons using α = .05 for each comparison, your α_FW would be .14. Though not as accurate as the above formula, for a quick and dirty estimate of α_FW you could simply use cα (which in this case would give you an estimate of .15). Of course, as the number of comparisons grows, so does α_FW. To convince yourself of this relationship between c and α_FW, compute α_FW for the number of comparisons indicated below:

Number of Comparisons (c)    4    5    6
α_FW

The formula for computing α_FW only works for orthogonal comparisons (i.e., assumed independence), but α_FW also increases with an increasing number of nonorthogonal comparisons.
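
To fill in a table like the one above, a few lines of Python will do. This is only a minimal sketch of Formula 6.1 next to the quick cα estimate. The function names are made up for illustration, and the exact formula assumes independent (orthogonal) comparisons.

```python
def familywise_rate(alpha: float, c: int) -> float:
    """Familywise Type I error rate for c independent comparisons (Formula 6.1)."""
    return 1.0 - (1.0 - alpha) ** c


def quick_estimate(alpha: float, c: int) -> float:
    """The quick-and-dirty estimate c * alpha."""
    return c * alpha


if __name__ == "__main__":
    alpha = 0.05
    print(" c   Formula 6.1   c * alpha")
    for c in range(1, 7):
        exact = familywise_rate(alpha, c)
        rough = quick_estimate(alpha, c)
        print(f"{c:2d}   {exact:11.3f}   {rough:9.3f}")
```

Running it for c = 1 through 6 shows the two columns drifting apart as c grows, with cα always the larger of the two.
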
Thus, because for K&W51 there are only 3 orthogonal comparisons, the estimates of α_FW above are not accurate, though they still make the point that α_FW will increase with increasing numbers of comparisons. Decreasing your per-comparison error rate (α) will also serve to decrease your α_FW.

K&W distinguish the types of questions that experimenters might ask of their data. If the questions are the relatively small number (e.g., rarely more than a - 1) of primary questions, K&W suggest that no adjustment of the per-comparison error rate (α) is necessary. I'm not sure that journal editors would agree with this suggestion.

It's much more typical that one would want to conduct a set of comparisons computed to understand the omnibus ANOVA. Sometimes the number of these comparisons is quite limited. Sometimes you want to compute all possible simple pairwise comparisons. And sometimes you may be interested in exploring a fairly large set of simple and complex comparisons. The approach for controlling α_FW varies for the different situations.

Especially when the number of tests might be fairly large, it makes sense to adopt an α_FW that is greater than .05 (e.g., .10). Keep in mind, however, that the guidelines for choosing α_FW, or choosing a strategy for controlling Type I error in such simultaneous comparisons, are not rigid and universally agreed on.

6.2 Planned Comparisons

OK, we'll discuss planned comparisons, but keep in mind that journal editors might not trust that you've actually planned the comparisons in advance. My advice would be to treat all comparisons as post hoc, at least until you've achieved the sort of stature in the discipline that buys you some slack from editors. :)

Planned comparisons must be specified in the initial design of an experiment. They are essential and pivotal tests, not a plan to conduct a fishing expedition. For clearly planned tests, familywise error correction is generally deemed unnecessary. The comparisons that you choose to compute should be driven by theoretical concerns, rather than concerns about orthogonality. However, orthogonal comparisons should be used because they keep hypotheses logically and conceptually separate, making them easier to interpret. Nonorthogonal comparisons must be interpreted with care because of the difficulty of making inferences (interpreting the outcomes). For instance, in an earlier edition, Keppel wondered, "If we reject the null hypothesis for two nonorthogonal comparisons, which comparison represents the true reason for the observed differences?"
Don't allow yourself to be tempted into computing a large number of planned comparisons. For an experiment with a levels, there are $1 + \frac{3^{a} - 1}{2} - 2^{a}$ comparisons (simple pairwise and complex) possible. Please, don't ever compute all possible comparisons! If you think carefully about your research, a much smaller set of planned comparisons would be reasonable.

A common suggestion for multiple planned comparisons is to conduct up to a - 1 comparisons with each comparison conducted at α = .05. The implication of this suggestion is that people are willing to tolerate a familywise error rate of roughly (a - 1)(α). Thus, in an experiment with 5 levels, you could comfortably compute 4 planned comparisons with each comparison tested using α = .05, for a familywise error rate of ~.19 (Formula 6.1 with c = 4; the quick estimate (a - 1)(α) gives .20). As the number of planned comparisons becomes larger than a - 1, consider using a correction for α_FW (e.g., Šidák-Bonferroni procedure).
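
The count of possible comparisons, and the familywise rate that goes with a - 1 planned comparisons, are both easy to check. A small Python sketch follows. The function names are illustrative rather than anything from K&W, and independence among the comparisons is assumed.

```python
def n_possible_comparisons(a: int) -> int:
    """Number of simple and complex comparisons among a means:
    1 + (3**a - 1) / 2 - 2**a. (3**a - 1 is always even, so // is exact.)"""
    return 1 + (3 ** a - 1) // 2 - 2 ** a


def planned_familywise_rate(a: int, alpha: float = 0.05) -> float:
    """Familywise rate when a - 1 independent comparisons are each tested at alpha."""
    return 1.0 - (1.0 - alpha) ** (a - 1)


if __name__ == "__main__":
    for a in range(3, 7):
        count = n_possible_comparisons(a)
        rate = planned_familywise_rate(a)
        print(f"a = {a}: {count} possible comparisons, "
              f"alpha_FW for a - 1 planned comparisons = {rate:.3f}")
```

The count grows very quickly with a, which is the practical reason behind the plea not to compute every possible comparison.
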

6.3 Restricted Sets of Contrasts

The Bonferroni Procedure

The most widely applicable familywise control procedure for small families is the Bonferroni correction. The Bonferroni inequality (α_FW < cα) states, "The familywise error rate is always less than the sum of the per-comparison error rates of the individual tests." Thus, to ensure that our α_FW is kept to a certain level, we could choose that level (e.g., .05 or .10 depending on our preference) and then divide that value by the number of comparisons we wish to compute. Assuming that you are comfortable with α_FW = .10 and you are about to compute 5 comparisons, you would treat comparisons as significant if they occur with p ≤ α (the per-comparison rate), which would be .02 here. Given that SPSS prints out a t and its associated p-value when you ask it to compute contrasts, you'd be able to assess the significance of the t statistic by comparing its p-value to your Bonferroni per-comparison rate.

For hand computation of such tests, you occasionally need to compute a critical value of t or F for a per-comparison error rate that is not found in the table of critical values of F (A.1). For example, using K&W51, suppose that you wanted to compute 6 simple pairwise comparisons (e.g., 4 vs. 12, 4 vs. 20, etc.). If you were comfortable with α_FW = .10, your per-comparison error rate (α) would be .0167. (If you preferred α_FW = .05, your per-comparison error rate would be .008.) Let's presume that we're using α = .0167. Given homogeneity of variance (a topic that arises in Ch. 7) for K&W51, we would use the overall error term (MS_S/A = 150.458) for any comparison. Thus, df_Error = 12 and df_Comparison is always 1. In Table A.1, for those df we see the following tabled α values:

α        F_Crit
.100      3.18
.050      4.75
.025      6.55
.010      9.33
.001     18.64

Although our α is not tabled, we can see that F_Crit for our α would be less than 9.33 and greater than 6.55. (And, of course, we could determine t_Crit values by taking the square root of the tabled F_Crit values.) Suppose that you compute F_Comparison = 10.0 (or any F ≥ 9.33).
You would conclude that the two groups came from populations with different means (reject H_0). Suppose, instead, that you compute F_Comparison = 6.0 (or any F ≤ 6.55). You would conclude that you had insufficient evidence to claim that the two groups came from populations with different means (retain H_0). The tricky stage arises if your F_Comparison = 9.0. To assess the significance of this outcome, you need to actually compute F_Crit for α = .0167. You can always use the formula below to determine the F for a given level of α.

$$t = z + \frac{z^{3} + z}{(4)(df_{Error} - 2)}$$

Don't even ask what that complex formula means! However, what it does is clear. Ultimately, it will generate F_Crit for any α. You need to keep two points in mind. First, the z in the formula needs to be two-tailed, so you need to look up .0167/2 = .0084 in the tail of the unit normal distribution. Second, you're generating a t value, so you need to square it to get an F_Crit. In this example, z = 2.39. So t = 2.79 and F = 7.79. Thus, using the Bonferroni procedure, if you were interested in computing 6 simple pairwise comparisons on the K&W51 data set (and using α_FW = .10), to be significant each F_Comparison needs to be ≥ 7.79.

Alternatively, you can avoid the formula entirely and use a web-based calculator, such as: http://www.graphpad.com/quickcalcs/statratio1.cfm :)

The Šidák-Bonferroni Procedure

This procedure is a modified Bonferroni procedure that results in a bit more power, so it is preferred to the straight Bonferroni procedure. It makes use of the following equation:

$$\alpha = 1 - (1 - \alpha_{FW})^{1/c} \qquad (6.5)$$

Because of the preference for this procedure, K&W provide useful tables in A.4. The tables illustrate information for α_FW = .20, α_FW = .10, α_FW = .05, and α_FW = .01 on pages 578-581. Keeping with the above example, using K&W51 and 6 comparisons with α_FW = .10, we would look on p. 579. The probability associated with 6 comparisons would be .01741 (which you could also obtain by substituting .10 for α_FW and 6 for c in the above formula, but the table is easier). Note, of course, that if F_Comparison yielded p = .01741 it would be significant with the Šidák-Bonferroni procedure, but not with the Bonferroni procedure (where it would have to be .0167). Not a huge difference, but the Šidák-Bonferroni procedure provides a bit more power.

You can also compare means by computing a critical mean difference according to the formula below, as applied to the above example:

$$D_{S\text{-}B} = t_{S\text{-}B}\sqrt{\frac{2\,MS_{Error}}{n}} = 2.76\sqrt{\frac{2(150.458)}{4}} = 23.94$$

Thus, one possible comparison might be 4hr vs. 12hr. That difference would not be significant (37.75 - 26.5 = 11.25). However, in comparing 4hr vs. 20hr we would find a significant difference (57.5 - 26.5 = 31).

Dunnett's Test

If you have a control group to which you wish to compare the other treatments in your study, then the Dunnett test is appropriate.
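
Before going further with Dunnett, the Bonferroni and Šidák-Bonferroni arithmetic from earlier in this section can be checked numerically. What follows is a hedged Python sketch, not K&W's own procedure. The function names are invented for illustration, the normal deviate comes from the standard library's `statistics.NormalDist`, and the critical t is only the approximation given by the formula above, so it will differ slightly from an exact t table.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha_fw: float, c: int) -> float:
    """Per-comparison rate under the Bonferroni procedure: alpha_FW / c."""
    return alpha_fw / c


def sidak_alpha(alpha_fw: float, c: int) -> float:
    """Per-comparison rate under the Sidak-Bonferroni procedure (Formula 6.5)."""
    return 1.0 - (1.0 - alpha_fw) ** (1.0 / c)


def two_tailed_z(alpha: float) -> float:
    """Normal deviate that cuts off alpha / 2 in the upper tail."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def approx_t_crit(z: float, df_error: int) -> float:
    """Approximate critical t from z: t = z + (z**3 + z) / (4 * (df_error - 2))."""
    return z + (z ** 3 + z) / (4.0 * (df_error - 2))


if __name__ == "__main__":
    alpha_b = bonferroni_alpha(0.10, 6)
    alpha_s = sidak_alpha(0.10, 6)
    print(f"Bonferroni alpha = {alpha_b:.4f}; Sidak-Bonferroni alpha = {alpha_s:.5f}")

    t_rounded = approx_t_crit(2.39, 12)                 # z rounded, as in the text
    t_full = approx_t_crit(two_tailed_z(alpha_b), 12)   # z at full precision
    print(f"z = 2.39:     t = {t_rounded:.2f}, F = {t_rounded ** 2:.2f}")
    print(f"unrounded z:  t = {t_full:.3f}, F = {t_full ** 2:.3f}")
```

Passing the rounded z = 2.39 reproduces the hand calculation in the text. Using the unrounded z moves the critical F a little, which is a reminder that the second decimal of a hand-computed critical value depends on where you round.
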
Once again, the most general approach is to compute F_Comp and then compare that F ratio to an F_Crit value. For the Dunnett test, the F_Crit is

$$F_{D} = (t_{D})^{2}$$

You look up the value of t_D in Table A.5 (pp. 582-585). To do so, you would again need to decide on the level of α_FW you'd like to use and then you'd need to know how many conditions are involved in your experiment (control plus experimental groups). For the K&W51 example, let's assume that the 4hr group was a control group (no sleep deprivation) to which you'd like to compare each of the other groups. Thus, there would be a total of four groups. With α_FW = .10 and 4 groups, t_D = 2.29. Thus, you'd compare each F_Comparison against F_D = 5.24.

You could also take a critical mean difference approach with the Dunnett test:

$$D_{Dunnett} = t_{Dunnett}\sqrt{\frac{2\,MS_{Error}}{n}} = 2.29\sqrt{\frac{2(150.458)}{4}} = 19.86 \qquad (6.6)$$

Note that the critical mean difference here is less than that found with the Šidák-Bonferroni procedure, indicating that the Dunnett test is more powerful. Nonetheless, I would bet that you rarely find yourself in a situation where you'll want to compute the Dunnett test.

6.4 Pairwise Comparisons

Tukey's HSD Procedure

If you are interested in comparing every pair of means (simple pairwise comparisons), you might use the Tukey HSD (Honestly Significant Difference) Procedure. Using this procedure requires you to use the Studentized Range Statistic (q) found in Appendix A.6 (pp. 586-589). Again, you can first compute F_Comparison, after which you would compare that value to a critical value obtained from the tables, which you then square and divide by 2. For the example we've been using (K&W51, α_FW = .10, 6 pairwise comparisons):

$$F_{HSD} = \frac{q^{2}}{2} = \frac{3.62^{2}}{2} = 6.55$$

Alternatively, you could compute a critical mean difference. For the example we've been using, you'd find:

$$D_{HSD} = q_{a}\sqrt{\frac{MS_{Error}}{n}} = 3.62\sqrt{\frac{150.458}{4}} = 22.2 \qquad (6.7)$$

Note that this procedure is more liberal than the Šidák-Bonferroni procedure for what are essentially the same 6 comparisons (D_S-B = 23.94). K&W suggest comparing your differences among means in a matrix. For K&W51, it would look like this:

              a1 (26.5)   a2 (37.75)   a3 (57.5)   a4 (61.75)
a1 (26.5)       -----
a2 (37.75)      11.25       -----
a3 (57.5)       31.0        19.75        -----
a4 (61.75)      35.25       24.0         4.25        -----

This table allows you to see that there are three significant comparisons. You cannot use this critical mean difference approach (Formula 6.7) when you have unequal sample sizes (though the formula can be modified as below), nor should you take this approach when there is heterogeneity of variance.
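
The matrix approach is mechanical enough to sketch in Python. The means, MS_Error, and q below are the K&W51 values quoted in the text. The group size n = 4 is an assumption that follows from a = 4 groups and df_Error = 12, and the function names are made up for illustration.

```python
from itertools import combinations
from math import sqrt

# K&W51 values as quoted in the text.
MEANS = {"a1": 26.5, "a2": 37.75, "a3": 57.5, "a4": 61.75}
MS_ERROR = 150.458
N_PER_GROUP = 4   # assumed from a = 4 groups and df_Error = 12
Q_CRIT = 3.62     # tabled q for alpha_FW = .10, a = 4, df_Error = 12


def hsd_critical_difference(q: float, ms_error: float, n: float) -> float:
    """Critical mean difference from Formula 6.7: q * sqrt(MS_Error / n)."""
    return q * sqrt(ms_error / n)


def significant_pairs(means: dict, critical_difference: float) -> list:
    """Return (group, group, |difference|) for every pair beyond the critical difference."""
    flagged = []
    for g1, g2 in combinations(means, 2):
        diff = abs(means[g1] - means[g2])
        if diff > critical_difference:
            flagged.append((g1, g2, diff))
    return flagged


d_hsd = hsd_critical_difference(Q_CRIT, MS_ERROR, N_PER_GROUP)
f_hsd = Q_CRIT ** 2 / 2   # the same cutoff expressed as a critical F

if __name__ == "__main__":
    print(f"D_HSD = {d_hsd:.2f}, F_HSD = {f_hsd:.2f}")
    for g1, g2, diff in significant_pairs(MEANS, d_hsd):
        print(f"{g1} vs {g2}: difference = {diff:.2f}")
```

Swapping in a different critical difference (say, the Šidák-Bonferroni or Dunnett value) reuses `significant_pairs` unchanged, which makes it easy to see how the choice of procedure changes the set of flagged pairs.
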

When sample sizes are different, replace n in the formula with ñ, computed as:

$$\tilde{n} = \frac{2}{\frac{1}{n_{1}} + \frac{1}{n_{2}}}$$

This value is actually a special kind of mean (harmonic). Thus, if one group had n = 10 and the other group had n = 20, then ñ for the formula would be 13.33. If you have reason to suspect heterogeneity of variance (as discussed in Chapter 7), the formula would become:

$$D_{HSD} = q_{a}\sqrt{\frac{1}{2}\left(\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}\right)} \qquad (6.8)$$

The df you'd use to look up q emerge from a complex formula (7.13), so we'll return to this issue once we've discussed the implications of heterogeneity of variance.

The Fisher-Hayter Procedure

"Tukey's procedure is the simplest way to test the pairwise differences and is the one that is most applicable to any pattern of effects." Hmmm, so why are K&W talking about alternatives? In general, some people were concerned that HSD is too conservative, so they wanted to derive more powerful simple pairwise comparison procedures.

The Fisher-Hayter procedure requires that you first compute an overall ANOVA and reject H_0. If you are able to reject H_0 in the overall ANOVA, then use the Studentized Range Statistic (q) found in Appendix A.6 (pp. 586-589). For HSD, you'd look up q for a treatment means. However, for Fisher-Hayter, you'd look up q for a - 1 treatment means. Otherwise, the formulas are the same, as seen below:

$$D_{FH} = q_{a-1}\sqrt{\frac{MS_{Error}}{n}} = 3.2\sqrt{\frac{150.458}{4}} = 19.63 \qquad (6.9)$$

I've filled in the values that you'd use for K&W51. The critical mean difference here is smaller than that found for HSD, so this test is more powerful. "The Fisher-Hayter procedure provides excellent control of Type I error, a fact that has been demonstrated in several simulation studies... We suggest that you use this procedure, particularly when making calculations by hand."

The Newman-Keuls and Related Procedures

This procedure is sometimes referred to as Student-Newman-Keuls (SNK). K&W describe the procedure for computation of SNK. However, my advice (and theirs) would be to use Fisher-Hayter or Tukey's HSD. You'll see SNK used, but often by people who learned to use it long ago and continue to do so, even after better approaches have been identified.
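
Two of the calculations in this section, the harmonic ñ for unequal sample sizes and the Fisher-Hayter critical difference, can be sketched in a few lines of Python. The function names are illustrative only, the q values are the tabled ones quoted in the text, and n = 4 per group is an assumption that follows from a = 4 groups and df_Error = 12.

```python
from math import sqrt


def harmonic_n(n1: float, n2: float) -> float:
    """Harmonic mean of two sample sizes: 2 / (1/n1 + 1/n2)."""
    return 2.0 / (1.0 / n1 + 1.0 / n2)


def critical_difference(q: float, ms_error: float, n: float) -> float:
    """Shared form of Formulas 6.7 and 6.9: q * sqrt(MS_Error / n)."""
    return q * sqrt(ms_error / n)


MS_ERROR = 150.458
N_PER_GROUP = 4   # assumed from a = 4 groups and df_Error = 12

d_hsd = critical_difference(3.62, MS_ERROR, N_PER_GROUP)   # q looked up for a = 4 means
d_fh = critical_difference(3.20, MS_ERROR, N_PER_GROUP)    # q looked up for a - 1 = 3 means

if __name__ == "__main__":
    print(f"n-tilde for n = 10 and n = 20: {harmonic_n(10, 20):.2f}")
    print(f"D_HSD = {d_hsd:.2f}, D_FH = {d_fh:.2f}")
```

The only thing that differs between the two critical differences is which column of the q table you read, which is why Fisher-Hayter is so convenient for hand calculation.
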

6.5 Post-Hoc Error Correction

You may be inclined to compute a whole host of comparisons, including some complex comparisons. If you're doing so in an exploratory fashion ("throw it against the wall and see what sticks"), you are asked to pay some penalty by using a very conservative test.

Scheffé's Procedure

The Scheffé test is the most conservative post hoc test. Basically, it controls for the FW error that would occur if you were to conduct every possible comparison. Ideally, a person would be conducting far fewer comparisons than that!

Compute the Scheffé test by first computing the F_Comp. In the presence of heterogeneity of variance, you would use separate variances for the denominator. In the presence of homogeneity of variance, you would use the pooled variance for the denominator. Then, test your F_Comp for significance by comparing it to the F_Crit for Scheffé (F_S), where

$$F_{S} = (a - 1)\,F(df_{A}, df_{S/A}) \qquad (6.11)$$

Thus, for comparisons in K&W51, F_S = (3)(3.49) = 10.47. Basically, I think that you should avoid using this procedure.

Recommendations and Guidelines (from Keppel 3rd)

As in all things, controlling for inflated chance of familywise Type I error calls for moderation. The Šidák-Bonferroni procedure seems a reasonable approach for smaller sets of comparisons. Tukey's HSD (Tukey-Kramer) or the Fisher-Hayter procedure seem to be reasonable for simple pairwise comparisons.

Keep in mind that FW error rate may not be as serious as it might appear to be. As Keppel notes, assuming that H_0 is true, if you replicated an experiment 2000 times and conducted the same 5 comparisons after each experiment, you would expect that a Type I error would occur in 500 experiments (.25 x 2000). However, in those 500 experiments, only 10% (50) would contain more than one Type I error in the 5 comparisons. Fewer still would have more than two. Furthermore, keep in mind that most experiments reflect some treatment effect (i.e., H_0 is false). That is, you are rarely dealing with a situation in which the treatment has no effect.
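
Keppel's replication argument can be checked with a little binomial arithmetic. The Python sketch below assumes that the 5 comparisons are independent and that each is tested at α = .05. The function names are invented for illustration. Note that the .25 in the text is the quick cα estimate, so the exact counts come out somewhat smaller than 500 and 50.

```python
from math import comb


def prob_k_errors(k: int, c: int, alpha: float) -> float:
    """Binomial probability of exactly k Type I errors in c independent tests."""
    return comb(c, k) * alpha ** k * (1.0 - alpha) ** (c - k)


def replication_summary(n_experiments: int, c: int, alpha: float) -> dict:
    """Expected numbers of experiments with any, and with more than one, Type I error."""
    p_none = prob_k_errors(0, c, alpha)
    p_one = prob_k_errors(1, c, alpha)
    p_any = 1.0 - p_none
    p_more_than_one = p_any - p_one
    return {
        "with_any_error": n_experiments * p_any,
        "with_more_than_one": n_experiments * p_more_than_one,
        "share_with_more_than_one": p_more_than_one / p_any,
    }


if __name__ == "__main__":
    summary = replication_summary(2000, 5, 0.05)
    print(f"Experiments with at least one Type I error: {summary['with_any_error']:.0f}")
    print(f"Experiments with more than one:             {summary['with_more_than_one']:.0f}")
    print(f"Share of the former with more than one:     {summary['share_with_more_than_one']:.1%}")
```

Under these assumptions the share of error-containing experiments that have more than one Type I error stays at about 10%, so the point of the argument survives the switch from the quick estimate to the exact rate.
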
Keppel argues for the value of planned comparisons (as does G. Loftus, 1996). Although I am in conceptual agreement, I worry about the practicalities facing a researcher who chooses to report planned comparisons. A journal editor may be sympathetic, but I worry about what an unsympathetic editor/reviewer might say about planned comparisons.

Keppel does suggest that replications are important, especially as a means of offsetting a perceived inflated Type I error rate. Once again, in an ideal world I'd agree. However, an untenured researcher would probably benefit from doing publishable research, and journals are not yet willing to publish replications.

We must recognize that the decision to correct for an inflated FW Type I error rate is a decision to increase the chance of making a Type II error (i.e., a decrease in power). Thus, if you have a set of planned comparisons in mind, you might well estimate your needed sample size on the basis of the planned comparisons, rather than on the basis of the overall ANOVA.

Using SPSS for Comparisons

If you choose to use Analyze->Compare Means->One-way ANOVA for analysis, you'll first see the window below left. If you click on the Post Hoc button, you'll see the window below right.

As you can see, many of the procedures that K&W describe are available in SPSS. However, the Fisher-Hayter procedure isn't one of the options. Thus, if you choose to use this very reasonable post hoc procedure, you'll need to do so outside of SPSS. Looking at the similar Tukey's HSD for the K&W51 data set, you'd see the output below.

Multiple Comparisons
errors
Tukey HSD

(I) hrsdep  (J) hrsdep  Mean Difference (I-J)  Std. Error  Sig.   95% CI Lower Bound  95% CI Upper Bound
1           2                -11.250             8.673     .58         -37.00               14.50
1           3                -31.000 *           8.673     .017        -56.75               -5.25
1           4                -35.250 *           8.673     .007        -61.00               -9.50
2           1                 11.250             8.673     .58         -14.50               37.00
2           3                -19.750             8.673     .158        -45.50                6.00
2           4                -24.000             8.673     .071        -49.75                1.75
3           1                 31.000 *           8.673     .017          5.25               56.75
3           2                 19.750             8.673     .158         -6.00               45.50
3           4                 -4.250             8.673     .960        -30.00               21.50
4           1                 35.250 *           8.673     .007          9.50               61.00
4           2                 24.000             8.673     .071         -1.75               49.75
4           3                  4.250             8.673     .960        -21.50               30.00

*. The mean difference is significant at the 0.05 level.
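
Two numbers in the SPSS output can be tied back to the hand formulas. The Python sketch below recomputes the Std. Error column as sqrt(2 MS_Error / n) and backs out the q that SPSS must have used from the width of a confidence interval. It takes the interval for group 1 vs. group 2 as -37.00 to 14.50 and assumes n = 4 per group (from a = 4 groups and df_Error = 12). The function names are illustrative only.

```python
from math import sqrt

MS_ERROR = 150.458
N_PER_GROUP = 4   # assumed from a = 4 groups and df_Error = 12


def std_error_of_difference(ms_error: float, n: float) -> float:
    """Standard error of a difference between two means: sqrt(2 * MS_Error / n)."""
    return sqrt(2.0 * ms_error / n)


def implied_q(ci_lower: float, ci_upper: float, ms_error: float, n: float) -> float:
    """Back out q from a Tukey interval whose half-width is q * sqrt(MS_Error / n)."""
    half_width = (ci_upper - ci_lower) / 2.0
    return half_width / sqrt(ms_error / n)


if __name__ == "__main__":
    se = std_error_of_difference(MS_ERROR, N_PER_GROUP)
    q = implied_q(-37.00, 14.50, MS_ERROR, N_PER_GROUP)
    print(f"Std. Error = {se:.3f}")
    print(f"Implied q  = {q:.2f}")
```

The implied q is larger than the 3.62 used in the hand calculations because SPSS is working at the .05 level, while the running example used α_FW = .10. That is also why SPSS flags only two of the pairs that the hand-computed matrix treated as significant.
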