Introduction to ANOVA
Copyright 2000, 2011, J. Toby Mordkoff

Probably, the best way to start thinking about ANOVA is in terms of factors with levels. (I say this because this is how they are described when researchers talk to each other and this is how the initial analysis is conducted.) As defined in the chapter on experimental design (back in Part 1), a factor is an independent variable (i.e., some property, characteristic, or quality that can be manipulated) that is being used as a predictor or explainer of variance in the data analysis. In most cases, each specific value of the IV defines a level within the factor, but that doesn't have to be true, so we have two different labels. The way to keep these straight is to remember that an IV is created and exists when the experiment is being run; a factor is part of the analysis. Sometimes, for any of a variety of reasons, you can change your mind about the best way to approach the experiment between the time that you collected the data (and had levels of the IV) and when you conduct the analysis (and have levels of the factor). For example, sometimes we collapse two or more levels of an IV into one level of a factor.

Experimental Designs (revisited)

There are two manners in which experimental designs are described. The simple method only specifies the number of factors, as in "one-way" or "two-way" for experiments with one or two factors, respectively. The more complicated method specifies both the number of factors and the number of levels within each factor. For example, if an experiment involves two factors, one of which has two levels and the other of which has three levels, then the experiment is said to employ a two-by-three design. The number of numbers in this description tells you how many factors; each of the numbers tells you how many levels. I suggest that you use the more complicated manner in most situations. Note: it is traditional to list the factors from smallest to largest; thus, one would not often say "three-by-two design," but you can if that really would be better.

It is also a good habit to specify whether the factors are within- or between-subjects. If all of the factors are of the same sort, just append the label at the end of the factors & levels description; e.g., "two-by-three, between-subjects design" or "two-by-three, within-subjects design." If the factor types are mixed, append the compound modifier "mixed-factor" and then say which factor or factors are within subjects using the label "repeated measures"; e.g., "two-by-three, mixed-factor design, with repeated measures on the first factor" if the two-level factor is within-subjects and the three-level factor is between-subjects. Note: be very careful to call these mixed-factor designs; do not, for example, call them mixed-effect designs, because those are a very different thing. Note, also, that there are other ways to say these things. For example, "factorial" is another label for a completely between-subjects design.

One-way, Between-subjects ANOVA

The easiest way to describe the theory behind ANOVA is to talk about a one-way (i.e., one-factor), between-subjects experiment. In fact, maybe because of its simplicity, SPSS lists this very specific type of analysis separately from all other forms of ANOVA; SPSS puts one-way, between-subjects ANOVA with the t-tests, under Analyze... Compare Means... But don't be fooled by where it appears in the menus; this is an ANOVA, not a t-test. (Plus, I don't suggest using this version; use Analyze... General Linear Model... Univariate, instead, for several reasons.)
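As an aside outside of SPSS (the example below and its variable names are my own illustration, not the author's): the same point can be made in Python by fitting the one-way, between-subjects ANOVA as a one-factor general linear model; the resulting ANOVA table carries the same F and p that a dedicated one-way routine (such as scipy.stats.f_oneway, the analogue of Compare Means) would give.

    # A hypothetical illustration (not from the notes): a one-way,
    # between-subjects ANOVA fit as a one-factor general linear model.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Made-up scores for one between-subjects factor with three levels.
    df = pd.DataFrame({
        "group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
        "score": [4, 5, 6, 5, 4, 7, 8, 6, 7, 8, 5, 6, 5, 7, 6],
    })

    # The GLM route (the analogue of Analyze... General Linear Model... Univariate).
    fit = smf.ols("score ~ C(group)", data=df).fit()
    print(anova_lm(fit))   # same F and p as a dedicated one-way ANOVA routine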

For the purposes of discussion, imagine that we have conducted an experiment concerning motion-sickness with three groups of subjects. One group was in the control condition, which we'll call C; nothing was given or done to these subjects other than putting them in a rotating drum and asking them to report how ill they feel on a ten-point scale. Another group was given Dramamine, so this is group D, and then they, too, were put in the drum and asked for an illness value. The last group was given a placebo that looks like Dramamine before being put in the drum; this is group P. There were seven subjects in each group.

To be clear (and to recap some issues that were covered above or before): we have one nominal IV which took on three values (C, D, or P) and was manipulated between subjects. Paralleling this, in the analysis we'll have one between-subjects factor with three levels. The DV was quantitative and discrete, because the ratings were whole numbers between one and ten. Therefore, the data file will have two columns: one control variable that specifies condition (C, D, or P) and one data variable that contains the illness ratings (1-10). There were seven subjects in each group, so our data file will have 21 rows. The null hypothesis is that the population means for C, D, and P are all the same. This should be written as H0: μ_C = μ_D = μ_P.

The big question is: how does a one-way ANOVA test this hypothesis? Before answering this question, try thinking about this one, instead: if you took 21 random and independent samples from a single population (that has non-zero variance), then randomly divided these 21 observations into three groups of seven and calculated the mean for each of the groups, would the three means be exactly the same? If that is too abstract, imagine that you rolled a die 21 times, put the first seven rolls in Group 1, the next seven in Group 2, and the last seven in Group 3. The correct answer (to the question "would the three means be exactly the same?") is no or, at least, not very often. By random chance, one of the groups will have the highest mean and another will have the lowest. In other words, even if the null hypothesis is exactly true (because the three samples were taken from the same population), we do not expect the three sample means to be the same. We would only expect them to be the same if the samples were very, very large and/or the variance within the population was very, very small.

With that in mind, we can now go back and address the question of how one-way ANOVA works. There are, of course, a variety of ways to think about this; the following is my favorite because it parallels how I like to think about t-tests. According to the null hypothesis, the three populations that were being sampled have the same mean. Under all forms of ANOVA, the three populations are assumed to have the same variance and are assumed to be normally distributed. Therefore, according to the null hypothesis, the three populations are exactly the same, because they have the same center, spread, and shape. Because of this, we can pool all of the data to calculate one, common, hypothetical sampling distribution of the mean.

In contrast to the independent-samples t-test, where we had the clinical-trials version to fall back on, there is no such thing as an equal-variance-not-assumed version of ANOVA. If the equal-variance assumption is violated, then you have to do something to correct the problem or switch to a different form of analysis. Even more: because SPSS has no clue what to do about a violation of the equal-variance assumption if it happens, it won't even test the assumption unless you ask it to.

As always for parametric statistics, the hypothetical sampling distribution (for the mean) is assumed to be normal with a spread that depends on two things: the variance in the sampled population and the size(s) of the sample(s). Back when we were doing t-tests, we talked about the spread of the sampling distribution in terms of its standard deviation, which is called the standard error. (Read that again if this isn't already something that you're comfortable with: the standard deviation of the sampling distribution for the mean is the standard error; the standard error is the standard deviation of the hypothetical sampling distribution of the mean.) The calculation of the standard error for a t-test is simple: it's your best guess about the standard deviation (s) divided by the square-root of the size of the sample. Now that we're doing ANOVA, we need to work in terms of variance, instead of standard deviations (for reasons you'll see soon). So, we now talk about the variance of the sampling distribution for the mean, which is just your best guess about the variance in the population divided by the sample size.

Now you've got everything that you need: a center, a spread (albeit in variance format), and a shape. With this hypothetical sampling distribution in hand, it is relatively easy to calculate the probability of observing three sample means that are as different and extreme (i.e., as far from the overall mean) as the three that we have. If this probability is very small (i.e., less than 5%), then we reject the idea that the three samples came from the same population. In particular, we reject the idea that the population means are the same; we don't reject (or even question) any of the assumptions. This is the same bass-ackward logic that we use for t-tests, complete with the special status for assumptions over null hypotheses. We are not calculating the probability that the null hypothesis is true given the data; we are calculating the probability of getting the data given the null.

A second way to think about one-way, between-subjects ANOVA is in terms of a ratio of variances. The story starts out the same as the above, but doesn't use the hypothetical sampling distribution of the mean to calculate the probability of observing the three sample means. Instead, it refers to the spread of the hypothetical distribution as the within-group or unexplained variance. This version also doesn't talk about the three sample means as being different from each other in a pair-wise sense, but simply calculates the variance across these three values and calls this the between-group or explained variance. Then it calculates a ratio by dividing the between-group variance by the within-group variance. This value is compared to a critical value in a table; if the observed ratio is above the critical value -- implying that the group means are too variable to be consistent with the idea that they all came from the same population and are only different due to chance -- then the null hypothesis is rejected.
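If you want to see both framings on concrete numbers, here is a minimal sketch in Python; the illness ratings below are invented for illustration (they are not the author's data), and scipy stands in for SPSS. Levene's test plays the role of the equal-variance check that SPSS will only run if asked, and f_oneway returns the F-ratio along with the probability of getting sample means this far apart when the null hypothesis is true.

    # Hypothetical data for the C / D / P example (ratings are invented;
    # seven subjects per group, whole numbers from 1 to 10).
    from scipy import stats

    control   = [7, 8, 6, 9, 7, 8, 7]   # group C
    dramamine = [3, 4, 2, 5, 3, 4, 3]   # group D
    placebo   = [6, 7, 5, 8, 6, 7, 6]   # group P

    # Levene's test for the equal-variance assumption (the analogue of
    # what SPSS will only do if you explicitly ask for it).
    w, p_levene = stats.levene(control, dramamine, placebo)
    print("Levene W = %.2f, p = %.3f" % (w, p_levene))

    # The one-way, between-subjects ANOVA itself: the F-ratio and the
    # probability of means this far apart if H0 (equal means) is true.
    f, p = stats.f_oneway(control, dramamine, placebo)
    print("F(2, 18) = %.2f, p = %.4f" % (f, p))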

Puzzler: assume that you take three samples of 10 each from a single population that has a true variance (σ²) of 420.00. (I.e., I'm telling you that the null hypothesis is true; the three samples came from the same distribution.) What do you expect the best-guess variance across the three sample means to be? Note: I'm not asking you about the best-guess variance across all of the data; that's 420.00, because s² is an unbiased estimator of σ² and we know that σ² is 420.00. I'm asking you about the variance across the three means.

Hey! Did you actually solve the puzzler -- or, at least, spend some time on it -- or did you just keep reading like it was just another paragraph? If you took it seriously and worked on it, then you have my apologies for the interruption (as well as for the unflattering inference behind it); please carry on. If you just breezed on by, however, then please go back and try to solve it. It wasn't a koan (i.e., an unsolvable problem that helps you to achieve enlightenment through some process that I don't understand); it was a real problem that I was hoping that you could solve. Hint: Deep Thought might be helpful.

A third way to think about one-way ANOVA is close to the second, but even farther removed from the way that we talk about t-tests. This is the approach from which ANOVA gets its name, because it analyzes (i.e., breaks up) the total variance into various components. We start with a general model that says that all observed values are the sum of several components. Because summing is linear, the model is called the General Linear Model (GLM). In the case of one-way, between-subjects ANOVA, the GLM equation for each observed value is:

O_ki = F_k + S_i + ε

where O_ki is the observed value for subject i who was in condition k; F_k is the fixed effect of condition k, which is a level of the factor; S_i is the fixed mean of subject i; and ε is normally-distributed error. Because it isn't possible to separate the effect of the subject from the error (because we only measure each subject once), it is useful to think of the above as:

O_ki = F_k + (S_i + ε)

The version of the GLM equation that I've given here embodies the claim that the observed value is determined by the mean of the subject plus two additive influences (viz., the factor effect and the random error). Other people prefer to use a slightly different equation which is a little less focused on the subjects -- O_ki = M + F_k + S_i + ε -- which claims that the observed value is determined by some overall mean for all subjects (M), plus additive effects from the factor, the subject, and the error. These two versions are equivalent because ANOVA concerns variance: an additive constant (such as M) has no variance, and adding or subtracting the overall mean from each of the subjects would not have any effect on the variance across subjects, so whether you keep a separate overall mean or fold it into the subject terms is irrelevant.

Before going on, note or recall the following rule regarding variance values: the variance of the sum (of two or more statistically-independent variables) is equal to the sum of the variances. This is a key to ANOVA, which is why you were probably asked to memorize some version of this statement during undergrad stats; it is also why we use variance, instead of standard deviations.
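Because everything that follows rests on this additivity rule, here is a quick numerical check (my own sketch, not part of the notes): for independent variables, the variance of the sum comes out equal to the sum of the variances.

    # Numerical check of the additivity rule (illustration only):
    # for independent X and Y, var(X + Y) is approximately var(X) + var(Y).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=3.0, size=100_000)   # var(X) = 9
    y = rng.normal(loc=0.0, scale=4.0, size=100_000)   # var(Y) = 16, independent of X

    print(np.var(x + y))            # close to 25 = 9 + 16
    print(np.var(x) + np.var(y))    # also close to 25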

Because of the additivity of variance, the GLM equation above implies this:

σ²_O = σ²_F + σ²_S + σ²_ε

which can also be written as:

σ²_O = σ²_F + σ²_(S+ε)

This last equation can be read as: the variance of the observed values equals the variance of the fixed factor effects plus the variance of the sum of the subject means and the error. As mentioned above, in between-subjects ANOVA we use the second version of the variance equation, because we have no way of separating the variance due to subjects from the variance due to error (because we only measure each subject one time).

The first computational step in one-way ANOVA calculates the total variance in the sample. This step ignores that there are separate conditions and simply gets an estimate of the variance across all values of the DV. This is σ²_O. The second step uses the means in each of the conditions to estimate the values of F_k. (Note that the F_k values are deviations from the overall mean, so they must sum to zero.) The variance across these values is used to estimate σ²_F. The third step notes that, if σ²_O = σ²_F + σ²_(S+ε), then σ²_(S+ε) = σ²_O - σ²_F (by some simple algebra). So we can use the difference between our estimates of σ²_O and σ²_F to estimate σ²_(S+ε). We have now analyzed or partitioned the total variance into two components: one component that is associated with differences between conditions and another that is associated with differences between subjects (within each of the conditions) plus error. These are often referred to as explained and unexplained variance, respectively, on the grounds that the former can be explained in terms of the experimental manipulation that defines the conditions, while the latter cannot be explained.

Because σ²_F is estimated (and should, therefore, probably be written as s²_F, but no-one does that), it has an associated degrees of freedom. Because it was estimated using the k condition means and we always lose one degree of freedom to the overall mean of any set of values (because the mean is needed to calculate variance), it has k - 1 degrees of freedom. Because σ²_F is going to end up in the numerator of something called the F-ratio, k - 1 is the numerator degrees of freedom. Likewise, because σ²_(S+ε) is estimated (albeit by subtracting two other values), it also has a certain number of degrees of freedom. Because it was estimated using N pieces of data which were divided into k groups, each with its own mean (which each had to be calculated), it has N - k degrees of freedom. Finally, because σ²_(S+ε) will be in the denominator of the F-ratio, N - k is the denominator degrees of freedom.

That's enough for now.
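As a concrete companion to the three computational steps and the two degrees-of-freedom values, here is a sketch of the partition in Python (my own illustration, again with invented ratings, not the author's code). In practice the bookkeeping is done with sums of squares, which become the explained and unexplained variance estimates (mean squares) once they are divided by k - 1 and N - k.

    # Partitioning the total variability into between-conditions and
    # within-conditions (subjects-plus-error) pieces, then forming F.
    import numpy as np
    from scipy import stats

    groups = [
        np.array([7, 8, 6, 9, 7, 8, 7]),   # C (hypothetical ratings)
        np.array([3, 4, 2, 5, 3, 4, 3]),   # D
        np.array([6, 7, 5, 8, 6, 7, 6]),   # P
    ]
    all_data = np.concatenate(groups)
    k, N = len(groups), all_data.size
    grand_mean = all_data.mean()

    # Step 1: total variability, ignoring condition.
    ss_total = ((all_data - grand_mean) ** 2).sum()

    # Step 2: variability of the condition means (the F_k effects)
    # around the grand mean, weighted by group size.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

    # Step 3: what is left over is subjects-plus-error.
    ss_within = ss_total - ss_between

    # Mean squares and the F-ratio, with k - 1 and N - k degrees of freedom.
    ms_between = ss_between / (k - 1)    # "explained" variance estimate
    ms_within = ss_within / (N - k)      # "unexplained" variance estimate
    F = ms_between / ms_within

    print(F, stats.f_oneway(*groups).statistic)   # the two should match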