Chapter 4 One-Way ANOVA

Recall this chart that showed how most of our course would be organized:

Explanatory Variable(s) | Response Variable | Methods
Categorical | Categorical | Contingency Tables
Categorical | Quantitative | ANOVA
Quantitative | Quantitative | Regression
Quantitative | Categorical | (not discussed)

When our data consists of a quantitative response variable and one or more categorical explanatory variables, we can employ a technique called analysis of variance, abbreviated as ANOVA. The material in this chapter corresponds to the first part of Chapter 14 of the textbook.

Recall that a categorical explanatory variable is also called a factor. In this chapter, we'll study the simplest form of ANOVA, one-way ANOVA, which uses one factor and one response variable. In the next chapter, we'll study a more complicated setup, two-way ANOVA, which uses two factors instead. (In principle, we could do ANOVA with any number of factors, but in practice, people usually stick to one or two.)

4.1 Basics of One-Way ANOVA

Let's start by discussing the way we organize and label the data for one-way ANOVA. We also need to formulate the basic question that we plan to ask.
Setup

Typically, when we think about one-way ANOVA, we think about the factor as dividing the subjects into groups. The goal of our analysis is then to compare the means of the subjects in each group.

Notation

Let $g$ represent the number of groups. Then we'll set things up as follows:

• Let $\mu_1, \mu_2, \dots, \mu_g$ represent the true population means of the response variable for the subjects in each group. As usual, these population parameters are what we're really interested in, but we don't know their values.
• We call each observation in the sample $Y_{ij}$, where $i$ is a number from 1 to $g$ that identifies the group number, and $j$ identifies the individual within that group. (For example, $Y_{12}$ represents the response variable value of the second individual in the first group.)
• We can calculate the sample means for each group, which we'll call $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \dots, \bar{Y}_{g\cdot}$. We can use these known sample means as estimates of the corresponding unknown population means.

Example 4.1: Suppose we want to see if three McDonald's locations around town tend to put the same amount of fries in a medium order, or if some locations put more fries in the container than others. We take the next 30 days on the calendar and randomly assign 10 days to each of the three locations. On each day, we go to the specified location, order a medium order of fries, take it home, and weigh it to see how many ounces of fries it contains. The categorical explanatory variable is just which location we went to, and the quantitative response variable is the number of ounces of fries. For each of the three locations ($g = 3$), the population consists of all medium orders of fries sold at that location, while the sample consists of the orders that we actually got. The population means, which we call $\mu_1, \mu_2, \mu_3$, represent the average number of ounces of fries in all orders at each location, and these are the quantities we're interested in.
We estimate them using $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \bar{Y}_{3\cdot}$, the sample means for each location, which are computed from the data for our orders. The data is shown in Figure 4.1.
Figure 4.1: Ounces of fries in 10 medium orders of fries at each of three McDonald's locations (columns: Location, Fries (ounces), Mean, Std. Dev.).

Question of Interest

What we really want to know is whether all of the groups have the same population mean, that is, whether $\mu_1, \mu_2, \dots, \mu_g$ are all the same. This is equivalent to asking whether or not the response variable depends on the factor.

Intuitively speaking, the most obvious way to answer this question is by looking at $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \dots, \bar{Y}_{g\cdot}$, the sample means of the various groups. If they are close enough to each other, in some sense, then we're willing to believe that all the true population means $\mu_1, \mu_2, \dots, \mu_g$ are the same. If one or more of $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \dots, \bar{Y}_{g\cdot}$ are too far from the others, then that convinces us that the true population means must not all be the same. All that remains is to figure out what we mean by "close enough" and "too far." We'll eventually see how to do this with a hypothesis test.

One-Way ANOVA Table

ANOVA gets its name (analysis of variance) from the fact that it examines different kinds of variability in the data. It then uses this information to construct a hypothesis test. To describe these different kinds of variability, we'll first need to introduce some more notation:
• $\bar{Y}_{\cdot\cdot}$ represents the overall sample mean of all the data from all groups combined.
• $N$ is the total number of observations, and $n_i$ is the number of observations in the $i$th group. (So $n_1 + n_2 + \cdots + n_g = N$.)

Sums of Squares

The most basic quantities that ANOVA uses to describe different kinds of variability are the sums of squares, abbreviated SS. One-way ANOVA involves three sums of squares:

• The total sum of squares, $SS_{Tot}$, measures the overall variability in the data by looking at how the $Y_{ij}$ values vary around $\bar{Y}_{\cdot\cdot}$, their overall mean. Its formula is
$$SS_{Tot} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{\cdot\cdot}\right)^2.$$
It can be seen from the formula that $\sqrt{SS_{Tot}/(N-1)}$ is what we would get if we lumped all $N$ observations together, ignoring groups, and calculated the sample standard deviation.
• The group sum of squares, $SS_G$, measures the variability between the groups by looking at how the sample means for each group, $\bar{Y}_{i\cdot}$, vary around $\bar{Y}_{\cdot\cdot}$, the overall mean. Its formula is
$$SS_G = \sum_{i=1}^{g} n_i \left(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}\right)^2.$$
• The error sum of squares, $SS_E$, measures the variability within the groups by looking at how each $Y_{ij}$ value varies around $\bar{Y}_{i\cdot}$, the sample mean for its group. Its formula is
$$SS_E = \sum_{i=1}^{g} \sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2.$$
If we call the sample standard deviation within each group $s_i$, then another formula for $SS_E$ is
$$SS_E = \sum_{i=1}^{g} (n_i - 1)\, s_i^2.$$
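As a small check on the formulas above, here is a minimal Python sketch (standard library only) that computes the three sums of squares for a list of groups. It then confirms numerically that the two formulas for $SS_E$ agree and that $\sqrt{SS_{Tot}/(N-1)}$ matches the ordinary sample standard deviation of the pooled data. The function name `sums_of_squares` and the toy data are hypothetical, made up for illustration.

```python
from math import fsum, isclose, sqrt
from statistics import mean, stdev

def sums_of_squares(groups):
    """Return (SS_Tot, SS_G, SS_E) for a list of groups of observations."""
    all_obs = [y for group in groups for y in group]
    grand_mean = mean(all_obs)                       # overall mean, Y-bar..
    group_means = [mean(group) for group in groups]  # group means, Y-bar i.
    ss_tot = fsum((y - grand_mean) ** 2 for y in all_obs)
    ss_g = fsum(len(group) * (m - grand_mean) ** 2
                for group, m in zip(groups, group_means))
    ss_e = fsum((y - m) ** 2
                for group, m in zip(groups, group_means) for y in group)
    return ss_tot, ss_g, ss_e

# Hypothetical toy data (three groups of unequal size), not from the text.
groups = [[4.1, 4.5, 3.9, 4.3], [3.8, 4.0, 4.2], [3.5, 3.7, 3.6, 3.4, 3.8]]
ss_tot, ss_g, ss_e = sums_of_squares(groups)
N = sum(len(group) for group in groups)

# Second formula for SS_E: sum over groups of (n_i - 1) * s_i^2.
ss_e_alt = fsum((len(group) - 1) * stdev(group) ** 2 for group in groups)
print(isclose(ss_e, ss_e_alt))  # True

# sqrt(SS_Tot / (N - 1)) equals the sample SD of all N observations pooled together.
all_obs = [y for group in groups for y in group]
print(isclose(sqrt(ss_tot / (N - 1)), stdev(all_obs)))  # True
```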
It turns out to be true that $SS_{Tot} = SS_G + SS_E$. In words, the total variability equals the sum of the variability between groups and the variability within groups.

Degrees of Freedom

The sums of squares are supposed to measure different kinds of variability in the data, but they also tend to be influenced in various ways by the number of groups $g$ and the number of observations $N$. This influence is measured by quantities called degrees of freedom that are associated with each sum of squares. Their formulas are
$$df_{Tot} = N - 1, \qquad df_G = g - 1, \qquad df_E = N - g.$$
Notice that $df_{Tot} = df_G + df_E$. The group and error degrees of freedom add to the total, just like the sums of squares do.

Mean Squares

The mean squares are just the sums of squares divided by their degrees of freedom:
$$MS_G = \frac{SS_G}{df_G}, \qquad MS_E = \frac{SS_E}{df_E}.$$
(We seldom bother calculating $MS_{Tot}$, because it's just the square of the sample standard deviation of all $N$ observations lumped together.) $MS_G$ and $MS_E$ measure the variability between groups and within groups in a way that properly accounts for $g$ and $N$, unlike $SS_G$ and $SS_E$.

Table

We typically summarize all this information in an ANOVA table. An ANOVA table for one-way ANOVA is laid out as shown in Figure 4.2. (A few other quantities that we'll calculate later are also sometimes included as extra columns on the right side of the ANOVA table.)

Example 4.2: The ANOVA table for the data shown in Figure 4.1 would be very tedious to calculate by hand, so we use computer software to calculate the ANOVA table shown in Figure 4.3.
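Statistical software produces a table like Figure 4.2 by chaining the formulas above: sums of squares, then degrees of freedom, then mean squares. The following is a minimal sketch of that bookkeeping, not the algorithm of any particular package; the function name `one_way_anova_table` and the toy data are hypothetical.

```python
from math import fsum
from statistics import mean

def one_way_anova_table(groups):
    """Return {source: (df, SS, MS)} laid out like the generic table in Figure 4.2."""
    all_obs = [y for group in groups for y in group]
    N, g = len(all_obs), len(groups)
    grand_mean = mean(all_obs)
    means = [mean(group) for group in groups]
    ss_g = fsum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means))
    ss_e = fsum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
    ss_tot = fsum((y - grand_mean) ** 2 for y in all_obs)
    df_g, df_e, df_tot = g - 1, N - g, N - 1
    return {
        "Group": (df_g, ss_g, ss_g / df_g),
        "Error": (df_e, ss_e, ss_e / df_e),
        "Total": (df_tot, ss_tot, None),  # MS_Tot is usually not reported
    }

# Hypothetical toy data (three groups of four), not the fries data of Figure 4.1.
groups = [[5.0, 6.0, 7.0, 6.0], [8.0, 9.0, 7.0, 8.0], [4.0, 5.0, 6.0, 5.0]]
table = one_way_anova_table(groups)

print(f"{'Source':<8}{'df':>4}{'SS':>10}{'MS':>10}")
for source, (df, ss, ms) in table.items():
    ms_text = f"{ms:>10.4f}" if ms is not None else ""
    print(f"{source:<8}{df:>4}{ss:>10.4f}{ms_text}")
```

For these toy numbers, $SS_G = 56/3 \approx 18.67$ and $SS_E = 6$, which add to $SS_{Tot} = 74/3 \approx 24.67$, and the df column reads 2, 9, 11.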
Source | df | SS | MS
Group | $df_G$ | $SS_G$ | $MS_G$
Error | $df_E$ | $SS_E$ | $MS_E$
Total | $df_{Tot}$ | $SS_{Tot}$ |

Figure 4.2: Generic one-way ANOVA table.

Figure 4.3: ANOVA table for the data in Figure 4.1 (columns: Source, df, SS, MS; rows: Group, Error, Total).

4.2 One-Way ANOVA F Test

The focus of ANOVA is a hypothesis test for checking whether all the groups have the same population mean. This is the same as testing whether the response variable depends on the factor. Sometimes we'll refer to this as a test for whether the factor "has an effect" on the response variable (although it may not be right to think about this as a literal cause-and-effect relationship).

One-Way ANOVA F Test Procedure

Like any other hypothesis test, the one-way ANOVA F test consists of the standard five steps.

Assumptions

The one-way ANOVA F test makes four assumptions:

• The data comes from a random sample or randomized experiment. In an observational study, the subjects in each group should be a random sample from that group. In an experiment, the subjects should be randomly assigned to the groups.
• The data for each group should be independent. For example, we wouldn't want to reuse the same subject for measurements in more than one group.
• For each group, the population distribution of the response variable is normal. To check this assumption, there are a couple of things we should look for:
  - The shape of the data should look at least roughly normal.
  - There should be no outliers.
• The population distribution of the response variable has the same standard deviation $\sigma$ for each group. Of course, we don't know $\sigma$, but we can still check this assumption by comparing the sample standard deviations for each group. As an approximate rule of thumb, we typically don't worry unless one group's standard deviation is more than twice as big as another's.

Note: The textbook organizes these four assumptions a little differently. It combines my first two and my last two, and so it lists only two assumptions.

Hypotheses

The null hypothesis for the one-way ANOVA F test is that the factor has no effect, and the alternative is that it does. In terms of parameters, we can write these hypotheses as follows:

$H_0$: $\mu_1, \mu_2, \dots, \mu_g$ are all equal.
$H_a$: $\mu_1, \mu_2, \dots, \mu_g$ are not all equal.

Test Statistic

If we're testing whether or not $\mu_1, \mu_2, \dots, \mu_g$ are all equal, then it seems reasonable to look at our estimates of those quantities and see if those are all close enough to each other. So we want to look at whether $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \dots, \bar{Y}_{g\cdot}$ are all close enough to each other. We measure the closeness of the group means using $MS_G$, the variability between groups. But there's something else we need to consider as well.

Left data set (gas price, dollars):
State | GA | AL | FL
Data | 2.06 | 2.15 | 2.25
     | 2.05 | 2.16 | 2.24
     | 2.04 | 2.15 | 2.26
     | 2.05 | 2.14 | 2.25
Mean | 2.05 | 2.15 | 2.25

Right data set (gas price, dollars):
State | GA | AL | FL
Data | 2.37 | 2.42 | 2.07
     | 1.73 | 2.02 | 1.83
     | 1.97 | 2.18 | 2.47
     | 2.13 | 1.78 | 2.23
Mean | 2.05 | 2.15 | 2.25

Figure 4.4: Two hypothetical data sets for a study of gas prices.

Look at the data in Figure 4.4, which shows some hypothetical data comparing gas prices from three different states. Notice that the sample mean for each group (state) is the same for both data sets, so the variability between groups, $MS_G$, is the same as well. However, common sense says that the data set on the left is much more convincing that there is an actual difference from group to group. Mathematically, this is because the data set on the left has less variability within groups, which we measure with $MS_E$.

Our test statistic compares the variability between groups to the variability within groups by taking a ratio:
$$F = \frac{MS_G}{MS_E}.$$
When $MS_G$ is large compared to $MS_E$, as in the hypothetical data set on the left, $F$ will be large. So larger $F$ values represent more evidence that there is a difference between the group population means, in other words, more evidence against $H_0$ and in favor of $H_a$.

P-Value and the F Distribution

Recall the definition of the p-value: The p-value is the probability of getting a test statistic value at least as extreme as the one observed, if $H_0$ is true. Typically the p-value is a tail probability from whatever kind of statistical distribution the test statistic has when $H_0$ is true. For the one-way ANOVA F test statistic, we call this distribution an F distribution, like the ones shown in Figure 4.5.
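The ratio $F = MS_G/MS_E$ can be computed directly for the left-hand data set in Figure 4.4. This minimal sketch (the helper name `f_statistic` is my own, not from the text) shows how a small within-group variability produces a very large F. Only the left-hand data set is used here.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA test statistic F = MS_G / MS_E, returned with MS_G and MS_E."""
    all_obs = [y for grp in groups for y in grp]
    N, g = len(all_obs), len(groups)
    grand_mean = mean(all_obs)
    means = [mean(grp) for grp in groups]
    ms_g = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means)) / (g - 1)
    ms_e = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp) / (N - g)
    return ms_g, ms_e, ms_g / ms_e

# Left-hand data set of Figure 4.4 (gas prices in dollars).
left = {
    "GA": [2.06, 2.05, 2.04, 2.05],
    "AL": [2.15, 2.16, 2.15, 2.14],
    "FL": [2.25, 2.24, 2.26, 2.25],
}
ms_g, ms_e, F = f_statistic(list(left.values()))
print(f"MS_G = {ms_g:.4f}, MS_E = {ms_e:.6f}, F = {F:.1f}")
```

By hand, $MS_G = 0.08/2 = 0.04$ and $MS_E = 0.0006/9 \approx 0.0000667$, so $F = 600$, far out in the right tail of any F distribution with these degrees of freedom.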
Figure 4.5: Density of the F distribution for $df_1 = 2$, $df_2 = 27$ (left) and $df_1 = 3$, $df_2 = 40$ (right). Horizontal axis: value of F.

An F distribution has the following properties:

• It is skewed right.
• Things with an F distribution can't be negative, so the F distribution has only one tail. (We never need to double any tail probabilities from an F distribution.)
• The center of the F distribution is usually somewhere around 1, or a little less.
• The exact shape of the F distribution is determined by two different degrees of freedom: the numerator degrees of freedom, or $df_1$, and the denominator degrees of freedom, or $df_2$.

If $H_0$ is true, our test statistic, $F$, has an F distribution with $df_1 = df_G$ and $df_2 = df_E$. This is easy to remember, since the formula for $F$ is $F = MS_G/MS_E$, and the numerator and denominator degrees of freedom are just the degrees of freedom associated with the quantities in the numerator and denominator of $F$.

Remember that we said the larger values of $F$ are the values that are more supportive of $H_a$. So the p-value is the probability of getting an $F$ value larger than the one we actually got, if $H_0$ is true. Since the test statistic $F$ has an F distribution if $H_0$ is true, this probability is represented by the shaded area in Figure 4.6. To calculate this probability exactly, we typically need statistical software.

Figure 4.6: Tail probability of an F distribution with $df_1 = 3$, $df_2 = …$. Horizontal axis: value of F.

If we don't have access to statistical software, we often have to use an F table like the one in the back of our textbook to try to figure out the p-value. Ideally, we would go to our F table, find the correct $df_1$ and $df_2$, look up our $F$ value, and it would tell us the p-value. Unfortunately, that's way too much information and would require our F table to be dozens of pages long.

Instead, a typical F table, like the one in Figure 4.7, works a little differently. For each combination of $df_1$ and $df_2$, the table tells us only a single number. That number is the $F$ value corresponding to a p-value of 0.05. We then check whether our observed $F$ test statistic value is larger or smaller than the one listed in the table.

• If our test statistic value is larger than the number in the table, then our p-value is smaller than 0.05.
• If our test statistic value is smaller than the number in the table, then our p-value is larger than 0.05.

We can see that the p-value behaves as it should: Smaller p-values correspond to larger $F$ values, and both correspond to more evidence against $H_0$ and in support of $H_a$.

Figure 4.7: Top-left corner of an F table for right-tail probabilities of 0.05 (table headers: $df_1$ and $df_2$).

Decision

We make a decision the same way we always do for any hypothesis test: by rejecting $H_0$ if the p-value is less than or equal to $\alpha$ (often 0.05), and failing to reject $H_0$ if the p-value is greater than $\alpha$. Remember that the hypotheses we're testing are

$H_0$: $\mu_1, \mu_2, \dots, \mu_g$ are all equal.
$H_a$: $\mu_1, \mu_2, \dots, \mu_g$ are not all equal.

So let's think about what our decision really represents.

• If we reject $H_0$, then we're concluding that at least some of the group population means are different.
• If we fail to reject $H_0$, then we're concluding that it's reasonable that all the group population means are the same.

Example 4.3: Let's go through the five steps of the one-way ANOVA F test for the data in Example 4.2 using $\alpha = 0.05$.

1. Let's check each of the four assumptions.
   • Each day was randomly assigned to a particular location, so this is a randomized experiment.
   • The different groups correspond to different locations, each of which should have no ability to affect the measurements of the other two, so the groups are independent.
   • It's very hard to tell much about the shape of the data with only 10 observations in each group, but quick dotplots for each group show shapes that are at least somewhat consistent with a normal distribution. Also, we see no outliers in any of the groups.
   • The sample standard deviations for the three groups are at least roughly close to each other, so we don't see any violation of the constant standard deviation assumption.
   Our assumptions are okay, so we can proceed.
2. The null hypothesis is that $\mu_1 = \mu_2 = \mu_3$, which means that the three locations, on average, give out the same amount of fries. The alternative hypothesis is that at least one of $\mu_1, \mu_2, \mu_3$ is not equal to the others, which means that at least one of the locations gives out more or fewer fries than the others.
3. The test statistic $F$ is calculated from the mean squares in the ANOVA table shown in Figure 4.3: $F = MS_G/MS_E = 3.55$.
4. To calculate our p-value, we compare our observed test statistic value to an F distribution with $df_1 = df_G = 2$ and $df_2 = df_E = 27$. When we consult the F table for these df values, the number that it gives us is 3.35. This means that for an F distribution with these degrees of freedom, a test statistic value of 3.35 would correspond to a p-value of 0.05. Our observed test statistic value of 3.55 is larger than 3.35, so our p-value is smaller than 0.05. (We could use statistical software to calculate the exact p-value, which turns out to be ….)
5. Our p-value is smaller than our $\alpha$, so we reject $H_0$. We can conclude that the three locations do not give out the same amount of fries. However, we can't conclude anything about which locations give out more or fewer fries than the others, or about how many more or fewer they give out.

Figure 4.8 may be helpful for remembering various results and interpretations of a one-way ANOVA F test.
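Step 4 relies on an F table, but when $df_1 = 2$, as here, the right-tail probability of the F distribution happens to have a closed form, $P(F > f) = (1 + 2f/df_2)^{-df_2/2}$, so both the table entry and the p-value can be checked without statistical software. The sketch below covers only the special case $df_1 = 2$ (for general $df_1$, software typically uses the regularized incomplete beta function instead). The helper names are my own, and it plugs in the rounded value F = 3.55 from the text, so its p-value is approximate.

```python
def f_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with df1 = 2 and df2 denominator df.

    Closed form valid only when df1 == 2: (1 + 2f/df2) ** (-df2/2).
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Value c with P(F > c) = alpha, found by inverting the closed form above."""
    return df2 / 2 * (alpha ** (-2 / df2) - 1)

df2 = 27                      # df_E in Example 4.3
crit = f_critical_df1_2(0.05, df2)
p = f_tail_df1_2(3.55, df2)   # 3.55 is the rounded F from the ANOVA table
print(f"critical value = {crit:.2f}")
print(f"p-value = {p:.3f}")
```

Running it gives a critical value of about 3.35, matching the F-table entry quoted in Step 4, and a p-value of about 0.043 for the rounded F, consistent with the conclusion that the p-value is below 0.05.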
Large F value (much larger than 1) | Small F value (around 1 or less than 1)
Small p-value | Large p-value
Evidence against $H_0$ (for $H_a$) | No evidence against $H_0$ (for $H_a$)
Reject $H_0$ | Fail to reject $H_0$
Conclude that some population group means differ | Reasonable that all population group means are the same

Figure 4.8: Results and interpretations of a one-way ANOVA F test.

Alternatives to the One-Way ANOVA F Test

There are some situations in which one-way ANOVA could be used, but another test procedure might be equivalent or preferable.

One-Way ANOVA with Two Groups

When we have only two groups, the one-way ANOVA F test serves exactly the same purpose as the two-sided two-sample t test from Section 10.2, which you saw in your previous course. It turns out that one-way ANOVA with only two groups is completely equivalent to the two-sided two-sample t test (the version that uses the pooled standard deviation), in the sense that both tests will give exactly the same p-value. (This happens because their test statistics are related: $F = t^2$.) So in this case, it makes no difference which procedure is used, since both will yield exactly the same conclusion. However, the two-sample t test is slightly more flexible in this case since it also allows us to use a one-sided alternative hypothesis if we so desire.

Ordinal Variables

If the factor is an ordinal variable, one-way ANOVA makes no use of the ordering information. There exist other test procedures that might make slightly fewer type II errors than one-way ANOVA by taking into account the order of the factor categories, but we won't discuss these procedures here.

Normality

One-way ANOVA assumes that the data in each group comes from a normal distribution. Even if the distribution is somewhat different from normal, one-way ANOVA can still work okay if the sample sizes are large enough. However, when sample sizes are small, one-way ANOVA can be unreliable if the data in one or more of the groups comes from a highly non-normal distribution. There exists a nonparametric counterpart of the one-way ANOVA F test called the Kruskal-Wallis test that uses only the ranks of the data and is okay to use no matter what distribution the data comes from. We won't discuss the details, but Section 15.2 of the textbook gives a brief outline.

Block Designs

Recall from Stats 1 that when we wanted to compare the means of two groups, there were two different procedures:

• The two-sample t test compared groups when the data in one group was independent from the data in the other group.
• The matched-pairs t test compared groups when each observation in one group was paired with a corresponding observation in the other group (such as husbands and wives, or before and after measurements).

The one-way ANOVA F test we discussed in this section is the multiple-group analog of the two-sample t test. (That's why they're equivalent when there are only two groups.) As mentioned in the assumptions, it can't be used when the observations in a group correspond to observations in other groups.

There also exists a procedure called a block design that is the multiple-group analog of the matched-pairs t test. It should be used instead of simple one-way ANOVA when each subject is reused for measurements in each group. There are many cases where such a procedure is useful.
Example 4.4: Suppose we want to compare the effectiveness of three kinds of fertilizer for growing corn. We have five plots of land available to use, so we divide each plot into thirds and use one fertilizer on each third. Here the plots of land are the subjects and the fertilizers are the groups. Each subject is being reused for each group, so we can't use the one-way ANOVA procedure we discussed in this section. However, this type of data can be analyzed using a block design.

Unfortunately, we won't have time to discuss block designs in detail in this course. The textbook doesn't discuss them either, so if for some reason you need to learn about them, consult another textbook instead. (I can give you a reference if you're interested.)

4.3 One-Way ANOVA Confidence Intervals

The one-way ANOVA F test allows us to conclude whether or not the population group means are all equal. However, we might also want to say something about what we think the group means actually are, or about which group means are different and by how much. We can answer these questions by constructing confidence intervals. Since there are multiple quantities for which we might want to construct confidence intervals in a one-way ANOVA setup, we need to discuss the right way to do this.

Simultaneous Confidence Intervals

When we construct more than one confidence interval at a time, we have to be careful to maintain our specified overall confidence level. For example, if we're 95% confident in the statement "$\mu_1$ is between 78 and 86," and we're also 95% confident in the statement "$\mu_2$ is between 31 and 39," then we'll (usually) be less than 95% confident in the combined statement "$\mu_1$ is between 78 and 86 and $\mu_2$ is between 31 and 39." When we want to state a certain overall confidence level for several confidence intervals simultaneously, we need to construct simultaneous confidence intervals.
(If we're only interested in setting the confidence level for one confidence interval at a time, then we might call this an individual confidence level, to distinguish it from an overall simultaneous confidence level.)
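To see why the combined confidence drops, suppose (a simplifying assumption not made in the text) that the two 95% intervals come from independent data; then the chance that both capture their parameters is $0.95 \times 0.95 = 0.9025$. Without independence, the Bonferroni inequality still guarantees a joint coverage of at least $1 - \sum(\text{individual error rates})$, which motivates running each of $m$ intervals at individual level $1 - \alpha/m$. The function names below are hypothetical, and this is only a sketch of the arithmetic.

```python
def joint_coverage_independent(levels):
    """Probability that all intervals cover, assuming the intervals are independent."""
    prob = 1.0
    for level in levels:
        prob *= level
    return prob

def bonferroni_lower_bound(levels):
    """Bonferroni inequality: P(all cover) >= 1 - sum of individual error rates."""
    return 1 - sum(1 - level for level in levels)

def bonferroni_individual_level(overall, m):
    """Individual level needed so that m intervals have overall level >= `overall`."""
    return 1 - (1 - overall) / m

print(joint_coverage_independent([0.95, 0.95]))  # about 0.9025
print(bonferroni_lower_bound([0.95, 0.95]))      # about 0.90
print(bonferroni_individual_level(0.95, 3))      # about 0.9833
```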
Multiple Comparison Methods

To construct simultaneous confidence intervals, we have to use something called a multiple comparison method. There are a variety of multiple comparison methods, and the best one to use depends on what kind of confidence intervals we plan to construct. We won't discuss the details here.

Confidence Intervals for Group Means

The most obvious quantities for which we might want to construct confidence intervals are $\mu_1, \dots, \mu_g$, the population means of the groups. Since we're constructing multiple confidence intervals at once, we'll need to use a multiple comparison method. Many different multiple comparison methods exist for this situation, and one of the most commonly used is the Bonferroni method. We'll refer to the intervals it produces as Bonferroni simultaneous confidence intervals.

Assumptions

The assumptions for constructing confidence intervals for group means are the same as those for the one-way ANOVA F test.

Estimating the Standard Deviation

Recall that one of our assumptions is that each group has the same population standard deviation, which we call $\sigma$. We can estimate $\sigma$ using
$$\hat{\sigma} = \sqrt{MS_E}.$$
This quantity will show up in the confidence interval formula, but it might also be useful in its own right.

Example 4.5: In Example 4.2, we calculated $MS_E = …$. Hence our estimate for the population standard deviation $\sigma$ of each group is $\hat{\sigma} = \sqrt{MS_E} = …$.

Formula

To construct a set of Bonferroni simultaneous confidence intervals for $\mu_1, \dots, \mu_g$, we can use the following formula for each $\mu_i$:
$$\text{CI for } \mu_i: \quad \bar{Y}_{i\cdot} \pm t\, \hat{\sigma} \sqrt{\frac{1}{n_i}},$$
where $t$ is a number that depends on the confidence level, $N$, and $g$. We won't discuss the details of how to get $t$ in this chapter, but we may come back to it later (the Bonferroni method will come up again in a later chapter).

Example 4.6: For Example 4.2, simultaneous 95% Bonferroni confidence intervals for the three group means, as calculated by statistical software, are as follows:

$\mu_1$: (3.85, 4.87)
$\mu_2$: (3.58, 4.60)
$\mu_3$: (3.10, 4.12)

Since this is a set of simultaneous confidence intervals, we can say that we're 95% confident that all three parameter values are in their corresponding intervals.

Confidence Intervals for Differences of Group Means

The one-way ANOVA F test only tells us whether there are differences between the groups. It does not give a verdict on which groups are different, or by how much. To figure this out, we can construct confidence intervals to compare each pair of group population means. More specifically, we want to construct simultaneous confidence intervals for $\mu_i - \mu_k$ for each pair of groups $i < k$. For example, with three groups, there would be three quantities for which we would want to construct confidence intervals: $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, and $\mu_2 - \mu_3$.

Many different multiple comparison methods exist for this situation, but the best one for our purposes is called the Tukey method. We'll refer to the intervals it produces as Tukey simultaneous confidence intervals.

Assumptions

The assumptions for constructing Tukey simultaneous confidence intervals are exactly the same as those for the one-way ANOVA F test, with one additional requirement: the group sample sizes $n_1, n_2, \dots, n_g$ should be at least approximately equal.
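Once $t$ is known, the Bonferroni interval for a group mean is simple arithmetic. In the standard Bonferroni construction, $t$ is the t-distribution quantile with $df_E = N - g$ degrees of freedom and upper-tail area $\alpha/(2g)$; Python's standard library has no t quantile function, so this minimal sketch assumes the caller supplies $t$ (for example, read from statistical software). The function name `bonferroni_ci` and the inputs below are hypothetical round numbers, not the values behind Example 4.6.

```python
import math

def bonferroni_ci(group_mean, ms_e, n_i, t_value):
    """Interval Ybar_i. +/- t * sigma_hat * sqrt(1/n_i), with sigma_hat = sqrt(MS_E).

    t_value must be supplied by the caller (e.g. from statistical software).
    """
    sigma_hat = math.sqrt(ms_e)
    half_width = t_value * sigma_hat * math.sqrt(1 / n_i)
    return group_mean - half_width, group_mean + half_width

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
low, high = bonferroni_ci(group_mean=4.0, ms_e=0.25, n_i=25, t_value=2.5)
print(low, high)  # 4.0 +/- 2.5 * 0.5 * 0.2, i.e. 4.0 +/- 0.25
```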
Formula

To construct a set of Tukey simultaneous confidence intervals for each pair of groups $i$ and $k$, we can use the following formula for each pair:
$$\text{CI for } \mu_i - \mu_k: \quad \left(\bar{Y}_{i\cdot} - \bar{Y}_{k\cdot}\right) \pm q\, \hat{\sigma} \sqrt{\frac{1}{n_i} + \frac{1}{n_k}},$$
where $q$ is a number that depends on the confidence level, $N$, and $g$. We won't discuss the details of how to get $q$, since we would typically use statistical software to calculate it for us.

Interpretation

For each comparison of two groups, we interpret the corresponding Tukey simultaneous confidence interval as follows:

• If the interval contains only positive numbers, then we can conclude that the first of the two population means being compared is bigger than the second.
• If the interval contains only negative numbers, then we can conclude that the first of the two population means being compared is smaller than the second.
• If the interval contains both positive and negative numbers (in other words, if it contains zero), then we can't conclude that either of the two population means being compared is bigger than the other.

Of course, whenever we conclude that one population mean is bigger than another, the interval also gives us an idea of how much bigger.

Example 4.7: For Example 4.2, Tukey simultaneous 95% confidence intervals, as calculated by statistical software, are as follows:

$\mu_1 - \mu_2$: (−0.44, 0.98)
$\mu_1 - \mu_3$: (0.04, 1.46)
$\mu_2 - \mu_3$: (−0.23, 1.19)

So we can't conclude that there's any difference between $\mu_1$ and $\mu_2$ or between $\mu_2$ and $\mu_3$, since both of the corresponding intervals contain both positive and negative numbers. However, we can conclude that $\mu_1$ is bigger than $\mu_3$, since the corresponding interval contains only positive numbers.
In other words, we can conclude that Location 1 gives out more fries than Location 3, but we can't conclude anything about how Location 2 compares to either of them.
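The three interpretation rules can be written as a tiny function. The sketch below applies it to the intervals of Example 4.7 and also checks a consistency property: because both kinds of interval are symmetric, the midpoints of the Bonferroni intervals in Example 4.6 are the group sample means (4.36, 4.09, and 3.61), and their differences reproduce the midpoints of the Tukey intervals. The helper name `interpret` is hypothetical.

```python
def interpret(interval, first, second):
    """Apply the three interpretation rules to a CI for (first - second)."""
    low, high = interval
    if low > 0:
        return f"{first} is bigger than {second}"
    if high < 0:
        return f"{first} is smaller than {second}"
    return f"cannot conclude a difference between {first} and {second}"

# Tukey intervals from Example 4.7.
tukey = {
    ("mu1", "mu2"): (-0.44, 0.98),
    ("mu1", "mu3"): (0.04, 1.46),
    ("mu2", "mu3"): (-0.23, 1.19),
}
for (first, second), ci in tukey.items():
    print(f"{first} - {second} in {ci}: {interpret(ci, first, second)}")

# Midpoints of the Bonferroni intervals in Example 4.6 are the group sample means.
bonferroni = {"mu1": (3.85, 4.87), "mu2": (3.58, 4.60), "mu3": (3.10, 4.12)}
ybar = {name: (lo + hi) / 2 for name, (lo, hi) in bonferroni.items()}
for (first, second), (lo, hi) in tukey.items():
    # Each Tukey interval is centered at the difference of the two sample means.
    print(first, second, round(ybar[first] - ybar[second], 2), round((lo + hi) / 2, 2))
```

Only the $\mu_1 - \mu_3$ interval excludes zero, which matches the conclusion drawn in the text.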
T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these
Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The
Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption
ANOVA ANOVA Analysis of Variance Chapter 6 A procedure for comparing more than two groups independent variable: smoking status non-smoking one pack a day > two packs a day dependent variable: number of
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square
CHAPTER 13 Experimental Design and Analysis of Variance CONTENTS STATISTICS IN PRACTICE: BURKE MARKETING SERVICES, INC. 13.1 AN INTRODUCTION TO EXPERIMENTAL DESIGN AND ANALYSIS OF VARIANCE Data Collection
1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We
UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables
ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In : %%R
Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing
Tutorial The F distribution and the basic principle behind ANOVAs Bodo Winter 1 Updates: September 21, 2011; January 23, 2014; April 24, 2014; March 2, 2015 This tutorial focuses on understanding rather
Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus
Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: email@example.com 1. Introduction
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate
MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part
Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation
General Method: Difference of Means 1. Calculate x 1, x 2, SE 1, SE 2. 2. Combined SE = SE1 2 + SE2 2. ASSUMES INDEPENDENT SAMPLES. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n
Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this
Chapter 45 Non-Inferiority ests for One Mean Introduction his module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random
Babraham Bioinformatics Introduction to Statistics with GraphPad Prism (5.01) Version 1.1 Introduction to Statistics with GraphPad Prism 2 Licence This manual is 2010-11, Anne Segonds-Pichon. This manual
EXCEL Analysis TookPak [Statistical Analysis] 1 First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it: a. From the Tools menu, choose Add-Ins b. Make sure Analysis
Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we
Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.
Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
Chapter 7 Testing Hypotheses Chapter Learning Objectives Understanding the assumptions of statistical hypothesis testing Defining and applying the components in hypothesis testing: the research and null
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
Using Excel s Analysis ToolPak Add-In S. Christian Albright, September 2013 Introduction This document illustrates the use of Excel s Analysis ToolPak add-in for data analysis. The document is aimed at
Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions
Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario I have found that the best way to practice regression is by brute force That is, given nothing but a dataset and your mind, compute everything
1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.
Chapter 8 Student Lecture Notes 8-1 Chapter 8 Introduction to Hypothesis Testing Fall 26 Fundamentals of Business Statistics 1 Chapter Goals After completing this chapter, you should be able to: Formulate
1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions
Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and
Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics firstname.lastname@example.org http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS
Confidence Intervals on Effect Size David C. Howell University of Vermont Recent years have seen a large increase in the use of confidence intervals and effect size measures such as Cohen s d in reporting