# Chapter 4: One-Way ANOVA

Recall this chart that showed how most of our course would be organized:

| Explanatory Variable(s) | Response Variable | Methods |
|---|---|---|
| Categorical | Categorical | Contingency Tables |
| Categorical | Quantitative | ANOVA |
| Quantitative | Quantitative | Regression |
| Quantitative | Categorical | (not discussed) |

When our data consists of a quantitative response variable and one or more categorical explanatory variables, we can employ a technique called analysis of variance, abbreviated as ANOVA. The material in this chapter corresponds to the first part of Chapter 14 of the textbook.

Recall that a categorical explanatory variable is also called a factor. In this chapter, we'll study the simplest form of ANOVA, one-way ANOVA, which uses one factor and one response variable. We'll also study a more complicated setup in the next chapter, two-way ANOVA, which uses two factors instead. (In principle, we could do ANOVA with any number of factors, but in practice, people usually stick to one or two.)

## 4.1 Basics of One-Way ANOVA

Let's start by discussing the way we organize and label the data for one-way ANOVA. We also need to formulate the basic question that we plan to ask.

### Setup

Typically, when we think about one-way ANOVA, we think about the factor as dividing the subjects into groups. The goal of our analysis is then to compare the means of the subjects in each group.

**Notation.** Let g represent the number of groups. Then we'll set things up as follows:

- Let µ1, µ2, ..., µg represent the true population means of the response variable for the subjects in each group. As usual, these population parameters are what we're really interested in, but we don't know their values.
- We call each observation in the sample Yij, where i is a number from 1 to g that identifies the group number, and j identifies the individual within that group. (For example, Y12 represents the response variable value of the second individual in the first group.)
- We can calculate the sample means for each group, which we'll call Ȳ1·, Ȳ2·, ..., Ȳg·. We can use these known sample means as estimates of the corresponding unknown population means.

**Example 4.1:** Suppose we want to see if three McDonald's locations around town tend to put the same amount of fries in a medium order, or if some locations put more fries in the container than others. We take the next 30 days on the calendar and randomly assign 10 days to each of the three locations. On each day, we go to the specified location, order a medium order of fries, take it home, and weigh it to see how many ounces of fries it contains.

The categorical explanatory variable is just which location we went to, and the quantitative response variable is the number of ounces of fries. For each of the three locations (g = 3), the population consists of all medium orders of fries sold at that location, while the sample consists of the orders that we actually got. The population means, which we call µ1, µ2, µ3, represent the average number of ounces of fries in all orders at each location, and these are the quantities we're interested in. We estimate them using Ȳ1·, Ȳ2·, Ȳ3·, the sample means for each location, which are calculated from the data for our orders. The data is shown in Figure 4.1. □

*Figure 4.1: Ounces of fries in 10 medium orders of fries at each of three McDonald's locations, with the sample mean and standard deviation for each location.*

### Question of Interest

What we really want to know is whether all of the groups have the same population mean, that is, whether µ1, µ2, ..., µg are all the same. This is equivalent to asking whether or not the response variable depends on the factor.

Intuitively speaking, the most obvious way to answer this question is by looking at Ȳ1·, Ȳ2·, ..., Ȳg·, the sample means of the various groups. If they are close enough to each other, in some sense, then we're willing to believe that all the true population means µ1, µ2, ..., µg are the same. If one or more of Ȳ1·, Ȳ2·, ..., Ȳg· are too far from the others, then that convinces us that the true population means must not all be the same. All that remains is to figure out what we mean by "close enough" and "too far." We'll eventually see how to do this with a hypothesis test.

### One-Way ANOVA Table

ANOVA gets its name (analysis of variance) from the fact that it examines different kinds of variability in the data. It then uses this information to construct a hypothesis test. To describe these different kinds of variability, we'll first need to introduce some more notation:

- Ȳ·· represents the overall sample mean of all the data from all groups combined.
- N is the total number of observations, and n_i is the number of observations in the ith group. (So n1 + n2 + ⋯ + ng = N.)

**Sums of Squares.** The most basic quantities that ANOVA uses to describe different kinds of variability are the sums of squares, abbreviated SS. One-way ANOVA involves three sums of squares:

- The total sum of squares, SS_Tot, measures the overall variability in the data by looking at how the Yij values vary around Ȳ··, their overall mean. Its formula is

  $$SS_{Tot} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2.$$

  It can be seen from the formula that $\sqrt{SS_{Tot}/(N-1)}$ is what we would get if we lumped all N observations together, ignoring groups, and calculated the sample standard deviation.

- The group sum of squares, SS_G, measures the variability between the groups by looking at how the sample means for each group, Ȳi·, vary around Ȳ··, the overall mean. Its formula is

  $$SS_G = \sum_{i=1}^{g} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2.$$

- The error sum of squares, SS_E, measures the variability within the groups by looking at how each Yij value varies around Ȳi·, the sample mean for its group. Its formula is

  $$SS_E = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2.$$

  If we call the sample standard deviation within each group s_i, then another formula for SS_E is

  $$SS_E = \sum_{i=1}^{g} (n_i - 1)\, s_i^2.$$

It turns out to be true that

$$SS_{Tot} = SS_G + SS_E.$$

In words, the total variability equals the sum of the variability between groups and the variability within groups.

**Degrees of Freedom.** The sums of squares are supposed to measure different kinds of variability in the data, but they also tend to be influenced in various ways by the number of groups g and the number of observations N. This influence is measured by quantities called degrees of freedom that are associated with each sum of squares. Their formulas are

$$df_{Tot} = N - 1, \qquad df_G = g - 1, \qquad df_E = N - g.$$

Notice that df_Tot = df_G + df_E. The group and error degrees of freedom add to the total, just like the sums of squares do.

**Mean Squares.** The mean squares are just the sums of squares divided by their degrees of freedom:

$$MS_G = \frac{SS_G}{df_G}, \qquad MS_E = \frac{SS_E}{df_E}.$$

(We seldom bother calculating MS_Tot, because it's just the square of the sample standard deviation of all N observations lumped together.)

MS_G and MS_E measure the variability between groups and within groups in a way that properly accounts for g and N, unlike SS_G and SS_E.

**Table.** We typically summarize all this information in an ANOVA table. An ANOVA table for one-way ANOVA is laid out as shown in Figure 4.2. (A few other quantities that we'll calculate later are also sometimes included as extra columns on the right side of the ANOVA table.)

**Example 4.2:** The ANOVA table for the data shown in Figure 4.1 would obviously be very tedious to calculate by hand, so we use computer software to calculate the ANOVA table shown in Figure 4.3. □
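The quantities above can be assembled directly from raw data, and the two additivity identities checked along the way. A minimal sketch in Python, using three small made-up groups (not the fries data from Figure 4.1):

```python
# Assemble the one-way ANOVA table quantities (SS, df, MS) from raw data
# and verify SS_Tot = SS_G + SS_E and df_Tot = df_G + df_E.
# The data below is made up for illustration.
groups = [
    [4.4, 4.2, 4.5, 4.3],
    [4.1, 4.0, 4.2, 3.9],
    [3.7, 3.6, 3.8, 3.5],
]
g = len(groups)
N = sum(len(grp) for grp in groups)
grand_mean = sum(y for grp in groups for y in grp) / N    # the overall mean
group_means = [sum(grp) / len(grp) for grp in groups]     # one mean per group

ss_tot = sum((y - grand_mean) ** 2 for grp in groups for y in grp)
ss_g = sum(len(grp) * (m - grand_mean) ** 2
           for grp, m in zip(groups, group_means))
ss_e = sum((y - m) ** 2 for grp, m in zip(groups, group_means) for y in grp)

df_tot, df_g, df_e = N - 1, g - 1, N - g
ms_g, ms_e = ss_g / df_g, ss_e / df_e

assert abs(ss_tot - (ss_g + ss_e)) < 1e-9   # total = between + within
assert df_tot == df_g + df_e

print(f"Group  df={df_g}  SS={ss_g:.4f}  MS={ms_g:.4f}")
print(f"Error  df={df_e}  SS={ss_e:.4f}  MS={ms_e:.4f}")
print(f"Total  df={df_tot}  SS={ss_tot:.4f}")
```

The printed lines mirror the layout of the generic ANOVA table in Figure 4.2.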

| Source | df | SS | MS |
|---|---|---|---|
| Group | df_G | SS_G | MS_G |
| Error | df_E | SS_E | MS_E |
| Total | df_Tot | SS_Tot | |

*Figure 4.2: Generic one-way ANOVA table.*

*Figure 4.3: ANOVA table for the data in Figure 4.1.*

## 4.2 One-Way ANOVA F Test

The focus of ANOVA is a hypothesis test for checking whether all the groups have the same population mean. This is the same as testing whether the response variable depends on the factor. Sometimes we'll refer to this as a test for whether the factor has an "effect" on the response variable (although it may not be right to think about this as a literal cause-and-effect relationship).

### One-Way ANOVA F Test Procedure

Like any other hypothesis test, the one-way ANOVA F test consists of the standard five steps.

**Assumptions.** The one-way ANOVA F test makes four assumptions:

- The data comes from a random sample or randomized experiment. In an observational study, the subjects in each group should be a random sample from that group. In an experiment, the subjects should be randomly assigned to the groups.

- The data for each group should be independent. For example, we wouldn't want to reuse the same subject for measurements in more than one group.
- For each group, the population distribution of the response variable has a normal distribution. To check this assumption, there are a couple of things we should look for:
  - The shape of the data should look at least sort of close to normal.
  - There should be no outliers.
- The population distribution of the response variable has the same standard deviation σ for each group. Of course, we don't know σ, but we can still check this assumption by comparing the sample standard deviations for each group. As an approximate rule of thumb, we typically don't worry unless one group's standard deviation is more than twice as big as another's.

Note: The textbook organizes these four assumptions a little differently. It combines my first two and my last two, and so it lists only two assumptions.

**Hypotheses.** The null hypothesis for the one-way ANOVA F test is that the factor has no effect, and the alternative is that it does. In terms of parameters, we can write these hypotheses as follows:

- H0: µ1, µ2, ..., µg are all equal.
- Ha: µ1, µ2, ..., µg are not all equal.

**Test Statistic.** If we're testing whether or not µ1, µ2, ..., µg are all equal, then it seems reasonable to look at our estimates of those quantities and see if those are all close enough to each other. So we want to look at whether Ȳ1·, Ȳ2·, ..., Ȳg· are all close enough to each other.

We measure the closeness of the group means using MS_G, the variability between groups. But there's something else we need to consider as

well. Look at the data in Figure 4.4, which shows some hypothetical data comparing gas prices from three different states.

| State | GA | AL | FL |
|---|---|---|---|
| Data | \$2.06 | \$2.15 | \$2.25 |
| | \$2.05 | \$2.16 | \$2.24 |
| | \$2.04 | \$2.15 | \$2.26 |
| | \$2.05 | \$2.14 | \$2.25 |
| Mean | \$2.05 | \$2.15 | \$2.25 |

| State | GA | AL | FL |
|---|---|---|---|
| Data | \$2.37 | \$2.42 | \$2.07 |
| | \$1.73 | \$2.02 | \$1.83 |
| | \$1.97 | \$2.18 | \$2.47 |
| | \$2.13 | \$1.78 | \$2.23 |
| Mean | \$2.05 | \$2.15 | \$2.25 |

*Figure 4.4: Two hypothetical data sets for a study of gas prices.*

Notice that the sample mean for each group (state) is the same for both data sets, so the variability between groups, MS_G, is the same as well. However, common sense says that the first data set is much more convincing that there is an actual difference from group to group. Mathematically, this is because the first data set has less variability within groups, which we measure with MS_E.

Our test statistic compares the variability between groups to the variability within groups by taking a ratio:

$$F = \frac{MS_G}{MS_E}.$$

When MS_G is large compared to MS_E, like the first hypothetical data set, F will be large. So larger F values represent more evidence that there is a difference between the group population means; in other words, more evidence against H0 and in favor of Ha.

**P-Value and the F Distribution.** Recall the definition of the p-value: The p-value is the probability of getting a test statistic value at least as extreme as the one observed, if H0 is true. Typically the p-value is a tail probability from whatever kind of statistical distribution the test statistic has when H0 is true. For the one-way ANOVA F test statistic, we call this distribution an F distribution, like the ones shown in Figure 4.5.
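The effect Figure 4.4 illustrates can be computed directly: two data sets with the same group means but different within-group spread give very different F statistics. A sketch using the first (tight) data set from Figure 4.4, plus a noisier data set invented for illustration with the same group means of 2.05, 2.15, and 2.25:

```python
# Compare F = MS_G / MS_E for two data sets with identical group means
# but different within-group variability. "tight" is the first data set
# in Figure 4.4; "noisy" is made up (same group means, larger spread).

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    N = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(y for g in groups for y in g) / N
    means = [sum(g) / len(g) for g in groups]
    ms_g = sum(len(g) * (m - grand) ** 2
               for g, m in zip(groups, means)) / (k - 1)
    ms_e = sum((y - m) ** 2
               for g, m in zip(groups, means) for y in g) / (N - k)
    return ms_g / ms_e

tight = [[2.06, 2.05, 2.04, 2.05],   # GA
         [2.15, 2.16, 2.15, 2.14],   # AL
         [2.25, 2.24, 2.26, 2.25]]   # FL
noisy = [[2.37, 1.73, 1.97, 2.13],   # invented; group means still
         [2.45, 1.85, 2.09, 2.21],   # 2.05, 2.15, 2.25, so MS_G
         [2.55, 1.95, 2.19, 2.31]]   # is unchanged

print(one_way_f(tight))   # large: group differences stand out
print(one_way_f(noisy))   # small: differences drowned out by noise
```

Both data sets have the same MS_G; only MS_E changes, so the ratio F moves dramatically.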

*Figure 4.5: Density of the F distribution for df1 = 2, df2 = 27 (left) and df1 = 3, df2 = 40 (right).*

An F distribution has the following properties:

- It is skewed right.
- Things with an F distribution can't be negative, so the F distribution has only one tail. (We never need to double any tail probabilities from an F distribution.)
- The center of the F distribution is usually somewhere around 1, or a little less.
- The exact shape of the F distribution is determined by two different degrees of freedom: the numerator degrees of freedom, or df1, and the denominator degrees of freedom, or df2.

If H0 is true, our test statistic, F, has an F distribution with df1 = df_G and df2 = df_E. This is easy to remember: since the formula for F is F = MS_G / MS_E, the numerator and denominator degrees of freedom are just the degrees of freedom associated with the quantities in the numerator and denominator of F.

Remember that we said the larger values of F are the values that are more supportive of Ha. So the p-value is the probability of getting an

F value larger than the one we actually got, if H0 is true. Since the test statistic F has an F distribution if H0 is true, this probability is represented by the shaded area in Figure 4.6. To calculate this probability exactly, we typically need statistical software.

*Figure 4.6: Tail probability of an F distribution with df1 = 3.*

If we don't have access to statistical software, we often have to use an F table like the one in the back of our textbook to try to figure out the p-value. Ideally, we would go to our F table, find the correct df1 and df2, look up our F value, and it would tell us the p-value. Unfortunately, that's way too much information and would require our F table to be dozens of pages long.

Instead, a typical F table, like the one in Figure 4.7, works a little differently. For each combination of df1 and df2, the table tells us only a single number. That number is the F value corresponding to a p-value of 0.05. We then check whether our observed F test statistic value is larger or smaller than the one listed in the table.

- If our test statistic value is larger than the number in the table, then our p-value is smaller than 0.05.
- If our test statistic value is smaller than the number in the table, then our p-value is larger than 0.05.

We can see that the p-value behaves as it should: Smaller p-values correspond to larger F values, and both correspond to more evidence against

H0 and in support of Ha.

*Figure 4.7: Top-left corner of an F table for right-tail probabilities of 0.05.*

**Decision.** We make a decision the same way we always do for any hypothesis test: by rejecting H0 if the p-value is less than or equal to α (often 0.05), and failing to reject H0 if the p-value is greater than α.

Remember that the hypotheses we're testing are

- H0: µ1, µ2, ..., µg are all equal.
- Ha: µ1, µ2, ..., µg are not all equal.

So let's think about what our decision really represents.

- If we reject H0, then we're concluding that at least some of the group population means are different.
- If we fail to reject H0, then we're concluding that it's reasonable that all the group population means are the same.

**Example 4.3:** Let's go through the five steps of the one-way ANOVA F test for the data in Example 4.2 using α = 0.05.

1. Let's check each of the four assumptions.
   - Each day was randomly assigned to a particular location, so this is a randomized experiment.
   - The different groups correspond to different locations, each of which should have no ability to affect the measurements of the other two, so the groups are independent.

   - It's very hard to tell much about the shape of the data with only 10 observations in each group, but quick dotplots for each group show shapes that are at least somewhat consistent with a normal distribution. Also, we see no outliers in any of the groups.
   - The sample standard deviations for the three groups are at least sort of close to each other, so we don't see any violation of the constant standard deviation assumption.

   Our assumptions are okay, so we can proceed.

2. The null hypothesis is that µ1 = µ2 = µ3, which means that the three locations, on average, give out the same amount of fries. The alternative hypothesis is that at least one of µ1, µ2, µ3 is not equal to the others, which means that at least one of the locations gives out more or fewer fries than the others.

3. The test statistic F = MS_G / MS_E = 3.55 is calculated from the mean squares in the ANOVA table shown in Figure 4.3.

4. To calculate our p-value, we compare our observed test statistic value to an F distribution with df1 = df_G = 2 and df2 = df_E = 27. When we consult the F table for these df values, the number that it gives us is 3.35. This means that for an F distribution with these degrees of freedom, a test statistic value of 3.35 would correspond to a p-value of 0.05. Our observed test statistic value of 3.55 is larger than 3.35, so our p-value is smaller than 0.05. (We could use statistical software to calculate the exact p-value.)

5. Our p-value is smaller than our α, so we reject H0. We can conclude that the three locations do not give out the same amount of fries. However, we can't conclude anything about which locations give out more or fewer fries than the others, or about how much more or less they give out. □

Figure 4.8 may be helpful for remembering various results and interpretations of a one-way ANOVA F test.
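The table lookup in step 4, and the exact p-value it approximates, can be reproduced with an F-distribution routine. A sketch using SciPy's `scipy.stats.f`; the numbers 3.55, 2, and 27 are taken from Example 4.3:

```python
# Reproduce the F-table lookup and exact p-value from Example 4.3.
from scipy.stats import f

F_obs, df1, df2 = 3.55, 2, 27

# Critical value the F table reports for a right-tail probability of 0.05:
crit = f.ppf(0.95, df1, df2)
print(f"F table value for ({df1}, {df2}) df: {crit:.2f}")   # about 3.35

# Exact p-value: right-tail area beyond the observed F.
p = f.sf(F_obs, df1, df2)
print(f"p-value = {p:.4f}")

# Same decision either way: F_obs > crit exactly when p < 0.05.
assert (F_obs > crit) == (p < 0.05)
```

Since 3.55 exceeds the critical value, the exact p-value comes out below 0.05, matching the table-based reasoning.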

| Large F value (much larger than 1) | Small F value (around 1 or less than 1) |
|---|---|
| Small p-value | Large p-value |
| Evidence against H0 (for Ha) | No evidence against H0 (for Ha) |
| Reject H0 | Fail to reject H0 |
| Conclude that some population group means differ | Reasonable that all population group means are the same |

*Figure 4.8: Results and interpretations of a one-way ANOVA F test.*

### Alternatives to the One-Way ANOVA F Test

There are some situations in which one-way ANOVA could be used, but another test procedure might be equivalent or preferable.

**One-Way ANOVA with Two Groups.** When we have only two groups, the one-way ANOVA F test serves exactly the same purpose as the two-sided two-sample t test from Section 10.2, which you saw in your previous course. It turns out that one-way ANOVA with only two groups is completely equivalent to the two-sided two-sample t test, in the sense that both tests will give exactly the same p-value. (This happens because their test statistics are related: F = t².) So in this case, it makes no difference which procedure is used, since both will yield exactly the same conclusion. However, the two-sample t test is slightly more flexible in this case, since it also allows us to use a one-sided alternative hypothesis if we so desire.

**Ordinal Variables.** If the factor is an ordinal variable, one-way ANOVA makes no use of the ordering information. There exist other test procedures that might make slightly fewer Type II errors than one-way ANOVA by taking into account
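The F = t² equivalence for two groups is easy to verify numerically. A sketch with SciPy and made-up data; note that the equivalence is with the pooled (equal-variance) two-sample t test:

```python
# Verify that one-way ANOVA with two groups matches the two-sided
# pooled two-sample t test: F = t^2 and the p-values agree.
from scipy.stats import f_oneway, ttest_ind

a = [4.4, 4.2, 4.5, 4.3, 4.6]   # made-up data, group 1
b = [4.1, 4.0, 4.2, 3.9, 4.3]   # made-up data, group 2

F, p_anova = f_oneway(a, b)
t, p_ttest = ttest_ind(a, b)    # pooled: equal_var=True is the default

print(F, t ** 2)            # identical
print(p_anova, p_ttest)     # identical

assert abs(F - t ** 2) < 1e-9
assert abs(p_anova - p_ttest) < 1e-9
```

If `equal_var=False` (Welch's t test) were used instead, the exact equivalence would no longer hold.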

the order of the factor categories, but we won't discuss these procedures here.

**Normality.** One-way ANOVA assumes that the data in each group comes from a normal distribution. Even if the distribution is somewhat different from normal, one-way ANOVA can still work okay if the sample sizes are large enough. However, when sample sizes are small, one-way ANOVA can be unreliable if the data in one or more of the groups comes from a highly non-normal distribution. There exists a nonparametric equivalent of the one-way ANOVA F test, called the Kruskal-Wallis test, that uses only the ranks of the data and is okay to use no matter what distribution the data comes from. We won't discuss the details, but Section 15.2 of the textbook gives a brief outline.

**Block Designs.** Recall from Stats 1 that when we wanted to compare the means of two groups, there were two different procedures:

- The two-sample t test compared groups when the data in one group was independent from the data in the other group.
- The matched-pairs t test compared groups when each observation in one group was paired with a corresponding observation in the other group (such as husbands and wives, or before-and-after measurements).

The one-way ANOVA F test we discussed in this section is the multiple-group analog of the two-sample t test. (That's why they're equivalent when there are only two groups.) As mentioned in the assumptions, it can't be used when the observations in a group correspond to observations in other groups.

There also exists a procedure called a block design that is the multiple-group analog of the matched-pairs t test. It should be used instead of simple one-way ANOVA when each subject is reused for measurements in each group. There are many cases where such a procedure is useful.

**Example 4.4:** Suppose we want to compare the effectiveness of three kinds of fertilizer for growing corn. We have five plots of land available to use, so we divide each plot into thirds and use one fertilizer on each third. Here the plots of land are the subjects and the fertilizers are the groups. Each subject is being reused for each group, so we can't use the one-way ANOVA procedure we discussed in this section. However, this type of data can be analyzed using a block design. □

Unfortunately, we won't have time to discuss block designs in detail in this course. The textbook doesn't discuss them either, so if for some reason you need to learn about them, consult another textbook instead. (I can give you a reference if you're interested.)

## 4.3 One-Way ANOVA Confidence Intervals

The one-way ANOVA F test allows us to conclude whether or not the population group means are all equal. However, we might also want to say something about what we think the group means actually are, or about which group means are different and by how much. We can answer these questions by constructing confidence intervals. Since there are multiple quantities for which we might want to construct confidence intervals in a one-way ANOVA setup, we need to discuss the right way to do this.

### Simultaneous Confidence Intervals

When we construct more than one confidence interval at a time, we have to be careful to maintain our specified overall confidence level. For example, if we're 95% confident in the statement "µ1 is between 78 and 86," and we're also 95% confident in the statement "µ2 is between 31 and 39," then we'll (usually) be less than 95% confident in the combined statement "µ1 is between 78 and 86 and µ2 is between 31 and 39." When we want to state a certain overall confidence level for several confidence intervals simultaneously, we need to construct simultaneous confidence intervals.

(If we're only interested in setting the confidence level for one confidence interval at a time, then we might call this an individual confidence level, to distinguish it from an overall simultaneous confidence level.)
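The drop in overall confidence can be seen by simulation: two independent 95% intervals each cover their own mean 95% of the time, but both statements hold simultaneously only about 0.95² ≈ 90.25% of the time. A sketch with invented means (the 82 and 35 below are arbitrary, not from the examples above), using known σ = 1 z intervals to keep the bookkeeping simple:

```python
# Simulate the overall confidence level of two individual 95% intervals
# made at once. Means and sigma are made up for illustration.
import random

random.seed(1)
z = 1.96            # multiplier for an individual 95% z interval
n = 25              # sample size per group
sigma = 1.0         # known, for simplicity
trials = 20_000

both_cover = 0
for _ in range(trials):
    covered = []
    for mu in (82.0, 35.0):   # two unrelated population means
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        half = z * sigma / n ** 0.5
        covered.append(xbar - half <= mu <= xbar + half)
    both_cover += all(covered)

print(both_cover / trials)   # near 0.9025, noticeably below 0.95
```

Simultaneous methods widen the individual intervals just enough to push this joint coverage back up to the stated overall level.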

**Multiple Comparison Methods.** To construct simultaneous confidence intervals, we have to use something called a multiple comparison method. There are a variety of multiple comparison methods, and the best one to use depends on what kind of confidence intervals we plan to construct. We won't discuss the details here.

### Confidence Intervals for Group Means

The most obvious quantities for which we might want to construct confidence intervals are µ1, ..., µg, the population means of the groups. Since we're constructing multiple confidence intervals at once, we'll need to use a multiple comparison procedure. Many different multiple comparison methods exist for this situation, and one of the most commonly used is the Bonferroni method. We'll refer to the intervals it produces as Bonferroni simultaneous confidence intervals.

**Assumptions.** The assumptions for constructing confidence intervals for group means are the same as those for the one-way ANOVA F test.

**Estimating the Standard Deviation.** Recall that one of our assumptions is that each group has the same population standard deviation, which we call σ. We can estimate σ using

$$\hat{\sigma} = \sqrt{MS_E}.$$

This quantity will show up in the confidence interval formula, but it might also be useful in its own right.

**Example 4.5:** In Example 4.2, we calculated MS_E in the ANOVA table shown in Figure 4.3. Hence our estimate for the population standard deviation σ of each group is σ̂ = √MS_E. □

**Formula.** To construct a set of Bonferroni simultaneous confidence intervals for µ1, ..., µg, we can use the following formula for each µi:

$$\bar{Y}_{i\cdot} \pm t \, \hat{\sigma} \sqrt{\frac{1}{n_i}},$$
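This recipe can be sketched in code. The data below is made up, and the multiplier uses one common Bonferroni convention, a t critical value with df_E degrees of freedom at a level adjusted by the number of intervals g, which matches the statement that t depends on the confidence level, N, and g; treat both as assumptions rather than the chapter's exact table values:

```python
# Sketch of Bonferroni simultaneous 95% CIs for g group means.
# Data is made up; t uses the common choice t_(alpha/(2g), df_E).
from math import sqrt
from scipy.stats import t as t_dist

groups = [
    [4.4, 4.2, 4.5, 4.3, 4.6],
    [4.1, 4.0, 4.2, 3.9, 4.3],
    [3.7, 3.6, 3.8, 3.5, 3.9],
]
g = len(groups)
N = sum(len(grp) for grp in groups)
df_e = N - g

# Pooled estimate of sigma: sigma_hat = sqrt(MS_E).
means = [sum(grp) / len(grp) for grp in groups]
ss_e = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
sigma_hat = sqrt(ss_e / df_e)

alpha = 0.05
t_star = t_dist.ppf(1 - alpha / (2 * g), df_e)   # Bonferroni-adjusted t

for i, (m, grp) in enumerate(zip(means, groups), start=1):
    half = t_star * sigma_hat * sqrt(1 / len(grp))
    print(f"mu_{i}: ({m - half:.2f}, {m + half:.2f})")
```

Because the adjusted level splits α across the g intervals, each interval is wider than an individual 95% interval would be.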

where t is a number that depends on the confidence level, N, and g. We won't discuss the details of how to get t in this chapter, but we may come back to it later (the Bonferroni method will come up again in a later chapter).

**Example 4.6:** For Example 4.2, simultaneous 95% Bonferroni confidence intervals for the three group means, as calculated by statistical software, are as follows:

- µ1: (3.85, 4.87)
- µ2: (3.58, 4.60)
- µ3: (3.10, 4.12)

Since this is a set of simultaneous confidence intervals, we can say that we're 95% confident that all three parameter values are in their corresponding intervals. □

### Confidence Intervals for Differences of Group Means

The one-way ANOVA F test only tells us whether there are differences between the groups. It does not give a verdict on which groups are different, or by how much. To figure this out, we can construct confidence intervals to compare each pair of group population means. More specifically, we want to construct simultaneous confidence intervals for µi − µk for each pair of groups i and k. For example, with three groups, there would be three quantities for which we would want to construct confidence intervals: µ1 − µ2, µ1 − µ3, and µ2 − µ3.

Many different multiple comparison methods exist for this situation, but the best one for our purposes is called the Tukey method. We'll refer to the intervals it produces as Tukey simultaneous confidence intervals.

**Assumptions.** The assumptions for constructing Tukey simultaneous confidence intervals are exactly the same as those for the one-way ANOVA F test, with one additional requirement: the group sample sizes n1, n2, ..., ng should be at least approximately equal.

**Formula.** To construct a set of Tukey simultaneous confidence intervals for each pair of groups i and k, we can use the following formula:

$$(\bar{Y}_{i\cdot} - \bar{Y}_{k\cdot}) \pm q \, \hat{\sigma} \sqrt{\frac{1}{n_i} + \frac{1}{n_k}},$$

where q is a number that depends on the confidence level, N, and g. We won't discuss the details of how to get q, since we would typically use statistical software to calculate it for us.

**Interpretation.** For each comparison of two groups, we interpret the corresponding Tukey simultaneous confidence interval as follows:

- If the interval contains only positive numbers, then we can conclude that the first of the two population means being compared is bigger than the second.
- If the interval contains only negative numbers, then we can conclude that the first of the two population means being compared is smaller than the second.
- If the interval contains both positive and negative numbers (in other words, if it contains zero), then we can't conclude that either of the two population means being compared is bigger than the other.

Of course, whenever we conclude that one population mean is bigger than another, the interval also gives us an idea of how much bigger.

**Example 4.7:** For Example 4.2, Tukey simultaneous 95% confidence intervals, as calculated by statistical software, are as follows:

- µ1 − µ2: (−0.44, 0.98)
- µ1 − µ3: (0.04, 1.46)
- µ2 − µ3: (−0.23, 1.19)

So we can't conclude that there's any difference between µ1 and µ2 or between µ2 and µ3, since both of the corresponding intervals contain both positive and negative numbers. However, we can conclude that µ1 is bigger than µ3, since the corresponding interval contains only positive numbers.
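Statistical software gets q from the studentized range distribution. A sketch using SciPy's `scipy.stats.studentized_range` with made-up data; note that the tabulated studentized-range quantile q* relates to the chapter's multiplier by q = q*/√2, a common difference in conventions, so that convention and the data are both assumptions here:

```python
# Sketch of Tukey simultaneous 95% CIs for all pairwise mean differences.
# Data is made up; q*/sqrt(2) converts the studentized-range quantile
# to the multiplier used in the interval formula above.
from itertools import combinations
from math import sqrt
from scipy.stats import studentized_range

groups = [
    [4.4, 4.2, 4.5, 4.3, 4.6],
    [4.1, 4.0, 4.2, 3.9, 4.3],
    [3.7, 3.6, 3.8, 3.5, 3.9],
]
g = len(groups)
N = sum(len(grp) for grp in groups)
df_e = N - g

means = [sum(grp) / len(grp) for grp in groups]
ss_e = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
sigma_hat = sqrt(ss_e / df_e)          # sqrt(MS_E)

q_star = studentized_range.ppf(0.95, g, df_e)
intervals = {}
for i, k in combinations(range(g), 2):
    diff = means[i] - means[k]
    half = (q_star / sqrt(2)) * sigma_hat * sqrt(
        1 / len(groups[i]) + 1 / len(groups[k]))
    intervals[(i + 1, k + 1)] = (diff - half, diff + half)
    print(f"mu_{i + 1} - mu_{k + 1}: ({diff - half:.3f}, {diff + half:.3f})")
```

An interval that misses zero flags a pair of groups whose population means we can declare different, exactly as in the interpretation rules above.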

In other words, we can conclude that Location 1 gives out more fries than Location 3, but we can't conclude anything about how Location 2 compares to either of them. □


### Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

### Randomized Block Analysis of Variance

Chapter 565 Randomized Block Analysis of Variance Introduction This module analyzes a randomized block analysis of variance with up to two treatment factors and their interaction. It provides tables of

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### STAT 145 (Notes) Al Nosedal anosedal@unm.edu Department of Mathematics and Statistics University of New Mexico. Fall 2013

STAT 145 (Notes) Al Nosedal anosedal@unm.edu Department of Mathematics and Statistics University of New Mexico Fall 2013 CHAPTER 18 INFERENCE ABOUT A POPULATION MEAN. Conditions for Inference about mean

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### ANOVA ANOVA. Two-Way ANOVA. One-Way ANOVA. When to use ANOVA ANOVA. Analysis of Variance. Chapter 16. A procedure for comparing more than two groups

ANOVA ANOVA Analysis of Variance Chapter 6 A procedure for comparing more than two groups independent variable: smoking status non-smoking one pack a day > two packs a day dependent variable: number of

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Experimental Designs (revisited)

Introduction to ANOVA Copyright 2000, 2011, J. Toby Mordkoff Probably, the best way to start thinking about ANOVA is in terms of factors with levels. (I say this because this is how they are described

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

### THE KRUSKAL WALLLIS TEST

THE KRUSKAL WALLLIS TEST TEODORA H. MEHOTCHEVA Wednesday, 23 rd April 08 THE KRUSKAL-WALLIS TEST: The non-parametric alternative to ANOVA: testing for difference between several independent groups 2 NON

### The Wilcoxon Rank-Sum Test

1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We

### Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

### Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Hypothesis testing - Steps

Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

### CHAPTER 13. Experimental Design and Analysis of Variance

CHAPTER 13 Experimental Design and Analysis of Variance CONTENTS STATISTICS IN PRACTICE: BURKE MARKETING SERVICES, INC. 13.1 AN INTRODUCTION TO EXPERIMENTAL DESIGN AND ANALYSIS OF VARIANCE Data Collection

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

### The F distribution and the basic principle behind ANOVAs. Situating ANOVAs in the world of statistical tests

Tutorial The F distribution and the basic principle behind ANOVAs Bodo Winter 1 Updates: September 21, 2011; January 23, 2014; April 24, 2014; March 2, 2015 This tutorial focuses on understanding rather

### ANOVA. February 12, 2015

ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation

Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Unit 27: Comparing Two Means

Unit 27: Comparing Two Means Prerequisites Students should have experience with one-sample t-procedures before they begin this unit. That material is covered in Unit 26, Small Sample Inference for One

### STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4

STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate

### One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

### General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

General Method: Difference of Means 1. Calculate x 1, x 2, SE 1, SE 2. 2. Combined SE = SE1 2 + SE2 2. ASSUMES INDEPENDENT SAMPLES. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015

Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

### The Assumption(s) of Normality

The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew

### Introduction to Statistics with GraphPad Prism (5.01) Version 1.1

Babraham Bioinformatics Introduction to Statistics with GraphPad Prism (5.01) Version 1.1 Introduction to Statistics with GraphPad Prism 2 Licence This manual is 2010-11, Anne Segonds-Pichon. This manual

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### Analysis of Variance. MINITAB User s Guide 2 3-1

3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced

### 13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior

### How to calculate an ANOVA table

How to calculate an ANOVA table Calculations by Hand We look at the following example: Let us say we measure the height of some plants under the effect of different fertilizers. Treatment Measures Mean

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### EXCEL Analysis TookPak [Statistical Analysis] 1. First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it:

EXCEL Analysis TookPak [Statistical Analysis] 1 First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it: a. From the Tools menu, choose Add-Ins b. Make sure Analysis

### Unit 31: One-Way ANOVA

Unit 31: One-Way ANOVA Summary of Video A vase filled with coins takes center stage as the video begins. Students will be taking part in an experiment organized by psychology professor John Kelly in which

### Stata Walkthrough 4: Regression, Prediction, and Forecasting

Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25-year-old nephew, who is dating a 35-year-old woman. God, I can t see them getting

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Parametric and Nonparametric: Demystifying the Terms

Parametric and Nonparametric: Demystifying the Terms By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD

### Analysis of Variance ANOVA

Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

### HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

### Using Excel s Analysis ToolPak Add-In

Using Excel s Analysis ToolPak Add-In S. Christian Albright, September 2013 Introduction This document illustrates the use of Excel s Analysis ToolPak add-in for data analysis. The document is aimed at

### The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

### Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

### Chi-square test Fisher s Exact test

Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

### Non-Inferiority Tests for One Mean

Chapter 45 Non-Inferiority ests for One Mean Introduction his module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random

### Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

### 1 Nonparametric Statistics

1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions

### COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

### Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

### MEASURES OF LOCATION AND SPREAD

Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

### individualdifferences

1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,

### Confidence Intervals on Effect Size David C. Howell University of Vermont

Confidence Intervals on Effect Size David C. Howell University of Vermont Recent years have seen a large increase in the use of confidence intervals and effect size measures such as Cohen s d in reporting

### UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

### Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

### SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All