Recall this chart that showed how most of our course would be organized:


 Roy Wright
Chapter 4
One-Way ANOVA

Recall this chart that showed how most of our course would be organized:

Explanatory Variable(s)   Response Variable   Methods
Categorical               Categorical         Contingency Tables
Categorical               Quantitative        ANOVA
Quantitative              Quantitative        Regression
Quantitative              Categorical         (not discussed)

When our data consists of a quantitative response variable and one or more categorical explanatory variables, we can employ a technique called analysis of variance, abbreviated as ANOVA. The material in this chapter corresponds to the first part of Chapter 14 of the textbook.

Recall that a categorical explanatory variable is also called a factor. In this chapter, we'll study the simplest form of ANOVA, one-way ANOVA, which uses one factor and one response variable. In the next chapter, we'll study a more complicated setup, two-way ANOVA, which uses two factors instead. (In principle, we could do ANOVA with any number of factors, but in practice, people usually stick to one or two.)

4.1 Basics of One-Way ANOVA

Let's start by discussing the way we organize and label the data for one-way ANOVA. We also need to formulate the basic question that we plan to ask.
Setup

Typically, when we think about one-way ANOVA, we think about the factor as dividing the subjects into groups. The goal of our analysis is then to compare the means of the subjects in each group.

Notation

Let $g$ represent the number of groups. Then we'll set things up as follows:

• Let $\mu_1, \mu_2, \ldots, \mu_g$ represent the true population means of the response variable for the subjects in each group. As usual, these population parameters are what we're really interested in, but we don't know their values.
• We call each observation in the sample $Y_{ij}$, where $i$ is a number from 1 to $g$ that identifies the group number, and $j$ identifies the individual within that group. (For example, $Y_{12}$ represents the response variable value of the second individual in the first group.)
• We can calculate the sample means for each group, which we'll call $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \ldots, \bar{Y}_{g\cdot}$. We can use these known sample means as estimates of the corresponding unknown population means.

Example 4.1: Suppose we want to see if three McDonald's locations around town tend to put the same amount of fries in a medium order, or if some locations put more fries in the container than others. We take the next 30 days on the calendar and randomly assign 10 days to each of the three locations. On each day, we go to the specified location, order a medium order of fries, take it home, and weigh it to see how many ounces of fries it contains. The categorical explanatory variable is just which location we went to, and the quantitative response variable is the number of ounces of fries. For each of the three locations ($g = 3$), the population consists of all medium orders of fries sold at that location, while the sample consists of the orders that we actually got. The population means, which we call $\mu_1, \mu_2, \mu_3$, represent the average number of ounces of fries in all orders at each location, and these are the quantities we're interested in.
We estimate them using $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \bar{Y}_{3\cdot}$, the sample means for each location, which are calculated from the data for our orders. The data is shown in Figure 4.1.
Figure 4.1: Ounces of fries in 10 medium orders of fries at each of three McDonald's locations. (Table columns: Location, Fries (ounces), Mean, Std. Dev.)

Question of Interest

What we really want to know is whether all of the groups have the same population mean, that is, whether $\mu_1, \mu_2, \ldots, \mu_g$ are all the same. This is equivalent to asking whether or not the response variable depends on the factor.

Intuitively speaking, the most obvious way to answer this question is by looking at $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \ldots, \bar{Y}_{g\cdot}$, the sample means of the various groups. If they are close enough to each other, in some sense, then we're willing to believe that all the true population means $\mu_1, \mu_2, \ldots, \mu_g$ are the same. If one or more of $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \ldots, \bar{Y}_{g\cdot}$ are too far from the others, then that convinces us that the true population means must not all be the same. All that remains is to figure out what we mean by "close enough" and "too far." We'll eventually see how to do this with a hypothesis test.

One-Way ANOVA Table

ANOVA gets its name (analysis of variance) from the fact that it examines different kinds of variability in the data. It then uses this information to construct a hypothesis test. To describe these different kinds of variability, we'll first need to introduce some more notation:
• $\bar{Y}_{\cdot\cdot}$ represents the overall sample mean of all the data from all groups combined.
• $N$ is the total number of observations, and $n_i$ is the number of observations in the $i$th group. (So $n_1 + n_2 + \cdots + n_g = N$.)

Sums of Squares

The most basic quantities that ANOVA uses to describe different kinds of variability are the sums of squares, abbreviated SS. One-way ANOVA involves three sums of squares:

• The total sum of squares, $SS_{Tot}$, measures the overall variability in the data by looking at how the $Y_{ij}$ values vary around $\bar{Y}_{\cdot\cdot}$, their overall mean. Its formula is

  $SS_{Tot} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$.

  It can be seen from the formula that $\sqrt{SS_{Tot}/(N-1)}$ is what we would get if we lumped all $N$ observations together, ignoring groups, and calculated the sample standard deviation.

• The group sum of squares, $SS_G$, measures the variability between the groups by looking at how the sample means for each group, $\bar{Y}_{i\cdot}$, vary around $\bar{Y}_{\cdot\cdot}$, the overall mean. Its formula is

  $SS_G = \sum_{i=1}^{g} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$.

• The error sum of squares, $SS_E$, measures the variability within the groups by looking at how each $Y_{ij}$ value varies around $\bar{Y}_{i\cdot}$, the sample mean for its group. Its formula is

  $SS_E = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$.

  If we call the sample standard deviation within each group $s_i$, then another formula for $SS_E$ is

  $SS_E = \sum_{i=1}^{g} (n_i - 1) s_i^2$.
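To make the three sums of squares concrete, here is a minimal sketch that computes them directly from their definitions. The data values and the names `groups` and `sums_of_squares` are hypothetical (they are not the fries data from Figure 4.1), and only the Python standard library is assumed.

```python
from statistics import mean, variance

# Hypothetical data: three groups of unequal size (not taken from the text).
groups = {
    "A": [4.1, 3.9, 4.6, 4.4],
    "B": [3.8, 4.0, 4.3],
    "C": [3.2, 3.6, 3.5, 3.9, 3.3],
}

def sums_of_squares(groups):
    """Return (SS_Tot, SS_G, SS_E) for a dict mapping group name -> observations."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)  # overall mean of all N observations
    # Total: every observation around the overall mean.
    ss_tot = sum((y - grand_mean) ** 2 for y in all_values)
    # Group: each group mean around the overall mean, weighted by group size.
    ss_g = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Error: every observation around its own group mean.
    ss_e = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return ss_tot, ss_g, ss_e

ss_tot, ss_g, ss_e = sums_of_squares(groups)

# Alternate SS_E formula: sum of (n_i - 1) * s_i^2, with s_i^2 the sample variance.
ss_e_alt = sum((len(ys) - 1) * variance(ys) for ys in groups.values())

print(f"SS_Tot = {ss_tot:.4f}, SS_G = {ss_g:.4f}, SS_E = {ss_e:.4f}")
print(f"SS_E via group standard deviations = {ss_e_alt:.4f}")
```

The two versions of $SS_E$ should agree up to floating-point rounding, which is a quick way to check a hand calculation.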
It turns out to be true that

  $SS_{Tot} = SS_G + SS_E$.

In words, the total variability equals the sum of the variability between groups and the variability within groups.

Degrees of Freedom

The sums of squares are supposed to measure different kinds of variability in the data, but they also tend to be influenced in various ways by the number of groups $g$ and the number of observations $N$. This influence is measured by quantities called degrees of freedom that are associated with each sum of squares. Their formulas are

  $df_{Tot} = N - 1$,   $df_G = g - 1$,   $df_E = N - g$.

Notice that $df_{Tot} = df_G + df_E$. The group and error degrees of freedom add to the total, just like the sums of squares do.

Mean Squares

The mean squares are just the sums of squares divided by their degrees of freedom:

  $MS_G = SS_G / df_G$,   $MS_E = SS_E / df_E$.

(We seldom bother calculating $MS_{Tot}$, because it's just the square of the sample standard deviation of all $N$ observations lumped together.) $MS_G$ and $MS_E$ measure the variability between groups and within groups in a way that properly accounts for $g$ and $N$, unlike $SS_G$ and $SS_E$.

Table

We typically summarize all this information in an ANOVA table. An ANOVA table for one-way ANOVA is laid out as shown in Figure 4.2. (A few other quantities that we'll calculate later are also sometimes included as extra columns on the right side of the ANOVA table.)

Example 4.2: The ANOVA table for the data shown in Figure 4.1 would obviously be very tedious to calculate by hand, so we use computer software to calculate the ANOVA table shown in Figure 4.3.
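As a rough illustration of what such software does, the sketch below assembles the rows of a one-way ANOVA table (df, SS, MS) from raw data. The function name `anova_table` and the data are hypothetical, so the printed numbers are not those of Figure 4.3.

```python
from statistics import mean

def anova_table(groups):
    """Return rows (source, df, SS, MS) of a one-way ANOVA table.

    `groups` is a list of lists, one inner list of observations per group.
    """
    all_values = [y for ys in groups for y in ys]
    n_total, g = len(all_values), len(groups)
    grand_mean = mean(all_values)

    ss_g = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups)
    ss_e = sum((y - mean(ys)) ** 2 for ys in groups for y in ys)
    ss_tot = sum((y - grand_mean) ** 2 for y in all_values)

    df_g, df_e, df_tot = g - 1, n_total - g, n_total - 1
    return [
        ("Group", df_g, ss_g, ss_g / df_g),
        ("Error", df_e, ss_e, ss_e / df_e),
        ("Total", df_tot, ss_tot, None),  # MS_Tot is seldom reported
    ]

# Hypothetical data: three groups of four observations each.
data = [[5.1, 4.8, 5.5, 5.0], [4.2, 4.6, 4.4, 4.9], [3.9, 4.1, 4.5, 3.7]]
rows = anova_table(data)

print(f"{'Source':<8}{'df':>4}{'SS':>10}{'MS':>10}")
for source, df, ss, ms in rows:
    ms_text = "" if ms is None else f"{ms:.4f}"
    print(f"{source:<8}{df:>4}{ss:>10.4f}{ms_text:>10}")
```

The Group and Error rows should add up to the Total row in both the df and SS columns, mirroring the identities stated above.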
Source   df          SS          MS
Group    $df_G$      $SS_G$      $MS_G$
Error    $df_E$      $SS_E$      $MS_E$
Total    $df_{Tot}$  $SS_{Tot}$

Figure 4.2: Generic one-way ANOVA table.

Figure 4.3: ANOVA table for the data in Figure 4.1. (Columns: Source, df, SS, MS; rows: Group, Error, Total.)

4.2 One-Way ANOVA F Test

The focus of ANOVA is a hypothesis test for checking whether all the groups have the same population mean. This is the same as testing whether the response variable depends on the factor. Sometimes we'll refer to this as a test for whether the factor has an effect on the response variable (although it may not be right to think about this as a literal cause-and-effect relationship).

One-Way ANOVA F Test Procedure

Like any other hypothesis test, the one-way ANOVA F test consists of the standard five steps.

Assumptions

The one-way ANOVA F test makes four assumptions:

• The data comes from a random sample or randomized experiment. In an observational study, the subjects in each group should be a random sample from that group. In an experiment, the subjects should be randomly assigned to the groups.
• The data for each group should be independent. For example, we wouldn't want to reuse the same subject for measurements in more than one group.
• For each group, the population distribution of the response variable has a normal distribution. To check this assumption, there are a couple of things we should look for:
  - The shape of the data should look at least sort of close to normal.
  - There should be no outliers.
• The population distribution of the response variable has the same standard deviation $\sigma$ for each group. Of course, we don't know $\sigma$, but we can still check this assumption by comparing the sample standard deviations for each group. As an approximate rule of thumb, we typically don't worry unless one group's standard deviation is more than twice as big as another's.

Note: The textbook organizes these four assumptions a little differently. It combines my first two and my last two, and so it lists only two assumptions.

Hypotheses

The null hypothesis for the one-way ANOVA F test is that the factor has no effect, and the alternative is that it does. In terms of parameters, we can write these hypotheses as follows:

  $H_0$: $\mu_1, \mu_2, \ldots, \mu_g$ are all equal.
  $H_a$: $\mu_1, \mu_2, \ldots, \mu_g$ are not all equal.

Test Statistic

If we're testing whether or not $\mu_1, \mu_2, \ldots, \mu_g$ are all equal, then it seems reasonable to look at our estimates of those quantities and see if those are all close enough to each other. So we want to look at whether $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \ldots, \bar{Y}_{g\cdot}$ are all close enough to each other. We measure the closeness of the group means using $MS_G$, the variability between groups. But there's something else we need to consider as
well. Look at the data in Figure 4.4, which shows some hypothetical data comparing gas prices from three different states. Notice that the sample mean for each group (state) is the same for both data sets, so the variability between groups, $MS_G$, is the same as well. However, common sense says that the data set on the left is much more convincing that there is an actual difference from group to group. Mathematically, this is because the data set on the left has less variability within groups, which we measure with $MS_E$.

         Left data set             Right data set
State    GA      AL      FL        GA      AL      FL
Data     2.06    2.15    2.25      2.37    2.42    2.07
         2.05    2.16    2.24      1.73    2.02    1.83
         2.04    2.15    2.26      1.97    2.18    2.47
         2.05    2.14    2.25      2.13    1.78    2.23
Mean     2.05    2.15    2.25      2.05    2.15    2.25

Figure 4.4: Two hypothetical data sets for a study of gas prices (prices in dollars).

Our test statistic compares the variability between groups to the variability within groups by taking a ratio:

  $F = MS_G / MS_E$.

When $MS_G$ is large compared to $MS_E$, like the hypothetical data set on the left, $F$ will be large. So larger $F$ values represent more evidence that there is a difference between the group population means; in other words, more evidence against $H_0$ and in favor of $H_a$.

P-Value and the F Distribution

Recall the definition of the p-value: The p-value is the probability of getting a test statistic value at least as extreme as the one observed, if $H_0$ is true. Typically the p-value is a tail probability from whatever kind of statistical distribution the test statistic has when $H_0$ is true. For the one-way ANOVA F test statistic, we call this distribution an F distribution, like the ones shown in Figure 4.5.
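The effect of within-group variability on the ratio $F = MS_G / MS_E$ can be sketched numerically. The first data set below is the left-hand data set of Figure 4.4. The second is a hypothetical stand-in for a noisy data set: it is constructed here to have exactly the same group means (2.05, 2.15, 2.25) and is not the right-hand data set from the figure. The function name `f_statistic` is also an assumption of this sketch.

```python
from statistics import mean

def f_statistic(groups):
    """Return (MS_G, MS_E, F) for a list of groups of observations."""
    all_values = [y for ys in groups for y in ys]
    grand_mean = mean(all_values)
    g, n_total = len(groups), len(all_values)
    ss_g = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups)
    ss_e = sum((y - mean(ys)) ** 2 for ys in groups for y in ys)
    ms_g = ss_g / (g - 1)        # variability between groups
    ms_e = ss_e / (n_total - g)  # variability within groups
    return ms_g, ms_e, ms_g / ms_e

# Left-hand data set of Figure 4.4 (GA, AL, FL), prices in dollars.
tight = [
    [2.06, 2.05, 2.04, 2.05],
    [2.15, 2.16, 2.15, 2.14],
    [2.25, 2.24, 2.26, 2.25],
]

# Hypothetical noisy data with the SAME group means, built by adding
# offsets that sum to zero to each mean (not the values from the figure).
offsets = [0.30, -0.30, 0.10, -0.10]
wide = [[m + d for d in offsets] for m in (2.05, 2.15, 2.25)]

ms_g_tight, ms_e_tight, f_tight = f_statistic(tight)
ms_g_wide, ms_e_wide, f_wide = f_statistic(wide)

print(f"tight: MS_G = {ms_g_tight:.4f}, MS_E = {ms_e_tight:.6f}, F = {f_tight:.1f}")
print(f"wide:  MS_G = {ms_g_wide:.4f}, MS_E = {ms_e_wide:.6f}, F = {f_wide:.2f}")
```

Both data sets share the same $MS_G$ because their group means are identical, so any difference in $F$ comes entirely from $MS_E$.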
Figure 4.5: Density of the F distribution for $df_1 = 2$, $df_2 = 27$ (left) and $df_1 = 3$, $df_2 = 40$ (right). (Horizontal axis: value of F.)

An F distribution has the following properties:

• It is skewed right.
• Things with an F distribution can't be negative, so the F distribution has only one tail. (We never need to double any tail probabilities from an F distribution.)
• The center of the F distribution is usually somewhere around 1, or a little less.
• The exact shape of the F distribution is determined by two different degrees of freedom: the numerator degrees of freedom, or $df_1$, and the denominator degrees of freedom, or $df_2$.

If $H_0$ is true, our test statistic, $F$, has an F distribution with $df_1 = df_G$ and $df_2 = df_E$. This is easy to remember, since the formula for $F$ is

  $F = MS_G / MS_E$,

and the numerator and denominator degrees of freedom are just the degrees of freedom associated with the quantities in the numerator and denominator of $F$.

Remember that we said the larger values of $F$ are the values that are more supportive of $H_a$. So the p-value is the probability of getting an
F value larger than the one we actually got, if $H_0$ is true. Since the test statistic $F$ has an F distribution if $H_0$ is true, this probability is represented by the shaded area in Figure 4.6. To calculate this probability exactly, we typically need statistical software.

Figure 4.6: Tail probability of an F distribution with $df_1 = 3$ and $df_2$ (value not legible). (Horizontal axis: value of F.)

If we don't have access to statistical software, we often have to use an F table like the one in the back of our textbook to try to figure out the p-value. Ideally, we would go to our F table, find the correct $df_1$ and $df_2$, look up our F value, and it would tell us the p-value. Unfortunately, that's way too much information and would require our F table to be dozens of pages long.

Instead, a typical F table, like the one in Figure 4.7, works a little differently. For each combination of $df_1$ and $df_2$, the table tells us only a single number. That number is the F value corresponding to a p-value of 0.05. We then check whether our observed F test statistic value is larger or smaller than the one listed in the table.

• If our test statistic value is larger than the number in the table, then our p-value is smaller than 0.05.
• If our test statistic value is smaller than the number in the table, then our p-value is larger than 0.05.

We can see that the p-value behaves as it should: Smaller p-values correspond to larger F values, and both correspond to more evidence against
$H_0$ and in support of $H_a$.

Figure 4.7: Top-left corner of an F table for right-tail probabilities of 0.05. (Table indexed by $df_1$ and $df_2$.)

Decision

We make a decision the same way we always do for any hypothesis test: by rejecting $H_0$ if the p-value is less than or equal to $\alpha$ (often 0.05), and failing to reject $H_0$ if the p-value is greater than $\alpha$. Remember that the hypotheses we're testing are

  $H_0$: $\mu_1, \mu_2, \ldots, \mu_g$ are all equal.
  $H_a$: $\mu_1, \mu_2, \ldots, \mu_g$ are not all equal.

So let's think about what our decision really represents.

• If we reject $H_0$, then we're concluding that at least some of the group population means are different.
• If we fail to reject $H_0$, then we're concluding that it's reasonable that all the group population means are the same.

Example 4.3: Let's go through the five steps of the one-way ANOVA F test for the data in Example 4.2 using $\alpha = 0.05$.

1. Let's check each of the four assumptions.

• Each day was randomly assigned to a particular location, so this is a randomized experiment.
• The different groups correspond to different locations, each of which should have no ability to affect the measurements of the other two, so the groups are independent.
• It's very hard to tell much about the shape of the data with only 10 observations in each group, but quick dotplots for each group show shapes that are at least somewhat consistent with a normal distribution. Also, we see no outliers in any of the groups.
• The sample standard deviations for the three groups are at least sort of close to each other, so we don't see any violation of the constant standard deviation assumption.

Our assumptions are okay, so we can proceed.

2. The null hypothesis is that $\mu_1 = \mu_2 = \mu_3$, which means that the three locations, on average, give out the same amount of fries. The alternative hypothesis is that at least one of $\mu_1, \mu_2, \mu_3$ is not equal to the others, which means that at least one of the locations gives out more or fewer fries than the others.

3. The test statistic $F$ is calculated from the mean squares in the ANOVA table shown in Figure 4.3:

  $F = MS_G / MS_E = 3.55$.

4. To calculate our p-value, we compare our observed test statistic value to an F distribution with $df_1 = df_G = 2$ and $df_2 = df_E = 27$. When we consult the F table for these df values, the number that it gives us is 3.35. This means that for an F distribution with these degrees of freedom, a test statistic value of 3.35 would correspond to a p-value of 0.05. Our observed test statistic value of 3.55 is larger than 3.35, so our p-value is smaller than 0.05. (We could use statistical software to calculate the exact p-value.)

5. Our p-value is smaller than our $\alpha$, so we reject $H_0$. We can conclude that the three locations do not give out the same amount of fries. However, we can't conclude anything about which locations give out more or fewer fries than the others, or about how many more or fewer they give out.

Figure 4.8 may be helpful for remembering various results and interpretations of a one-way ANOVA F test.
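For readers who want to check step 4 of Example 4.3 without an F table, here is a small sketch. It relies on a closed form that holds only in the special case $df_1 = 2$ (three groups): the right-tail probability is $(1 + 2F/df_2)^{-df_2/2}$. The function name is hypothetical, and for other values of $df_1$ one would normally turn to statistical software (for instance, SciPy provides `scipy.stats.f.sf`).

```python
def f_tail_probability_df1_2(f_value, df2):
    """Right-tail probability P(F > f_value) for an F distribution with df1 = 2.

    Uses the closed form (1 + 2*f/df2) ** (-df2/2), valid ONLY when df1 = 2.
    """
    if f_value < 0:
        raise ValueError("F values cannot be negative")
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)

alpha = 0.05
f_observed = 3.55  # test statistic from Example 4.3
df_e = 27          # denominator degrees of freedom, N - g = 30 - 3

p_value = f_tail_probability_df1_2(f_observed, df_e)
p_at_table_value = f_tail_probability_df1_2(3.35, df_e)  # F-table entry

reject_h0 = p_value <= alpha
print(f"p-value for F = {f_observed}: {p_value:.4f}")
print(f"tail probability at the table value 3.35: {p_at_table_value:.4f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The tail probability at the table value comes out very close to 0.05, which is consistent with how the F table is described above, and the decision matches step 5 of the example.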
Large F value (much larger than 1)                  Small F value (around 1 or less than 1)
Small p-value                                       Large p-value
Evidence against $H_0$ (for $H_a$)                  No evidence against $H_0$ (for $H_a$)
Reject $H_0$                                        Fail to reject $H_0$
Conclude that some population group means differ    Reasonable that all population group means are the same

Figure 4.8: Results and interpretations of a one-way ANOVA F test.

Alternatives to the One-Way ANOVA F Test

There are some situations in which one-way ANOVA could be used, but another test procedure might be equivalent or preferable.

One-Way ANOVA with Two Groups

When we have only two groups, then the one-way ANOVA F test serves exactly the same purpose as the two-sided two-sample t test from Section 10.2, which you saw in your previous course. It turns out that one-way ANOVA with only two groups is completely equivalent to the two-sided two-sample t test, in the sense that both tests will give exactly the same p-value. (This happens because their test statistics are related: $F = t^2$.) So in this case, it makes no difference which procedure is used, since both will yield exactly the same conclusion. However, the two-sample t test is slightly more flexible in this case since it also allows us to use a one-sided alternative hypothesis if we so desire.

Ordinal Variables

If the factor is an ordinal variable, one-way ANOVA makes no use of the ordering information. There exist other test procedures that might make slightly fewer type II errors than one-way ANOVA by taking into account
the order of the factor categories, but we won't discuss these procedures here.

Normality

One-way ANOVA assumes that the data in each group comes from a normal distribution. Even if the distribution is somewhat different from normal, one-way ANOVA can still work okay if the sample sizes are large enough. However, when sample sizes are small, one-way ANOVA can be unreliable if the data in one or more of the groups comes from a highly non-normal distribution. There exists a nonparametric equivalent of the one-way ANOVA F test called the Kruskal-Wallis test that uses only the ranks of the data and is okay to use no matter what distribution the data comes from. We won't discuss the details, but Section 15.2 of the textbook gives a brief outline.

Block Designs

Recall from Stats 1 that when we wanted to compare the means of two groups, there were two different procedures:

• The two-sample t test compared groups when the data in one group was independent from the data in the other group.
• The matched-pairs t test compared groups when each observation in one group was paired with a corresponding observation in the other group (such as husbands and wives, or before and after measurements).

The one-way ANOVA F test we discussed in this section is the multiple-group analog of the two-sample t test. (That's why they're equivalent when there are only two groups.) As mentioned in the assumptions, it can't be used when the observations in a group correspond to observations in other groups.

There also exists a procedure called a block design that is the multiple-group analog of the matched-pairs t test. It should be used instead of simple one-way ANOVA when each subject is reused for measurements in each group. There are many cases where such a procedure is useful.
Example 4.4: Suppose we want to compare the effectiveness of three kinds of fertilizer for growing corn. We have five plots of land available to use, so we divide each plot into thirds and use one fertilizer on each third. Here the plots of land are the subjects and the fertilizers are the groups. Each subject is being reused for each group, so we can't use the one-way ANOVA procedure we discussed in this section. However, this type of data can be analyzed using a block design.

Unfortunately, we won't have time to discuss block designs in detail in this course. The textbook doesn't discuss them either, so if for some reason you need to learn about them, consult another textbook instead. (I can give you a reference if you're interested.)

4.3 One-Way ANOVA Confidence Intervals

The one-way ANOVA F test allows us to conclude whether or not the population group means are all equal. However, we might also want to say something about what we think the group means actually are, or about which group means are different and by how much. We can answer these questions by constructing confidence intervals. Since there are multiple quantities for which we might want to construct confidence intervals in a one-way ANOVA setup, we need to discuss the right way to do this.

Simultaneous Confidence Intervals

When we construct more than one confidence interval at a time, we have to be careful to maintain our specified overall confidence level. For example, if we're 95% confident in the statement "$\mu_1$ is between 78 and 86," and we're also 95% confident in the statement "$\mu_2$ is between 31 and 39," then we'll (usually) be less than 95% confident in the combined statement "$\mu_1$ is between 78 and 86 and $\mu_2$ is between 31 and 39." When we want to state a certain overall confidence level for several confidence intervals simultaneously, we need to construct simultaneous confidence intervals.
(If we're only interested in setting the confidence level for one confidence interval at a time, then we might call this an individual confidence level, to distinguish it from an overall simultaneous confidence level.)
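The drop in overall confidence can be quantified with a standard probability bound, the Bonferroni inequality, which the text does not spell out at this point; the sketch below is an illustration under that assumption. If each of $k$ intervals has individual confidence level $1 - \alpha$, the overall confidence level is guaranteed to be at least $1 - k\alpha$. Turning this around, using individual level $1 - \alpha/k$ guarantees an overall level of at least $1 - \alpha$. Both function names are hypothetical.

```python
def joint_confidence_lower_bound(individual_level, k):
    """Guaranteed lower bound on the overall confidence of k intervals,
    each at the given individual level (Bonferroni inequality)."""
    return max(0.0, 1.0 - k * (1.0 - individual_level))

def bonferroni_individual_level(overall_level, k):
    """Individual level that guarantees at least `overall_level` overall
    confidence across k intervals."""
    return 1.0 - (1.0 - overall_level) / k

# Two intervals at 95% each: overall confidence is only guaranteed to be 90%.
bound_two = joint_confidence_lower_bound(0.95, 2)

# Three intervals with 95% overall confidence: each needs about 98.33%.
level_three = bonferroni_individual_level(0.95, 3)

print(f"two 95% intervals: overall confidence at least {bound_two:.2f}")
print(f"three intervals, 95% overall: individual level {level_three:.4f}")
```

The bound makes no assumption about how the intervals are related to each other, which is why it is conservative: the true overall confidence level can be higher than the bound.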
Multiple Comparison Methods

To construct simultaneous confidence intervals, we have to use something called a multiple comparison method. There are a variety of multiple comparison methods, and the best one to use depends on what kind of confidence intervals we plan to construct. We won't discuss the details here.

Confidence Intervals for Group Means

The most obvious quantities for which we might want to construct confidence intervals are $\mu_1, \ldots, \mu_g$, the population means of the groups. Since we're constructing multiple confidence intervals at once, we'll need to use a multiple comparison procedure. Many different multiple comparison methods exist for this situation, and one of the most commonly used is the Bonferroni method. We'll refer to the intervals it produces as Bonferroni simultaneous confidence intervals.

Assumptions

The assumptions for constructing confidence intervals for group means are the same as those for the one-way ANOVA F test.

Estimating the Standard Deviation

Recall that one of our assumptions is that each group has the same population standard deviation, which we call $\sigma$. We can estimate $\sigma$ using

  $\hat{\sigma} = \sqrt{MS_E}$.

This quantity will show up in the confidence interval formula, but it might also be useful in its own right.

Example 4.5: In Example 4.2, we calculated $MS_E$. Hence our estimate for the population standard deviation $\sigma$ of each group is $\hat{\sigma} = \sqrt{MS_E}$.

Formula

To construct a set of Bonferroni simultaneous confidence intervals for $\mu_1, \ldots, \mu_g$, we can use the following formula for each $\mu_i$:

  CI for $\mu_i$:  $\bar{Y}_{i\cdot} \pm t \, \hat{\sigma} \sqrt{1/n_i}$,
where $t$ is a number that depends on the confidence level, $N$, and $g$. We won't discuss the details of how to get $t$ in this chapter, but we may come back to it later (the Bonferroni method will come up again in a later chapter).

Example 4.6: For Example 4.2, simultaneous 95% Bonferroni confidence intervals for the three group means, as calculated by statistical software, are as follows:

  $\mu_1$: (3.85, 4.87)
  $\mu_2$: (3.58, 4.60)
  $\mu_3$: (3.10, 4.12)

Since this is a set of simultaneous confidence intervals, we can say that we're 95% confident that all three parameter values are in their corresponding intervals.

Confidence Intervals for Differences of Group Means

The one-way ANOVA F test only tells us whether there are differences between the groups. It does not give a verdict on which groups are different, or by how much. To figure this out, we can construct confidence intervals to compare each pair of group population means. More specifically, we want to construct simultaneous confidence intervals for $\mu_i - \mu_k$ for each pair of groups $i \neq k$. For example, with three groups, there would be three quantities for which we would want to construct confidence intervals: $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, and $\mu_2 - \mu_3$. Many different multiple comparison methods exist for this situation, but the best one for our purposes is called the Tukey method. We'll refer to the intervals it produces as Tukey simultaneous confidence intervals.

Assumptions

The assumptions for constructing Tukey simultaneous confidence intervals are exactly the same as those for the one-way ANOVA F test, with one additional requirement: the group sample sizes $n_1, n_2, \ldots, n_g$ should be at least approximately equal.
Formula

To construct a set of Tukey simultaneous confidence intervals for each pair of groups $i$ and $k$, we can use the following formula for each $i \neq k$:

  CI for $\mu_i - \mu_k$:  $(\bar{Y}_{i\cdot} - \bar{Y}_{k\cdot}) \pm q \, \hat{\sigma} \sqrt{1/n_i + 1/n_k}$,

where $q$ is a number that depends on the confidence level, $N$, and $g$. We won't discuss the details of how to get $q$, since we would typically use statistical software to calculate it for us.

Interpretation

For each comparison of two groups, we interpret the corresponding Tukey simultaneous confidence interval as follows:

• If the interval contains only positive numbers, then we can conclude that the first of the two population means being compared is bigger than the second.
• If the interval contains only negative numbers, then we can conclude that the first of the two population means being compared is smaller than the second.
• If the interval contains both positive and negative numbers (in other words, if it contains zero), then we can't conclude that either of the two population means being compared is bigger than the other.

Of course, whenever we conclude that one population mean is bigger than another, the interval also gives us an idea of how much bigger.

Example 4.7: For Example 4.2, Tukey simultaneous 95% confidence intervals, as calculated by statistical software, are as follows:

  $\mu_1 - \mu_2$: (−0.44, 0.98)
  $\mu_1 - \mu_3$: (0.04, 1.46)
  $\mu_2 - \mu_3$: (−0.23, 1.19)

So we can't conclude that there's any difference between $\mu_1$ and $\mu_2$ or between $\mu_2$ and $\mu_3$, since both of the corresponding intervals contain both positive and negative numbers. However, we can conclude that $\mu_1$ is bigger than $\mu_3$, since the corresponding interval contains only positive numbers.
In other words, we can conclude that Location 1 gives out more fries than Location 3, but we can't conclude anything about how Location 2 compares to either of them.
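The interval formula for a difference of group means and the three interpretation rules can be sketched as follows. The value of $q$ must still come from statistical software, so it is passed in as a plain argument. The function names and the numbers used to exercise `tukey_interval` are hypothetical. The three intervals fed to `interpret` are those of Example 4.7, with the lower bounds of the first and third intervals taken as negative, as the example's conclusions require.

```python
import math

def tukey_interval(mean_i, mean_k, n_i, n_k, sigma_hat, q):
    """Interval (mean_i - mean_k) +/- q * sigma_hat * sqrt(1/n_i + 1/n_k).

    `q` is assumed to be supplied by statistical software.
    """
    diff = mean_i - mean_k
    half_width = q * sigma_hat * math.sqrt(1.0 / n_i + 1.0 / n_k)
    return diff - half_width, diff + half_width

def interpret(lower, upper):
    """Apply the three interpretation rules to one interval."""
    if lower > 0:
        return "first mean is bigger"
    if upper < 0:
        return "first mean is smaller"
    return "no difference can be concluded"

# Hypothetical inputs, only to exercise the formula.
example_interval = tukey_interval(5.0, 3.0, 10, 10, sigma_hat=1.0, q=2.0)

# Intervals from Example 4.7.
intervals = {
    "mu1 - mu2": (-0.44, 0.98),
    "mu1 - mu3": (0.04, 1.46),
    "mu2 - mu3": (-0.23, 1.19),
}
conclusions = {name: interpret(lo, hi) for name, (lo, hi) in intervals.items()}

for name, conclusion in conclusions.items():
    print(f"{name}: {conclusion}")
```

Only the comparison of $\mu_1$ with $\mu_3$ yields an interval entirely above zero, which matches the conclusion that Location 1 gives out more fries than Location 3.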
Chapter 7. Oneway ANOVA
Chapter 7 Oneway ANOVA Oneway ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The ttest of Chapter 6 looks
More informationSociology 6Z03 Topic 15: Statistical Inference for Means
Sociology 6Z03 Topic 15: Statistical Inference for Means John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Statistical Inference for Means Fall 2016 1 / 41 Outline: Statistical
More informationAn example ANOVA situation. 1Way ANOVA. Some notation for ANOVA. Are these differences significant? Example (Treating Blisters)
An example ANOVA situation Example (Treating Blisters) 1Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Subjects: 25 patients with blisters Treatments: Treatment A, Treatment
More informationUnit 21 Student s t Distribution in Hypotheses Testing
Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between
More informationOneWay Analysis of Variance
OneWay Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We
More informationMINITAB ASSISTANT WHITE PAPER
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. OneWay
More informationTesting Group Differences using Ttests, ANOVA, and Nonparametric Measures