Comparing the Means of Two Populations: Independent Samples
CHAPTER 14 Comparing the Means of Two Populations: Independent Samples

14.1 From One Mu to Two

Do children in phonics-based reading programs become better readers than children in whole language programs? Do male and female high school students differ in mathematics ability? Do students who received training in test-taking strategies obtain higher scores on a statewide assessment than students who did not receive such training? These questions lead to an important way of increasing knowledge: studying the difference between two groups of observations. In each case you obtain two samples, and your concern is with comparing the two populations from which the samples were selected. This is in contrast to making inferences about a single population from a single sample, as has been our focus so far. Nonetheless, you soon will be comforted by discovering that even though we have moved from one to two, the general logic of hypothesis testing has not changed. In the immortal words of Yogi Berra, it's like déjà vu all over again.

Before we proceed, we should clarify what is meant by the phrase independent samples. Two samples are said to be independent when none of the observations in one group is in any way related to observations in the other group. This will be true, for example, when the samples are selected at random from their populations or when a pool of volunteers is divided at random into two treatment groups. In contrast, the research design in which an investigator uses the same individuals in both groups, as in a before-after comparison, provides a common example of dependent samples. (We will deal with dependent samples in Chapter 15.)

Let's look at an experiment designed to study the effect of scent on memory, which Gregory is conducting as part of his undergraduate honors thesis. He selects 18 volunteers and randomly divides them into two groups.
Participants in Group 1 read a 1500-word passage describing a person's experience of hiking the Appalachian Trail alone. The paper on which the passage appears has been treated with a pleasant, unfamiliar fragrance so that there is a noticeable scent as Group 1 participants read about the hiker's adventure. One week later, Gregory tests their recall by having them write down all that they can remember from the passage.
They do so on a sheet of paper noticeably scented with the same fragrance. Group 2 participants are subjected to exactly the same conditions, except that there is no noticeable fragrance at any time during the experiment. Finally, Gregory determines for each participant the number of facts that have been correctly recalled from the passage (e.g., the weather was uncharacteristically cooperative, there was a close encounter with a mother bear and her cubs) and then computes the mean for each group:

Group 1 (scent present): X̄₁ = 23
Group 2 (scent absent): X̄₂ = 18

On the average, participants in Group 1 recalled five more facts from the passage than did participants in Group 2. Does this sample difference necessarily mean that there is a true difference between the two conditions, that is, a difference between the means, μ₁ and μ₂, of the two theoretical populations of observations? (These two populations would comprise all individuals, similar in characteristics to those studied here, who potentially could participate in the two conditions of this experiment.) If so, it would support the substantive conclusion that memory is facilitated by scent. But you cannot be sure simply by inspecting X̄₁ and X̄₂, for you know that both sample means are affected by chance sampling variation. You would expect a difference between these sample means on the basis of chance alone even if scent had no effect on memory at all. As always in statistical inference, the important question is not about samples, but rather about the populations that the samples represent. To determine whether the difference between two sample means, X̄₁ − X̄₂, is large enough to indicate a difference in the population, μ₁ − μ₂, you use the same general logic as for testing hypotheses about means of single populations.
The application of this logic to the problem of comparing the means of two populations is the main concern of this chapter, and we will use Gregory's experiment as illustration.

14.2 Statistical Hypotheses

Gregory's interest in the influence of scent on memory leads to the research question, Does the presence of a noticeable scent, both while reading a passage and later while recalling what had been read, affect the amount of information recalled? If it does, the mean of the population of scores obtained under the Group 1 condition (scent present) should differ from that obtained under the Group 2 condition (scent absent). This becomes the alternative hypothesis, H₁: μ₁ − μ₂ ≠ 0. Although Gregory wants to know if there is a difference, he will formally test the null hypothesis that there is no difference (μ₁ − μ₂ = 0). As you saw in Chapter 11, he does this because the null hypothesis has the specificity that makes a statistical test possible. Thus Gregory's statistical hypotheses are:

H₀: μ₁ − μ₂ = 0 (scent has no effect on recall)
H₁: μ₁ − μ₂ ≠ 0 (scent has an effect on recall)

In comparisons of two populations, the specific hypothesis to be tested typically is that of no difference, or H₀: μ₁ − μ₂ = 0. The nondirectional alternative, H₁: μ₁ − μ₂ ≠ 0, is appropriate in Gregory's case, for he is interested in knowing whether the difference in treatment made any difference in the response variable (scores on the recall test).¹ That is, an effect of scent in either direction is of interest to him. If he were interested in only one direction, the alternative hypothesis would take one of two forms:

H₁: μ₁ − μ₂ > 0 (interested in only a positive effect of scent)
H₁: μ₁ − μ₂ < 0 (interested in only a negative effect of scent)

From here on, the test of H₀: μ₁ − μ₂ = 0 follows the same logic and general procedure described in Chapters 11 and 13 for testing hypotheses about single means. Gregory adopts a level of significance, decides on sample size, and selects the sample. He then compares his obtained sample difference, X̄₁ − X̄₂, with the sample differences that would be expected if there were no difference between the population means, that is, if H₀: μ₁ − μ₂ = 0 were true. This comparison is accomplished with a t test, modified to accommodate a difference between two sample means. If the sample difference is so great that it falls among the very rare outcomes (under the null hypothesis), then Gregory rejects H₀ in favor of H₁; if not, H₀ is retained.

14.3 The Sampling Distribution of Differences Between Means

The general notion of the sampling distribution of differences between means is no different from the familiar sampling distribution of means, which provides the basis for the one-sample tests described in Chapters 11 and 13. Suppose that the presence of scent has absolutely no effect on recall (μ₁ − μ₂ = 0) and that, just for perverse fun, you repeat Gregory's experiment many, many times.
For the pair of samples described earlier, Gregory obtained X̄₁ = 23 and X̄₂ = 18, giving a difference between means of +5. The experiment is repeated in an identical manner but with a new random selection of participants. Again the two means are calculated, and the difference between them is determined. Let's say this time the mean score for the scent-present group is lower than that for the scent-absent group: X̄₁ − X̄₂ = −2. A third pair of samples yields a sample difference, X̄₁ − X̄₂, close to zero (barely any difference at all). If this procedure were repeated for an unlimited number of sampling

¹ The response variable also is called the dependent variable.
experiments, the sample differences thus generated form the sampling distribution of differences between means. This is illustrated in Figure 14.1.

Figure 14.1 The development of a sampling distribution of differences between means of independent populations: pairs of samples are drawn repeatedly from two populations with μ₁ = μ₂, and the values of X̄₁ − X̄₂ (+5, −2, and so on) accumulate in a distribution centered on zero.

To summarize:

A sampling distribution of differences between means is the relative frequency distribution of X̄₁ − X̄₂ obtained from an unlimited series of sampling experiments, each consisting of a pair of samples of given size randomly selected from the two populations.

Properties of the Sampling Distribution of Differences Between Means

When we introduced the sampling distribution of means in Chapter 10, you saw that such a distribution is defined by its mean, standard deviation, and shape (Section 10.7). This is equally true with a sampling distribution of differences between means, as we now will show. The sampling distribution in Figure 14.1 describes the differences between X̄₁ and X̄₂ that would be expected, with repeated sampling, if H₀: μ₁ − μ₂ = 0 were true. Now, if the means of the two populations are the same and pairs of samples are drawn at random, sometimes X̄₁ will be larger than X̄₂ (leading to a positive value for X̄₁ − X̄₂) and sometimes X̄₂ will be larger than X̄₁ (leading to a negative value for X̄₁ − X̄₂). This, of
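The repeated-sampling idea can be made concrete with a short simulation. This is our sketch, not part of Gregory's study: the population parameters (a mean of 20 facts, a standard deviation of 5) are hypothetical, chosen only to illustrate that when the two population means are equal, the differences between sample means pile up around zero.

```python
import random
import statistics

def simulate_mean_differences(mu, sigma, n1, n2, reps, seed=1):
    """Draw many pairs of samples from two normal populations with the
    SAME mean (H0 true) and record each difference between sample means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        sample1 = [rng.gauss(mu, sigma) for _ in range(n1)]
        sample2 = [rng.gauss(mu, sigma) for _ in range(n2)]
        diffs.append(statistics.mean(sample1) - statistics.mean(sample2))
    return diffs

# Hypothetical populations: mean recall of 20 facts, sigma = 5, n = 9 per group
diffs = simulate_mean_differences(mu=20, sigma=5, n1=9, n2=9, reps=10000)

# Individual differences vary (+5 one time, -2 another), but their
# long-run average sits near zero
print(round(statistics.mean(diffs), 2))
```

Any single replication can produce a difference as large as Gregory's +5 by chance alone; it is only the long-run distribution of such differences that centers on zero.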
course, is because of sampling variation. But over the long run, the positive differences will be balanced by the negative differences, and the mean of all the differences will be zero. We will use μ_{X̄₁−X̄₂} to signify the mean of this sampling distribution. Thus:

Mean of a sampling distribution of differences between means when H₀: μ₁ − μ₂ = 0 is true:

μ_{X̄₁−X̄₂} = 0    (14.1)

This is shown at the bottom of Figure 14.1. The standard deviation of this sampling distribution is called the standard error of the difference between means. This standard error reflects the amount of variability that would be expected among all possible sample differences. It is given in the following formula:

Standard error of the difference between means:

σ_{X̄₁−X̄₂} = √(σ²_{X̄₁} + σ²_{X̄₂})    (14.2)

Formula (14.2) shows that the standard error of the difference between two sample means depends on the (squared) standard error of each sample mean involved, that is, σ_{X̄₁} and σ_{X̄₂}. Now, remember that σ_X̄ = σ/√n (Formula 10.2). By squaring each side of this expression, you see that σ²_X̄ = σ²/n. Formula (14.2) therefore can be expressed in terms of the two population variances, σ₁² and σ₂²:

Standard error of the difference between means (using population variances):

σ_{X̄₁−X̄₂} = √(σ₁²/n₁ + σ₂²/n₂)    (14.3)

Formula (14.3) shows that σ_{X̄₁−X̄₂} is affected by the amount of variability in each population (σ₁² and σ₂²) and by the size of each sample (n₁ and n₂). Because of the location of these terms in the formula, more variable populations lead to larger standard errors, and larger sample sizes lead to smaller standard errors. Finally, the sampling distribution will be normal in shape if the distribution of observations for each population is normal. However, the central limit theorem applies here, just as it did earlier for the sampling distributions of single means.
Unless the population shapes are most unusual, sampling distributions of differences between means will tend toward a normal shape if n 1 and n 2 are each at least 20 to 30 cases (there is no sharp dividing line).
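Formula (14.3) translates directly into code. The sketch below is our illustration (the function name is ours); the assertions check the two properties just described: more variable populations inflate the standard error, and larger samples shrink it.

```python
import math

def se_difference(var1, n1, var2, n2):
    """Standard error of the difference between means, Formula (14.3):
    sqrt(var1/n1 + var2/n2)."""
    return math.sqrt(var1 / n1 + var2 / n2)

# More variable populations lead to a larger standard error
assert se_difference(36, 9, 36, 9) > se_difference(25, 9, 25, 9)

# Larger samples lead to a smaller standard error
assert se_difference(25, 36, 25, 36) < se_difference(25, 9, 25, 9)

print(round(se_difference(25, 9, 25, 9), 2))
```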
14.4 Estimating σ_{X̄₁−X̄₂}

As you will recall, the population standard deviation (σ) in one-sample studies is seldom known, and consequently you must compute an estimated standard error of the mean (s_X̄) from the sample results. Not surprisingly, the situation is similar when two samples are involved: the population standard deviations, σ₁ and σ₂, frequently are unknown, so you must obtain an estimated standard error of the difference between means, s_{X̄₁−X̄₂}. How is this done?

An important assumption underlying the test of a difference between two means is that the population variances are equal: σ₁² = σ₂². This is called the assumption of homogeneity of variance. A logical extension of this assumption is the use of a combined, or pooled, variance estimate to represent both σ₁² and σ₂², rather than making separate estimates from each sample. It is the pooled variance estimate, s²pooled, that you use to determine s_{X̄₁−X̄₂}. The first task, then, is to calculate s²pooled.

Calculating s²pooled

To understand how to combine the two sample variances (s₁² and s₂²) as one (s²pooled), let's first examine the nature of a variance estimate. Remember: the variance is the square of the standard deviation. You calculate it just like a standard deviation except that the last step, taking the square root, is omitted. You saw in Formula (13.1) that the sample standard deviation is:

s = √(SS/(n − 1))

Square each side and you have the variance estimate:

s² = SS/(n − 1)

In the present situation, you have two variance estimates (s₁² and s₂²), and a single variance estimate is required (s²pooled). To obtain this single estimate, simply combine the sums of squares from both samples and divide by the total degrees of freedom:

Pooled variance estimate of σ₁² and σ₂²:

s²pooled = (SS₁ + SS₂)/(n₁ + n₂ − 2)    (14.4)

The pooled variance is an average of the two sample variances, where each variance is weighted by its df.
This can be seen most easily from the following formula,
which is equivalent to Formula (14.4) (and particularly convenient if s₁² and s₂² already are at hand):

s²pooled = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2)

Notice that each variance is weighted by n − 1 degrees of freedom, and the sum of the two weighted variances is then divided by the total degrees of freedom. The total df shows that one degree of freedom is lost for each sample variance. This is more easily seen by the equality, n₁ + n₂ − 2 = (n₁ − 1) + (n₂ − 1).

Calculating s_{X̄₁−X̄₂}

If you substitute s²pooled for each of the population variances in Formula (14.3), you have a formula for s_{X̄₁−X̄₂}:

s_{X̄₁−X̄₂} = √(s²pooled/n₁ + s²pooled/n₂)

An equivalent expression of this formula is:

s_{X̄₁−X̄₂} = √(s²pooled(1/n₁ + 1/n₂))

Now substitute Formula (14.4) for s²pooled:

Estimate of σ_{X̄₁−X̄₂}:

s_{X̄₁−X̄₂} = √[((SS₁ + SS₂)/(n₁ + n₂ − 2))(1/n₁ + 1/n₂)]    (14.5)

It is time to introduce the t test for independent samples, which we will then apply to the data from Gregory's experiment.

14.5 The t Test for Two Independent Samples

Recall the structure of the one-sample t test: it is the difference between the sample result (X̄) and the condition specified in the null hypothesis (μ₀), divided by the standard error (s_X̄):

t = (X̄ − μ₀)/s_X̄

² If n₁ = n₂ = n, the standard error in Formula 14.6 simplifies to √[(SS₁ + SS₂)/(n(n − 1))]. Here, n is the number of participants in each group.
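Formulas (14.4) and (14.5) are easy to express in code. The sketch below is ours, not the book's; the demonstration values SS₁ = 178 and SS₂ = 198 are inferred from the sample variances reported later in the chapter (22.25 and 24.75, each with n − 1 = 8 df), so treat them as illustrative.

```python
import math

def pooled_variance(ss1, ss2, n1, n2):
    """Formula (14.4): combine the sums of squares and divide by total df."""
    return (ss1 + ss2) / (n1 + n2 - 2)

def estimated_se_difference(ss1, ss2, n1, n2):
    """Formula (14.5): estimated standard error of X1bar - X2bar."""
    return math.sqrt(pooled_variance(ss1, ss2, n1, n2) * (1 / n1 + 1 / n2))

s2_pooled = pooled_variance(178, 198, 9, 9)     # 23.5
se = estimated_se_difference(178, 198, 9, 9)    # about 2.285 (2.28 in the text)
print(round(s2_pooled, 2), round(se, 3))
```

Note that the pooled variance, 23.5, lands between the two separate sample variances, as a weighted average must.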
The t test for independent samples has the same general structure. It, too, compares the sample result (X̄₁ − X̄₂) with the condition specified under the null hypothesis (μ₁ − μ₂), dividing the difference by the standard error (s_{X̄₁−X̄₂}). Expressed formally:

t = [(X̄₁ − X̄₂) − (μ₁ − μ₂)]/s_{X̄₁−X̄₂}

Because the null hypothesis typically specifies that μ₁ − μ₂ = 0, the formula above simplifies to:

t test for two independent samples:

t = (X̄₁ − X̄₂)/s_{X̄₁−X̄₂}    (14.6)

This t ratio will follow Student's t distribution with df = n₁ + n₂ − 2, provided several assumptions are met. We have alluded to these assumptions, but it is helpful to reiterate them at this point. The first assumption is that the two samples are independent. That is, none of the observations in one group is in any way related to observations in the other group. (As you will learn in Chapter 15, the dependent-samples t test has a slightly different standard error.) The second assumption is that each of the two populations of observations is normally distributed. Here, of course, the central limit theorem helps out, as it did when we were making inferences about single means (Chapter 13). Consequently, when each sample is larger than 20 to 30 cases, considerable departure from population normality can be tolerated. Finally, it is assumed that the two populations of observations are equally variable (σ₁² = σ₂²). Earlier we referred to this as the assumption of homogeneity of variance, out of which arises the calculation of a pooled variance estimate (s²pooled). Research has shown that violation of this assumption is not problematic unless the population variances are quite different, the two sample sizes also are quite different, and either n₁ or n₂ is small.

Therefore, when samples are small, you should look carefully at the data for skewness or large differences in variability. Here, the eyeball is a powerful tool. If you cannot see a problem by such inspection, then it probably won't matter. But if sample size is small and departure from the conditions specified seems to be substantial, you should consider nonparametric or distribution-free techniques that involve few or no assumptions about the population distributions (e.g., see Minium, Clarke, & Coladarci, 1999, ch. 21).

14.6 Testing Hypotheses About Two Independent Means: An Example

Now let's carry through on Gregory's problem. Does the presence of a noticeable scent, both while reading a passage and later while recalling what had been read, affect the amount of information recalled? Again, we emphasize that the overall logic involved in testing H₀: μ₁ − μ₂ = 0 is the same as that for all significance tests in this text. You assume H₀ to be true and then determine whether the obtained sample result is sufficiently rare in the direction(s) specified in H₁ to cast doubt on H₀. To do this, you express the sample result as a test statistic (t, in the present case), which you then locate in the theoretical sampling distribution. If the test statistic falls in a region of rejection, H₀ is rejected; if not, H₀ is retained. With this in mind, we now proceed with Gregory's test.

Step 1: Formulate the statistical hypotheses and select a level of significance. Gregory's statistical hypotheses are:

H₀: μ₁ − μ₂ = 0
H₁: μ₁ − μ₂ ≠ 0

He must now select his decision criterion, which we will assume is α = .05.

Step 2: Determine the desired sample size and select the sample. To simplify computational illustrations, we limited Gregory's samples to nine participants each. In practice, one must decide what sample size is needed. Too few participants makes it difficult to discover a difference where one exists, which increases the chances of a Type II error; too many is wasteful and costly (e.g., see Minium, Clarke, & Coladarci, 1999, ch. 17).

Step 3: Calculate the necessary sample statistics. The raw data and all calculations are given in Table 14.1. Gregory begins by computing the mean and sum of squares for each group. The pooled variance estimate, s²pooled, is calculated next, followed by the calculation of s_{X̄₁−X̄₂}. Finally, Gregory computes the t ratio, obtaining t = 2.19, which has df = n₁ + n₂ − 2 = 16. Notice that we also presented the sample variance and standard deviation for each group (in brackets), even though neither is required for subsequent calculations. We did this for three reasons.
First, good practice requires reporting s₁ and s₂ along with the outcome of the test, so you'll need these later. Second, knowing the separate variances s₁² and s₂² allows you to easily confirm the reasonableness of the value you obtained for s²pooled. Because it is a weighted average of s₁² and s₂², s²pooled must fall between these two values (right in the middle, if n₁ = n₂). If it does not, then a calculation error has been made. Gregory's s²pooled (23.50) happily rests between 22.25 and 24.75, the values for s₁² and s₂², respectively. Third, extreme differences between s₁² and s₂² might suggest differences between the population variances, σ₁² and σ₂², thereby casting doubt on the assumption of homogeneous variances. (Gregory's sample variances seem fine in this regard.)

Table 14.1 Test of the Difference Between Means of Two Independent Samples

Group 1 (scent present): n₁ = 9, X̄₁ = 207/9 = 23, SS₁ = Σ(X₁ − X̄₁)² = 178 [s₁² = SS₁/(n₁ − 1) = 22.25, s₁ = 4.72]
Group 2 (scent absent): n₂ = 9, X̄₂ = 162/9 = 18, SS₂ = Σ(X₂ − X̄₂)² = 198 [s₂² = SS₂/(n₂ − 1) = 24.75, s₂ = 4.97]
s²pooled = (SS₁ + SS₂)/(n₁ + n₂ − 2) = (178 + 198)/16 = 23.50
s_{X̄₁−X̄₂} = √(s²pooled(1/n₁ + 1/n₂)) = √(23.50(1/9 + 1/9)) = 2.28
t = (X̄₁ − X̄₂)/s_{X̄₁−X̄₂} = (23 − 18)/2.28 = 2.19, df = n₁ + n₂ − 2 = 16
Statistical decision: reject H₀: μ₁ − μ₂ = 0
Substantive conclusion: The presence of scent improves memory.

Step 4: Identify the region(s) of rejection. To identify the rejection region(s), you first identify the critical t value(s), t_α. Remember that there are three things to consider when selecting critical values from Table B (Appendix C): H₁, α, and df (see Section 13.5). With a two-tailed H₁, α = .05, and df = 16, Gregory has all the information he needs for finding t_α. He locates df = 16 in the first column of Table B and moves over to the column under α = .05 (in both tails), where he finds the entry 2.120. Thus, t.05 = ±2.120, the values of t beyond which the most extreme 5% of all possible samples fall (in both tails combined) if H₀ is true. The regions of rejection and the obtained sample t ratio are shown in Figure 14.2.

Figure 14.2 Testing H₀: μ₁ − μ₂ = 0 against H₁: μ₁ − μ₂ ≠ 0 (α = .05): Student's t distribution with df = 16, with regions of rejection beyond t.05 = ±2.120 (area = .025 in each tail) and the region of retention between them.

Step 5: Make the statistical decision and form a conclusion. Because the obtained t ratio falls in a rejection region (i.e., 2.19 > 2.120), Gregory rejects the H₀ of no difference (μ₁ − μ₂ = 0). The difference between the two means is statistically significant (α = .05). Because the mean recall score for Group 1 (scent-present) is higher than the mean for Group 2 (scent-absent), Gregory draws the substantive conclusion that, under these conditions, scent would appear to improve memory.

14.7 Interval Estimation of μ₁ − μ₂

The logic underlying interval estimation of μ₁ − μ₂ is basically the same as that for estimation of μ. The form of the estimate for independent samples is:

Rule for a confidence interval for μ₁ − μ₂:

(X̄₁ − X̄₂) ± t_α s_{X̄₁−X̄₂}    (14.7)

Formula (14.7) is structurally equivalent to X̄ ± t_α s_X̄ (Formula 13.4), the confidence interval for μ presented in Chapter 13. We're simply replacing X̄ with X̄₁ − X̄₂ and s_X̄ with s_{X̄₁−X̄₂}. As before, t_α is the tabled value of t for which the middle (1 − α)(100) percent of the area of Student's distribution is included within the limits −t_α to +t_α. Now, however, df = n₁ + n₂ − 2 rather than n − 1. Suppose Gregory followed up his test of H₀: μ₁ − μ₂ = 0 by constructing a 95% confidence interval for μ₁ − μ₂. His question is, What is the range of values within which I am 95% confident μ₁ − μ₂ lies? He already has all the ingredients: X̄₁ − X̄₂ = 5, t.05 = 2.120, and s_{X̄₁−X̄₂} = 2.28.
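The five-step test can be condensed into a short script. This is our sketch, not the book's: the critical value 2.120 is the Table B entry for a two-tailed test at α = .05 with df = 16, and the sums of squares are those implied by the sample variances reported above.

```python
import math

# Summary statistics implied by Gregory's experiment (illustrative values)
n1, n2 = 9, 9
mean1, mean2 = 23.0, 18.0
ss1, ss2 = 178.0, 198.0
t_crit = 2.120  # Table B: two-tailed, alpha = .05, df = 16

df = n1 + n2 - 2                                # 16
s2_pooled = (ss1 + ss2) / df                    # Formula (14.4)
se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))   # Formula (14.5)
t = (mean1 - mean2) / se                        # Formula (14.6)

decision = "reject H0" if abs(t) > t_crit else "retain H0"
print(round(t, 2), decision)
```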
He now substitutes these in Formula (14.7):

5 ± (2.120)(2.28) = 5 ± 4.83

Gregory is 95% confident that μ₁ − μ₂ falls in the interval 0.17 to 9.83. That is, he is reasonably confident that the effect of scent on memory in the population, the true effect, is somewhere between 0.17 (lower limit) and 9.83 (upper limit) additional facts recalled. He doesn't know for sure, of course. But he does know that if his experiment were repeated many times and an interval constructed each time using Formula (14.7), 95% of such intervals would include μ₁ − μ₂.

Note that Gregory's 95% confidence interval does not span zero, the value he specified in H₀ and then rejected in a two-tailed test. Whether approached through hypothesis testing or interval estimation, zero (no difference) would not appear to be a reasonable value for μ₁ − μ₂ in this instance. Also note the large width of this confidence interval. This should be expected, given the small values for n₁ and n₂. The true effect of scent on memory could be anywhere between negligible (less than one additional fact recalled) and substantial (almost 10 additional facts). As you saw in Section 12.5, one simply needs larger samples to pin down effects. With a 99% confidence interval, of course, the interval is wider still, so wide, in fact, that the interval now spans zero. This confidence interval, which requires t.01 = 2.921, is:

5 ± (2.921)(2.28) = 5 ± 6.66

that is, −1.66 (lower limit) to +11.66 (upper limit). Thus, with 99% confidence, Gregory concludes that the true effect of scent on memory is somewhere between (a) small and negative and (b) much larger and positive, including the possibility that there is no effect of scent whatsoever. This is shown in Figure 14.3.

Figure 14.3 The 99% confidence interval for the true difference in mean recall scores in the scent experiment. The interval spans the full range of possibilities: μ₁ < μ₂ (scent reduces memory), μ₁ = μ₂ (scent makes no difference), and μ₁ > μ₂ (scent improves memory).
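Formula (14.7) in code, using the same illustrative numbers (the critical values 2.120 and 2.921 are the Table B entries at df = 16 for α = .05 and α = .01, two-tailed):

```python
def confidence_interval(mean_diff, t_crit, se):
    """Formula (14.7): (X1bar - X2bar) +/- t_crit * se."""
    margin = t_crit * se
    return mean_diff - margin, mean_diff + margin

se = 2.28                                        # estimated standard error
lo95, hi95 = confidence_interval(5, 2.120, se)   # 95% interval
lo99, hi99 = confidence_interval(5, 2.921, se)   # 99% interval

print(round(lo95, 2), round(hi95, 2))   # excludes zero
print(round(lo99, 2), round(hi99, 2))   # spans zero
```

The design choice to fix t_crit by hand mirrors how the chapter works from Table B; software would normally look the quantile up from the t distribution instead.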
Consistent with this 99% confidence interval, Gregory would have retained H₀: μ₁ − μ₂ = 0 had he adopted α = .01 (two-tailed). That is, the sample t ratio is less than t.01, and, thus, zero is a reasonable possibility for μ₁ − μ₂ (at the .01 level of significance).

14.8 How Meaningful Is the Difference? Appraising the Magnitude of X̄₁ − X̄₂

When you reject the null hypothesis regarding a difference between two means, you are concluding that you have a true difference. But how large is it? Unfortunately, statistically significant frequently is confused with important, substantial, meaningful, or consequential, as you first saw earlier. But even a small (and therefore possibly unimportant) difference between two means can result in rejection of H₀ when samples are large. This is because of the effect that sample size has on reducing the standard error, s_{X̄₁−X̄₂}. Recall the location of n₁ and n₂ in the standard error:

s_{X̄₁−X̄₂} = √(s²pooled(1/n₁ + 1/n₂))

When sample sizes are humongous, the term (1/n₁ + 1/n₂) is a very small proportion indeed. Consequently, the product of s²pooled and (1/n₁ + 1/n₂), and hence s_{X̄₁−X̄₂}, is much smaller than when sample sizes are meager (in which case the aforementioned proportion is relatively large). Now, because s_{X̄₁−X̄₂} is the denominator of the t ratio, a smaller standard error for a given mean difference will result in a larger value of t (unless X̄₁ − X̄₂ is zero). Other things being equal, then, larger samples are more likely to give statistically significant results.

Let's consider a quick illustration. Recall from the preceding section that Gregory's obtained t ratio, with df = 16, would fail to reach statistical significance at α = .01. Suppose that we cloned each participant in Gregory's sample so that each score now appears twice: that is, df = 18 + 18 − 2 = 34. As Table 14.2 shows,

Table 14.2 The Effects of Doubling the Number of Observations: 16 df versus 34 df

df = 16: X̄₁ − X̄₂ = 5, s²pooled = 23.50, s_{X̄₁−X̄₂} = 2.28, t = 5/2.28 = 2.19, decision: retain H₀
df = 34: X̄₁ − X̄₂ = 5, s²pooled = 22.12, s_{X̄₁−X̄₂} = 1.57, t = 5/1.57 = 3.19, decision: reject H₀
this act of mischief does not change either mean, and X̄₁ − X̄₂ therefore remains 5. You also see that the pooled variance changes only slightly. But doubling the number of cases reduces s_{X̄₁−X̄₂} by almost one-third: from 2.28 to 1.57. As a consequence, the sample t ratio increases from 2.19 to 3.19. With 34 df, the critical t values are ±2.728 (α = .01), putting the new t ratio comfortably in the region of rejection. The difference between means is now statistically significant at the .01 level, whereas with 16 df the same difference was not.

With large enough samples, any obtained difference (other than zero) can be statistically significant. Such an outcome does not imply that the difference is large or important. Rather, it means that the difference in the population probably is not zero. Our advice to you is simple: look at the results carefully. Upon rejecting H₀, note how much difference there is between the two sample means. This is important to do in any case, but particularly when samples are large and even trivial differences can be statistically significant.

The magnitude of a difference is not always self-evident. Probably no one will disagree that 40 points is substantial if it represents the difference between two cities in mean summer temperature (40° Fahrenheit), and that this same figure is negligible if it represents the difference between men and women in mean annual income ($40). But unlike temperature and dollars, many variables in educational research lack the familiar meaning necessary to conclude whether, statistical significance aside, a given difference between means is large, small, or somewhere in between. For instance, what is your opinion of the 5-point difference that Gregory obtained? We offer two methods for appraising a difference between means, the first of which you encountered earlier.
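The cloning illustration is easy to reproduce. Doubling every score doubles each sum of squares while leaving the means untouched; the sketch below (our code, with the SS values implied by the chapter's reported variances) shows the standard error shrinking and the t ratio growing even though the mean difference never changes.

```python
import math

def t_from_summaries(mean_diff, ss1, ss2, n1, n2):
    """Return (standard error, t ratio) via Formulas (14.4)-(14.6)."""
    s2_pooled = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return se, mean_diff / se

se16, t16 = t_from_summaries(5, 178, 198, 9, 9)      # original samples
se34, t34 = t_from_summaries(5, 356, 396, 18, 18)    # every score cloned

print(round(se16, 2), round(t16, 2))
print(round(se34, 2), round(t34, 2))
```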
Effect Size

In Sections 5.8 and 6.9, you saw that a difference between two means can be evaluated by expressing it as a proportion of the pooled standard deviation. This is accomplished by calculating an effect size, which is a popular method for reporting the magnitude of a difference between means. Following common practice, we use the symbol d when estimating effect size from sample data:

Effect size (sample estimate):

d = (X̄₁ − X̄₂)/s_pooled = (X̄₁ − X̄₂)/√[(SS₁ + SS₂)/(n₁ + n₂ − 2)]    (14.8)
The pooled standard deviation in the denominator, s_pooled, is simply the square root of s²pooled, the familiar pooled variance estimate (Formula 14.4). For Gregory's data, s_pooled = √23.50 = 4.85. Thus,

d = (X̄₁ − X̄₂)/s_pooled = 5/4.85 = 1.03

The difference between these two means corresponds to 1.03 standard deviations. In other words, the mean number of facts recalled in Group 1 is roughly one standard deviation higher than the mean in Group 2. A difference of d = 1.03 is illustrated in Figure 14.4a, where you see a substantial offset between the two distributions. Indeed, if normal distributions are assumed in the population, it is estimated that the average scent-present subject falls at the 85th percentile of the scent-absent distribution, 35 percentile points beyond what would be expected if there were no effect of scent on memory whatsoever. We show this in Figure 14.4b. (You may find it helpful to review Section 6.9 for the logic and calculations underlying Figure 14.4b.)

Figure 14.4 Illustrating effect size: d = 1.03. Panel (a) shows the two distributions (X̄₂ = 18, X̄₁ = 23) offset by 1.03 pooled standard deviations (s_pooled = 4.85); panel (b) shows the average Group 1 score falling at the 85th percentile of the Group 2 distribution.
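Formula (14.8), together with the percentile interpretation, can be sketched with the standard library's NormalDist. This is our illustration; the normal-population assumption is exactly the one the chapter invokes for Figure 14.4b.

```python
import math
from statistics import NormalDist

def effect_size_d(mean_diff, ss1, ss2, n1, n2):
    """Formula (14.8): mean difference in pooled-standard-deviation units."""
    s_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return mean_diff / s_pooled

d = effect_size_d(5, 178, 198, 9, 9)    # about 1.03

# Assuming normal populations, the average Group 1 member falls at this
# percentile of the Group 2 distribution:
percentile = NormalDist().cdf(d) * 100  # about 85

print(round(d, 2), round(percentile))
```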
As you saw in Section 5.8, one convention is to consider d = .20 as small, d = .50 as moderate, and d = .80 as large (Cohen, 1988). In this light, too, the present finding is impressive. Again, however, always take into account the methodological and substantive context of the investigation when making a judgment about effect size. And remember, any sample difference is subject to sampling variation. When sample size is small (as in the present case), d could be appreciably different were the investigation to be repeated.

Explained Variance (ω̂²)

A second way to judge the magnitude of Gregory's obtained difference is to estimate the proportion of variation in recall scores that is accounted for, or explained, by variation in group membership (i.e., whether a participant is in Group 1 or Group 2). This follows the general logic of association that we introduced in Chapter 7, where you saw that the proportion of common variance between two variables is determined by r² (see Section 7.8). In the case of a difference between two means, one variable, group membership, is dichotomous rather than continuous. Here, ω̂² ("omega squared") is analogous to r² and is calculated from Formula 14.9:

Omega squared:

ω̂² = (t² − 1)/(t² + n₁ + n₂ − 1)    (14.9)³

For Gregory, t = 2.19 and n₁ = n₂ = 9. Now enter these values in Formula 14.9:

ω̂² = (2.19² − 1)/(2.19² + 9 + 9 − 1) = 3.80/21.80 = .17

Thus, Gregory estimates that 17% of the variation in recall scores is accounted for by variation in group membership. In other words, the experimental manipulation of scent explains 17% of the variance in recall scores.⁴

In summary, there are various ways to appraise the magnitude of a difference between two means. Hypothesis testing and interval estimation address the inferential question regarding the corresponding difference in the population, but both can fall short of providing meaningful information about whether the obtained difference is large or important.
In contrast, d and ω̂² can be quite helpful in this regard. For this reason, we recommend that you consider one or both indices in your own work.

³The hat (ˆ) over this term signifies it as an estimate of the population parameter, ω².
⁴Where the absolute value of t is less than one, ω̂² is negative and therefore meaningless. In such situations, ω̂² is set to zero.
14.9 How Were Groups Formed? The Role of Randomization

Gregory had randomly divided his 18 volunteers into two groups of nine each, perhaps by flipping a coin or using a table of random numbers. The random assignment of participants to treatment conditions is called randomization:

Randomization is a method for dividing an available pool of research participants into two or more groups. It refers to any set of procedures that allows chance to determine who is included in what group.

Randomization provides two important benefits. The first is a statistical benefit: in a randomized sample, you can apply the rules that govern sampling variation and thus determine the magnitude of difference that is more than can reasonably be attributed to chance. This, as you know, is of vital importance for making statistical inferences. The second benefit of randomization is that it provides experimental control over extraneous factors that can bias the results. Where experimental control is high, a researcher can more confidently attribute an obtained difference to the experimental manipulation. In short, the "why" of one's results is much clearer when participants have been randomly assigned to treatment conditions.

Imagine that Gregory had assigned to the scent-present condition the nine participants who volunteered first, the remainder being assigned to Group 2. The two groups might well differ with regard to, say, their interest in the topic of memory, their eagerness to participate in a research investigation, or an underlying need to be needed. Any one of these factors could affect motivation to perform, which in turn could influence the results Gregory subsequently obtains. The effect of the factor he is studying (the presence or absence of scent) would be hopelessly confounded with the effects of the uncontrolled, extraneous factors associated with group assignment.
In contrast, randomization results in the chance assignment of extraneous influences among the groups to be compared. Eager versus reluctant, able versus less able, interested versus bored, rich versus poor: the participants in the various groups will tend to be comparable where randomization has been employed. Indeed, the beauty of randomization is that it affords this type of experimental control over extraneous influences regardless of whether they are known by the researcher to exist. As a result, the investigator can be much more confident that the manipulated factor (e.g., scent) is the only factor that differentiates one group from another and, therefore, the only factor that reasonably explains any group difference subsequently obtained. We emphasize that the random assignment of participants to treatment groups does not guarantee equality with regard to extraneous factors, any more than 50 heads are guaranteed if you toss a coin 100 times. But randomization tends toward equality, particularly as sample size increases.
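Randomization of this kind is easy to sketch in code: shuffle the participant pool and split it. In this illustration the 18 volunteer IDs are hypothetical placeholders; only chance decides who lands in which condition:

```python
import random

# Gregory's 18 volunteers, identified here by placeholder IDs 1-18.
volunteers = list(range(1, 19))

# Chance alone determines the ordering, and hence the group assignment.
random.shuffle(volunteers)
scent_present = volunteers[:9]   # Group 1
scent_absent = volunteers[9:]    # Group 2

# Every volunteer lands in exactly one group of nine.
assert len(scent_present) == len(scent_absent) == 9
assert set(scent_present) | set(scent_absent) == set(range(1, 19))
```

A table of random numbers or a coin flip per participant accomplishes the same thing; what matters is that no characteristic of the volunteers influences the assignment.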
When Randomization Is Not Possible

In many instances, randomization is either logically impossible or highly unrealistic. Suppose you wish to compare groups differing on such characteristics as sex, political party, social class, ethnicity, or religious denomination. Or perhaps your interest is in comparing bottle-fed and breast-fed infants (say, on a measure of maternal attachment). You certainly cannot randomly assign participants to the "treatment condition" male or female, Democrat or Republican, and so on. And it is highly unlikely that mothers will agree to be randomly assigned to a particular method of feeding their newborns. Rather, you take individuals as they are. In such cases, you lose a considerable degree of control of extraneous factors, and determining the "why" of the results is not easy. This is because when groups are formed in this fashion, they necessarily bring along other characteristics as well. For example, males and females differ in physiology and socialization; Democrats and Republicans differ in political ideology and economic/demographic characteristics; and the two groups of mothers likely differ in beliefs and behaviors regarding motherhood beyond the decision to breast-feed or bottle-feed their children. In each instance, then, the announced comparison is confounded with uncontrolled, extraneous factors.

Field-based educational research almost invariably involves already formed groups, as can be found in achievement comparisons of schools that have adopted different instructional programs. Randomization is typically unfeasible in such research, and the investigator must be sensitive to extraneous influences here too. For example, perhaps the schools that adopted an innovative curriculum also have more talented teachers, higher levels of parent involvement, or more students from higher socioeconomic backgrounds.
Any one of these factors can influence achievement beyond any effect that the innovative curriculum may have. Separating out the relative influence of confounding factors requires great care, and when it can be done, procedures are required that go beyond those offered in an introductory course in statistics.

None of this is to say that only studies that permit randomization should be conducted. Quite the contrary, for such a restriction would rule out the investigation of many important and interesting research questions. Nevertheless, in the absence of randomization, one must use considerable care in the design, analysis, and interpretation of such studies.

14.10 Statistical Inferences and Nonstatistical Generalizations

Most statistical inference procedures, including those covered in this text, are based on the random sampling model described earlier. That is, they assume that the sample observations have been randomly selected from the population of interest. If the sample has been selected in this way, the procedures permit inferences about characteristics (such as means) of the defined population. These inferences are statistical inferences, based directly on the laws of probability and statistics, and their function is to take chance sampling variation into account.
The investigator, however, usually wishes to generalize beyond the original population that was sampled. Thus, when you have at hand the results of a particular investigation performed at a particular time under particular conditions using participants of a particular type who were selected in a particular way, you attempt to apply the outcome more broadly. As we showed in Section 10.5, this involves a close analysis of the participants and conditions of the investigation, and a reasoned argument regarding the characteristics of the accessible population and broader populations to which your results may apply. Generalizations of this sort are nonstatistical in nature; insofar as they involve judgment and interpretation, they go beyond what statistics can show.

What are the implications of this for educational research? In effect, although statistical inference procedures can account for chance sampling variation in the sample results, they do not provide any mathematically based way of generalizing from, or making inferences beyond, the type of participants used and the exact set of conditions at the time. This does not mean that broader generalizations cannot properly be made; indeed, they should be made. It means only that statistics does not provide a sufficient basis for making them. This type of generalization must also be based on knowledge and understanding of the substantive area, as well as on judgment of the similarity of new circumstances to those that characterized the original study. Statistical inference is a necessary first step toward the broader generalization.

14.11 Summary

This chapter is concerned with examining a difference between the means of two independent groups. Two groups are independent if none of the observations in one group is related in any way to observations in the other group.
The general logic and procedure of a two-sample test are quite similar to those characterizing tests of hypotheses about single means: the investigator formulates the statistical hypotheses, sets the level of significance, collects data, calculates the test statistic, compares it to the critical value(s), and then makes a decision about the null hypothesis. The test statistic is

    t = (X̄1 − X̄2) / s_(X̄1−X̄2)

An important assumption is that the population distributions are normal in shape. But because of the central limit theorem, the sampling distribution of differences between means can be considered to be reasonably close to a normal distribution except when samples are small and the two distributions are substantially nonnormal. An additional assumption is that the population variances are equal (σ1² = σ2²), which leads to the calculation of a pooled variance estimate (s²_pooled) used for calculating the standard error (s_(X̄1−X̄2)). Once calculated, the t ratio is evaluated by reference to Student's t distribution, using df = n1 + n2 − 2. A (1 − α)(100) percent confidence interval for μ1 − μ2 can be estimated with the rule

    (X̄1 − X̄2) ± t_α s_(X̄1−X̄2)

When a statistically significant difference between two means is found, you should ask whether it is large enough to be important. The simplest way to do this is to examine the size of the difference between X̄1 and X̄2, although effect size (d) and variance explained (ω̂²) often are more helpful in this regard. Effect size expresses a difference in terms of (pooled) standard deviation units, whereas ω̂² estimates the proportion of variance in the dependent variable that is explained by group membership.

Randomization is a procedure whereby an available group of research participants is randomly assigned to two or more treatment conditions. Randomization not only furnishes a statistical basis for evaluating obtained sample differences but also provides an effective means of controlling factors extraneous to the study.
Such controls make interpretation of significant differences between means considerably easier than when the groups are already formed on the basis of some characteristic of the participants (e.g., sex, ethnicity).

The assumption of random sampling underlies nearly all the statistical inference techniques used by educational researchers, including the t test and other procedures described in this book. Inferences to populations from which the samples have been randomly selected are directly backed by the laws of probability and statistics and are known as statistical inferences; inferences or generalizations to all other groups are nonstatistical in nature and involve judgment and interpretation.

Reading the Research: Independent-Samples t Test

Santa and Hoien (1999, p. 65) examined the effects of an early intervention program on a sample of students at risk for reading failure:

A t-test analysis showed that the post-intervention spelling performance in the experimental group (M = 59.6, SD = 5.95) was statistically significantly higher than in the control group (M = 53.7, SD = 12.4), t(47) = 2.067, p < .05.

Notice that an exact p value is not reported; rather, probability is reported relative to the significance level of .05. The result of this independent-samples t test is therefore deemed significant at the .05 level.

Source: Santa, C. M., & Hoien, T. (1999). An assessment of Early Steps: A program for early intervention of reading problems. Reading Research Quarterly, 34(1).

Case Study: Doing Our Homework

This case study demonstrates the application of the independent-samples t test. We compared the academic achievement of students who, on average, spend two hours a day on homework to students who spend about half that amount of time on homework. Does that extra hour of homework (in this case, double the time) translate into a corresponding difference in achievement?
The sample of nearly 500 students was randomly selected from a population of seniors enrolled in public schools located in the northeastern United States. (The data are courtesy of the National Center for Education Statistics' National Education Longitudinal Study of 1988.) We compared two groups of students: those reporting 4-6 hours of homework per week (Group 1) and those reporting roughly twice as many hours per week (Group 2). The criterion measures were reading achievement, mathematics achievement, and grade-point average. One could reasonably expect that students who did more homework would score higher on measures of academic performance. We therefore chose the directional alternative hypothesis, H1: μ1 − μ2 < 0, for each of the three t tests below. (The "less than" symbol simply reflects the fact that we are subtracting the hypothetically larger mean from the smaller mean.) For all three tests, the null hypothesis stated no difference, H0: μ1 − μ2 = 0. The level of significance was set at .05.
[Table 14.3 Statistics for Reading, Mathematics, and GPA: n, X̄, s, and s_X̄ for the two homework groups on each measure (READ, MATH, GPA); the numerical entries did not survive transcription.]

Our first test examined the mean difference between the two groups in reading performance. Scores on the reading exam are represented by T scores, which, you may recall from Chapter 6, have a mean of 50 and a standard deviation of 10. (Remember not to confuse T scores, which are standard scores, with t ratios, which make up the t distribution and are used for significance testing.) The mean scores are shown in Table 14.3. As expected, the mean reading achievement of Group 2 exceeded that of Group 1. An independent-samples t test revealed that this mean difference was statistically significant at the .05 level (see Table 14.4). Because large sample sizes can produce statistical significance for small (and possibly trivial) differences, we also determined the effect size in order to capture the magnitude of this mean difference. From Table 14.4, we see that the raw mean difference of −1.93 points corresponds to an effect size of d = −.21. (Remember, we are subtracting X̄2 from X̄1, hence the negative signs.) This effect size indicates that the mean reading achievement of Group 1 students was roughly one-fifth of a standard deviation below that of Group 2 students, a rather small effect.

We obtained similar results on the mathematics measure. The difference again was statistically significant, in this case satisfying the more stringent .001 significance level. The effect size, d = −.31, suggests that the difference between the two groups in mathematics performance is roughly one-third of a standard deviation. (It is tempting to conclude that the mathematics difference is larger than the reading difference, but this would require an additional analysis testing the statistical

[Table 14.4 Independent-Samples t Tests and Effect Sizes: X̄1 − X̄2, t, df, p (one-tailed), and d for READ, MATH, and GPA; the numerical entries did not survive transcription.]
significance of the difference between two differences. We have not done that here.)

Finally, the mean difference in GPA was X̄1 − X̄2 = −.08, with a corresponding effect size of d = −.14. This difference was not statistically significant (p = .059). Even if it were, its magnitude is rather small and arguably of little practical significance. Nevertheless, the obtained p value of .059 raises an important point. Although, strictly speaking, this p value failed to meet the .05 criterion, it is important to remember that .05 (or any other value) is entirely arbitrary. Should this result, p = .059, be declared statistically significant? Absolutely not. But nor should it be dismissed entirely. When a p value is tantalizingly close to but nonetheless fails to meet this criterion, researchers sometimes use the term "marginally significant." Although no convention exists (that we know of) for deciding between a marginally significant result and one that is patently nonsignificant, we believe that it is important to not categorically dismiss results that, though exceeding the announced level of significance, nonetheless are highly improbable. (In the present case, for example, the decision to retain the null hypothesis rests on the difference in probability between 50/1000 and 59/1000.) This also is a good reason for reporting exact p values in one's research: it allows readers to make their own judgments regarding statistical significance. By considering the exact probability in conjunction with effect size, readers can draw a more informed conclusion about the importance of the reported result.

Suggested Computer Exercises

1. Access the students data set, which contains grade-point averages (GPA) and television viewing information (TVHRSWK) for a random sample of 75 tenth-grade students.
Test whether there is a statistically significant difference in GPA between students who watch less than two hours of television per weekday and those who watch two or more hours of television. In doing so, (a) set up the appropriate statistical hypotheses, (b) perform the test (α = .05), and (c) draw final conclusions.

2. Repeat the process above, but instead of GPA as the dependent variable, use performance on the reading and mathematics exams.

Exercises

Identify, Define, or Explain

Terms and Concepts

independent samples
dependent samples
sampling distribution of differences between means
standard error of the difference between means
population variance
assumption of homogeneity of variance
variance estimate
pooled variance estimate
assumption of population normality
interval estimation of μ1 − μ2
sample size and statistical significance
effect size
explained variance
randomization
experimental control
statistical inferences vs. nonstatistical generalizations

Symbols

X̄1 and X̄2    n1 and n2    σ_(X̄1−X̄2) and s_(X̄1−X̄2)    X̄1 − X̄2    μ1 − μ2    s²_pooled    t    df    d    ω̂²

Questions and Problems

Note: Answers to starred (*) items are presented in Appendix B.

*1. Translate each of the following into words, and then express each in symbols in terms of a difference between means relative to zero:
(a) μA > μB
(b) μA < μB
(c) μA = μB
(d) μA ≠ μB

2. A graduate student wishes to compare the high school grade-point averages (GPAs) of males and females. He identifies 50 brother/sister pairs, obtains the GPA for each individual, and proceeds to test H0: μmales − μfemales = 0. Are the methods discussed in this chapter appropriate for such a test? (Explain.)

*3. Consider two large populations of observations, A and B. Suppose you have unlimited time and resources.
(a) Describe how, through a series of sampling experiments, you could construct a fairly accurate picture of the sampling distribution of X̄A − X̄B for samples of size nA = 5 and nB = 5.
(b) Describe how the results used to construct the sampling distribution could be used to obtain an estimate of σ_(X̄A−X̄B).

4. Assume H0: μ1 − μ2 = 0 is true. What are the three defining characteristics of the sampling distribution of differences between means?

*5. The following results are for two samples, one from Population 1 and the other from Population 2:
from Population 1: 3, 5, 7, 5
from Population 2: 8, 9, 6, 5, 12
(a) Compute SS1 and SS2.
(b) Using the results from Problem 5a, compute the pooled variance estimate.
(c) Using the result from Problem 5b, obtain s_(X̄1−X̄2).
(d) Test H0: μ1 − μ2 = 0 against H1: μ1 − μ2 ≠ 0 (α = .05).
(e) Draw final conclusions.
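The sampling experiments described in Problem 3 can also be simulated. This sketch uses hypothetical normal populations (every parameter below is invented for illustration) and shows the empirical spread of X̄A − X̄B approaching the theoretical standard error:

```python
import random
from math import sqrt
from statistics import mean, stdev

# Hypothetical populations A and B; parameters chosen arbitrarily.
random.seed(1)
mu_a, sigma_a = 100, 15
mu_b, sigma_b = 95, 12
n = 5  # nA = nB = 5, as in Problem 3

# Repeat the sampling experiment many times, recording X̄A - X̄B each time.
diffs = []
for _ in range(20000):
    xbar_a = mean(random.gauss(mu_a, sigma_a) for _ in range(n))
    xbar_b = mean(random.gauss(mu_b, sigma_b) for _ in range(n))
    diffs.append(xbar_a - xbar_b)

# The pile of differences centers on mu_a - mu_b, and its standard
# deviation approximates the true standard error of X̄A - X̄B.
se_true = sqrt(sigma_a**2 / n + sigma_b**2 / n)
print(round(mean(diffs), 1))   # close to mu_a - mu_b = 5
print(round(stdev(diffs), 2), round(se_true, 2))
```

The empirical standard deviation of the 20,000 differences is the simulated answer to Problem 3b: an estimate of σ_(X̄A−X̄B).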
*6. From the data given in Problem 5:
(a) Compute and interpret the effect size, d; evaluate its magnitude in terms of Cohen's criteria and in terms of the normal curve.
(b) Calculate and interpret ω̂².

*7. For each of the following cases, give the critical value(s) of t:
(a) H1: μ1 − μ2 > 0, n1 = 6, n2 = 12, α = .05
(b) H1: μ1 − μ2 ≠ 0, n1 = 12, n2 = 14, α = .01
(c) H1: μ1 − μ2 < 0, n1 = 14, n2 = 16, α = .05
(d) H1: μ1 − μ2 ≠ 0, n1 = 19, n2 = 18, α = .05

8. Does familiarity with an assessment increase test scores? You hypothesize that it does. You identify 11 fifth-grade students to take a writing assessment that they had not experienced before. Six of these students are selected at random and, before taking the assessment, are provided with a general overview of its rationale, length, question format, and so on. The remaining five students are not given this overview. The following are the scores (number of points) for students in each group:
overview provided: 20, 18, 14, 22, 16, 16
no overview provided: 11, 15, 16, 13, 9
(a) Set up H0 and H1.
(b) Perform the test (α = .01).
(c) Draw your final conclusions.

9. An educational psychologist is interested in knowing whether the experience of attending preschool is related to subsequent sociability. She identifies two groups of first graders: those who had attended preschool and those who had not. Then each child is assigned a sociability score on the basis of observations made in the classroom and on the playground. The following sociability results are obtained:

Attended preschool: n1 = 12, X̄1 = 204, SS1 = [not recovered]
Did not attend preschool: n2 = 16, X̄2 = 248, SS2 = [not recovered]

(a) Set up the appropriate statistical hypotheses.
(b) Perform the test (α = .05).
(c) Draw final conclusions.

*10. You are investigating the possible differences between eighth-grade boys and girls regarding their perceptions of the usefulness and relevance of science for the roles they see themselves assuming as adults.
Your research hypothesis is that boys hold more positive perceptions in this regard. Using an appropriate instrument, you obtain the following results (higher scores reflect more positive perceptions):

Male: n1 = 26, X̄1 = [not recovered], s1² = [not recovered]
Female: n2 = 24, X̄2 = [not recovered], s2² = 9.7
(a) Set up the appropriate statistical hypotheses.
(b) Perform the test (α = .05).
(c) Draw final conclusions.

11. From the data given in Problem 10:
(a) Compute and interpret the effect size, d; evaluate its magnitude in terms of Cohen's criteria and in terms of the normal curve.
(b) Calculate and interpret ω̂².

12. Parametric statistical tests are tests that are based on one or more assumptions about the nature of the populations from which the samples are selected. What assumptions are required in the t test of H0: μ1 − μ2 = 0?

*13. You read the following in a popular magazine: "A group of college women scored significantly higher, on average, than a group of college men on a test of emotional intelligence." (Limit your answers to statistical matters covered in this chapter.)
(a) How is the statistically unsophisticated person likely to interpret this statement (particularly the italicized phrase)?
(b) What does this statement really mean?
(c) Is it possible that the difference between the average woman and the average man was in fact quite small? If so, how could a significant difference be observed?
(d) What additional statistical information would you want in order to evaluate the actual difference between these women and men?

*14. A high school social studies teacher decides to conduct action research in her classroom by investigating the effects of immediate testing on memory. She randomly divides her class into two groups. Group 1 studies a short essay for 20 minutes, whereas Group 2 studies the essay for 20 minutes and immediately following takes a 10-minute test on the essay. The results below are from a final exam on the essay, taken one month later:

Group 1 (studied only): n1 = 15, X̄1 = 300, SS1 = [not recovered]
Group 2 (studied and tested): n2 = 15, X̄2 = 330, SS2 = [not recovered]

(a) Set up the appropriate statistical hypotheses.
(b) Perform the test (α = .05).
(c) Draw final conclusions.

*15. (a) Suppose you constructed a 95% confidence interval for μ1 − μ2, given the data in Problem 14.
What one value do you already know will reside in that interval? (Explain.)
(b) Now construct a 95% confidence interval for μ1 − μ2, given the data in Problem 14. Any surprises?
(c) Without performing any calculations, comment on whether a 99% confidence interval estimated from the same data would include zero.
(d) Now construct a 99% confidence interval for μ1 − μ2, given the data in Problem 14. Any surprises?
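The machinery these problems exercise (pooled variance, standard error, t ratio, df, and the confidence interval from the chapter summary) can be sketched end to end. The two groups of scores below are invented for illustration, and the critical value 2.120 (df = 16, α = .05, two-tailed) is read from a t table:

```python
from math import sqrt

# Hypothetical two-group data, invented to illustrate the formulas.
group1 = [23, 25, 19, 28, 22, 24, 20, 26, 21]
group2 = [18, 16, 21, 14, 19, 20, 15, 22, 17]

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    # Sum of squared deviations about the group mean.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(group1), len(group2)
s2_pooled = (ss(group1) + ss(group2)) / (n1 + n2 - 2)  # pooled variance estimate
se_diff = sqrt(s2_pooled * (1 / n1 + 1 / n2))          # standard error of X̄1 - X̄2
t = (mean(group1) - mean(group2)) / se_diff
df = n1 + n2 - 2

t_crit = 2.120  # two-tailed critical value for df = 16, alpha = .05 (t table)
print(df, round(t, 2), abs(t) > t_crit)  # 16 3.82 True

# 95% confidence interval for mu1 - mu2: (X̄1 - X̄2) +/- t_crit * SE.
diff = mean(group1) - mean(group2)
ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)
print(round(ci[0], 2), round(ci[1], 2))  # 2.27 7.95
```

Because the interval excludes zero, H0: μ1 − μ2 = 0 would be rejected at α = .05 for these (hypothetical) data, consistent with |t| exceeding the critical value.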
16. The director of Academic Support Services wants to test the efficacy of a possible intervention for undergraduate students who are placed on academic probation. She randomly assigns 28 such students to two groups. During the first week of the semester, students in Group 1 receive daily instruction on specific strategies for learning and studying. Group 2 students spend the same time engaged in general discussion about the importance of doing well in college and the support services that are available on campus. At the end of the semester, the director determines the mean GPA for each group:

Group 1 (strategy instruction): n1 = 14, X̄1 = 2.83, s1 = .41
Group 2 (general discussion): n2 = 14, X̄2 = 2.26, s2 = .39

(a) Set up the appropriate statistical hypotheses.
(b) Perform the test (α = .05).
(c) Draw final conclusions.

17. From the data given in Problem 16:
(a) Compute and interpret the effect size, d; evaluate its magnitude in terms of Cohen's criteria and in terms of the normal curve.
(b) Calculate and interpret ω̂².

18. Compare the investigation described in Problem 9 with that in Problem 14. Suppose a significant difference had been found in both: in favor of the children who attended preschool in Problem 9 and in favor of Group 2 in Problem 14.
(a) For which investigation would it be easier to clarify the relationship between cause and effect? (Explain.)
(b) What are some other possible explanations (other than whether a child attended preschool) for a significant difference in sociability in Problem 9?

*19. Examine Problems 8, 9, 10, 14, and 16. In which would it be easiest to clarify causal relationships? (Explain.)

20. Is randomization the same as random sampling? (Explain.)

21. Suppose the following statement were made on the basis of the significant difference reported in Problem 13: "Statistics show that women are higher in emotional intelligence than men."
(a) Is the statement a statistical or nonstatistical inference? (Explain.)
(b) Describe some of the limits to any statistical inferences based on the study.
Testing Research and Statistical Hypotheses
Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you
Association Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
Week 3&4: Z tables and the Sampling Distribution of X
Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal
UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST
UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly
Introduction to Analysis of Variance (ANOVA) Limitations of the t-test
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance
2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample
UNDERSTANDING THE DEPENDENT-SAMPLES t TEST
UNDERSTANDING THE DEPENDENT-SAMPLES t TEST A dependent-samples t test (a.k.a. matched or paired-samples, matched-pairs, samples, or subjects, simple repeated-measures or within-groups, or correlated groups)
individualdifferences
1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,
Unit 26 Estimation with Confidence Intervals
Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference
Analysis of Data. Organizing Data Files in SPSS. Descriptive Statistics
Analysis of Data Claudia J. Stanny PSY 67 Research Design Organizing Data Files in SPSS All data for one subject entered on the same line Identification data Between-subjects manipulations: variable to
NCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
Independent samples t-test. Dr. Tom Pierce Radford University
Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of
Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
13: Additional ANOVA Topics. Post hoc Comparisons
13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test
Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric
AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
Basic Concepts in Research and Data Analysis
Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the
Chapter 7. Comparing Means in SPSS (t-tests) Compare Means analyses. Specifically, we demonstrate procedures for running Dependent-Sample (or
1 Chapter 7 Comparing Means in SPSS (t-tests) This section covers procedures for testing the differences between two means using the SPSS Compare Means analyses. Specifically, we demonstrate procedures
MULTIPLE REGRESSION WITH CATEGORICAL DATA
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting
Statistics Review PSY379
Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses
3.4 Statistical inference for 2 populations based on two samples
3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted
Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses
Introduction to Hypothesis Testing 1 Hypothesis Testing A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population Hypothesis is stated in terms of the
Name: Date: Use the following to answer questions 3-4:
Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin
This chapter discusses some of the basic concepts in inferential statistics.
Research Skills for Psychology Majors: Everything You Need to Know to Get Started Inferential Statistics: Basic Concepts This chapter discusses some of the basic concepts in inferential statistics. Details
6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
One-Way Analysis of Variance
One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We
Hypothesis Testing for Beginners
Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes
Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish
Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish Statistics Statistics are quantitative methods of describing, analysing, and drawing inferences (conclusions)
DDBA 8438: The t Test for Independent Samples Video Podcast Transcript
DDBA 8438: The t Test for Independent Samples Video Podcast Transcript JENNIFER ANN MORROW: Welcome to The t Test for Independent Samples. My name is Dr. Jennifer Ann Morrow. In today's demonstration,
MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010
MONT 07N Understanding Randomness Solutions For Final Examination May, 00 Short Answer (a) (0) How are the EV and SE for the sum of n draws with replacement from a box computed? Solution: The EV is n times
1.6 The Order of Operations
1.6 The Order of Operations Contents: Operations Grouping Symbols The Order of Operations Exponents and Negative Numbers Negative Square Roots Square Root of a Negative Number Order of Operations and Negative
Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217
Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing
3. Mathematical Induction
3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)
A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment
Chi-square test Fisher s Exact test
Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions
C. The null hypothesis is not rejected when the alternative hypothesis is true. A. population parameters.
Sample Multiple Choice Questions for the material since Midterm 2. Sample questions from Midterms and 2 are also representative of questions that may appear on the final exam.. A randomly selected sample
Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples
Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The
The Null Hypothesis. Geoffrey R. Loftus University of Washington
The Null Hypothesis Geoffrey R. Loftus University of Washington Send correspondence to: Geoffrey R. Loftus Department of Psychology, Box 351525 University of Washington Seattle, WA 98195-1525 [email protected]
WHAT IS A JOURNAL CLUB?
WHAT IS A JOURNAL CLUB? With its September 2002 issue, the American Journal of Critical Care debuts a new feature, the AJCC Journal Club. Each issue of the journal will now feature an AJCC Journal Club
Chapter 4. Probability and Probability Distributions
Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the
LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
Chapter 6: Probability
Chapter 6: Probability In a more mathematically oriented statistics course, you would spend a lot of time talking about colored balls in urns. We will skip over such detailed examinations of probability,
statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals
Summary sheet from last time: Confidence intervals Confidence intervals take on the usual form: parameter = statistic ± t crit SE(statistic) parameter SE a s e sqrt(1/n + m x 2 /ss xx ) b s e /sqrt(ss
Two-Sample T-Tests Assuming Equal Variance (Enter Means)
Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of
1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
Two Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon
t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. [email protected] www.excelmasterseries.com
Inference for two Population Means
Inference for two Population Means Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison October 27 November 1, 2011 Two Population Means 1 / 65 Case Study Case Study Example
Statistical estimation using confidence intervals
0894PP_ch06 15/3/02 11:02 am Page 135 6 Statistical estimation using confidence intervals In Chapter 2, the concept of the central nature and variability of data and the methods by which these two phenomena
Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck!
Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck! Name: 1. The basic idea behind hypothesis testing: A. is important only if you want to compare two populations. B. depends on
CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
Two Related Samples t Test
Two Related Samples t Test In this example 1 students saw five pictures of attractive people and five pictures of unattractive people. For each picture, the students rated the friendliness of the person
Topic 8. Chi Square Tests
BE540W Chi Square Tests Page 1 of 5 Topic 8 Chi Square Tests Topics 1. Introduction to Contingency Tables. Introduction to the Contingency Table Hypothesis Test of No Association.. 3. The Chi Square Test
Comparing Means in Two Populations
Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we
Point and Interval Estimates
Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number
Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures
Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:
MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS
MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS Instructor: Mark Schilling Email: [email protected] (Note: If your CSUN email address is not one you use regularly, be sure to set up automatic
Selecting Research Participants
C H A P T E R 6 Selecting Research Participants OBJECTIVES After studying this chapter, students should be able to Define the term sampling frame Describe the difference between random sampling and random
Tutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls [email protected] MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
People have thought about, and defined, probability in different ways. important to note the consequences of the definition:
PROBABILITY AND LIKELIHOOD, A BRIEF INTRODUCTION IN SUPPORT OF A COURSE ON MOLECULAR EVOLUTION (BIOL 3046) Probability The subject of PROBABILITY is a branch of mathematics dedicated to building models
There are three kinds of people in the world those who are good at math and those who are not. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 Positive Views The record of a month
