AMS 5 TESTS FOR TWO SAMPLES

Test for difference

We will consider the problem of comparing the means of two populations. The main tool for hypothesis testing in this case is the z test for the difference between two population means. The z test is based on the standardized difference between the averages of the two groups. We will consider the following topics:

- How to calculate the standard error for the difference
- How to compare the two averages
- The case of binary populations
- Two tailed versus one tailed tests

There are many situations where the population is split into two groups and we want to test whether there are significant differences between the groups. As we have seen so far, the key issue in hypothesis testing is that we need a standardized measure of the distance between what we observe and what we postulate in the null hypothesis. To get this we need an estimate of the standard error.

Consider two boxes, A and B. Suppose 400 draws are made at random with replacement from box A and 100 from box B.

We need to estimate and test the difference between the two sample averages. Given the numbers in the boxes we have:

Box A: average of 400 draws = $110 \pm 60/\sqrt{400} = 110 \pm 3$, so SE = 3
Box B: average of 100 draws = $90 \pm 40/\sqrt{100} = 90 \pm 4$, so SE = 4

For the difference we have an expected value of 110 - 90 = 20. What is the corresponding standard error?

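The standard error for the difference of two independent quantities is $\sqrt{a^2 + b^2}$, where $a$ is the SE of the first quantity and $b$ is the SE of the second. In the previous example we get $\sqrt{3^2 + 4^2} = 5$, and so we expect the difference to be $20 \pm 5$.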
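The square root formula can be checked numerically. The following is a minimal Python sketch for the two-box example; the function name `se_difference` is our own choice and is not part of the original notes.

```python
import math

def se_difference(se_a: float, se_b: float) -> float:
    """SE of the difference of two independent quantities (square root formula)."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

# Box A: SD 60, 400 draws. Box B: SD 40, 100 draws.
se_a = 60 / math.sqrt(400)  # SE of the average of the draws from box A
se_b = 40 / math.sqrt(100)  # SE of the average of the draws from box B

print(se_a, se_b, se_difference(se_a, se_b))
```

The function is only valid when the two quantities are independent.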

Independent?

For the square root formula to apply we need independence.

Example 1: 100 draws are made with replacement from each of two boxes, C and D. Find the expected value and SE for the difference between the number of 1's drawn from C and the number of 4's drawn from D.

We expect to have $50 \pm 5$ 1's and $50 \pm 5$ 4's. The expected value for the difference is therefore $50 - 50 = 0$. The draws are made independently, and therefore the SE for the difference is $\sqrt{5^2 + 5^2} \approx 7$.

Example 2: 100 draws are made with replacement from a single box. The expected number of 1's is 20 with an SE of 4. The expected number of 5's is also 20 with an SE of 4. The expected value of the difference between the number of 1's and the number of 5's is 0, but the SE is NOT $\sqrt{4^2 + 4^2}$, since the two counts are not independent (if one number is large the other is likely to be small).
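A small simulation can illustrate why the square root formula fails here. The sketch below assumes the box holds one ticket each of 1, 2, 3, 4 and 5; this is an assumption on our part, chosen because it matches the stated expected count of 20 and SE of 4. The function name and the number of repetitions are also illustrative.

```python
import random
import statistics

# ASSUMPTION: the box contains the tickets 1, 2, 3, 4, 5 (one of each).
BOX = [1, 2, 3, 4, 5]

def simulate_difference_se(n_draws: int = 100, n_reps: int = 5000, seed: int = 0) -> float:
    """Estimate the SE of (number of 1's - number of 5's) by repeated sampling."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_reps):
        draws = rng.choices(BOX, k=n_draws)  # draws with replacement
        diffs.append(draws.count(1) - draws.count(5))
    return statistics.pstdev(diffs)

simulated_se = simulate_difference_se()
naive_se = (4 ** 2 + 4 ** 2) ** 0.5  # what the square root formula would give

print(f"simulated SE = {simulated_se:.2f}, square root formula = {naive_se:.2f}")
```

Under this assumed box the simulated SE comes out larger than the value given by the square root formula, because the two counts are negatively correlated.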

Test statistic

A nationwide sample of 1000 17-year-old students was given a math test in 1978. This was repeated in 1992. The average score went from 300.4 the first time to 306.7 the second time. The difference is 6.3 points. Is this significant?

Using hypothesis testing terminology, we set the null hypothesis as

$H_0$: the difference is 0

and the alternative hypothesis as

$H_1$: the average of 1992 is bigger than that of 1978.

The averages were obtained from samples of 1,000 results from each math test. The corresponding SEs are 1.1 for 1978 and 1.0 for 1992. Since we can assume that the samples are independent, we can compute the SE for the difference as $\sqrt{1.0^2 + 1.1^2} \approx 1.5$.

Then we can obtain a z-test as

$$z = \frac{\text{observed difference} - \text{expected difference under } H_0}{\text{SE for difference}}$$

This is equal to

$$z = \frac{6.3 - 0}{1.5} = 4.2$$

Since the area under the normal curve corresponding to values above 4.2 is negligible, we reject the null hypothesis and conclude that the difference is statistically significant. Thus, students performed better in 1992 than in 1978.
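The calculation for the math test example can be sketched in Python as follows. The helper names `upper_tail` and `z_for_difference` are our own, and the normal tail area is computed with the standard library function `math.erfc`.

```python
import math

def upper_tail(z: float) -> float:
    """Area under the standard normal curve above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def z_for_difference(observed_diff: float, se_1: float, se_2: float,
                     expected_diff: float = 0.0) -> float:
    """z statistic for the difference of two independent averages."""
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)
    return (observed_diff - expected_diff) / se_diff

# 1992 average minus 1978 average, with SEs 1.0 and 1.1.
z = z_for_difference(306.7 - 300.4, 1.0, 1.1)
p_value = upper_tail(z)

print(f"z = {z:.2f}, P-value = {p_value:.1e}")
```

Because the SE for the difference is not rounded to 1.5 here, the computed z differs slightly in the second decimal from the hand calculation.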

Example 1. The students of the class were divided into two groups: the pink and the lavender groups. Each person counted the number of cents in their pockets. The results are an average of 94.46 cents for the pink group, with an SE of 28.99 cents, and an average of 79.63 cents for the lavender group, with an SE of 38.81 cents. Is there enough evidence to support the claim that the students in the pink group had, on average, more change than the students in the lavender group?

The standard error of the difference is $\sqrt{28.99^2 + 38.81^2} = 48.44$. Thus the test statistic is

$$z = \frac{94.46 - 79.63}{48.44} \approx 0.31$$

The probability that a standard normal will have values above 0.31 is about 38%. Since this is a large P-value, there is no reason to reject the hypothesis that there is no difference between the groups.

Example 2. A safety engineer compares the braking distance of two sets of tires by performing 50 braking tests for each set. The results are: set 1 has an average braking distance of 42 feet with an SD of 4.7 feet; set 2 has an average braking distance of 55 feet with an SD of 5.3 feet. Is there enough evidence to support the claim that the second set of tires has a larger mean braking distance than the first?

Since 4.7 and 5.3 are SDs, the SEs of the two averages are $4.7/\sqrt{50} \approx 0.66$ and $5.3/\sqrt{50} \approx 0.75$. The standard error of the difference is $\sqrt{0.66^2 + 0.75^2} \approx 1.00$. Thus the test statistic is

$$z = \frac{55 - 42}{1.00} \approx 13$$

The probability that a standard normal will have values above 13 is negligible. Since this is a very small P-value, there are reasons to reject the null hypothesis that the two sets have the same mean braking distance.

Binary boxes

In a sample of 200 male students, 107 use a personal computer on a regular basis. In a sample of 300 female students, 132 use personal computers on a regular basis. Is the difference between the two groups real or due to chance variation?

We can think of two binary boxes: one for the males and one for the females. The boxes have 1's for those who use PCs and 0's for those who don't. The percentage of 1's in the sample of males is 53.5% and in the sample of females is 44.0%. According to the null hypothesis, the percentage of 1's in both boxes is the same. The standard error for the percentage of PC users is approximated as

$\sqrt{0.535 (1 - 0.535) / 200} \approx 3.5\%$ for the group of male students
$\sqrt{0.44 (1 - 0.44) / 300} \approx 2.9\%$ for the group of female students

These SEs are obtained by approximating the SD of the box using the SD of the sample. The SE for the difference is $\sqrt{3.5^2 + 2.9^2} \approx 4.5\%$. Thus the test statistic is

$$z = \frac{(53.5\% - 44.0\%) - 0}{4.5\%} \approx 2.1$$

and this value corresponds to a P-value of approximately 2% (the area under the normal curve above 2.1). So we reject the null hypothesis and conclude that the difference is real.
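The binary-box calculation can be sketched in Python as follows. The helper names `proportion_se` and `upper_tail` are our own, and each SE uses the sample proportion in place of the unknown box proportion, as described above.

```python
import math

def proportion_se(successes: int, n: int) -> float:
    """SE of a sample proportion, using the sample SD as an estimate of the box SD."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)

def upper_tail(z: float) -> float:
    """Area under the standard normal curve above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_male = 107 / 200    # 53.5% of the male sample use a PC
p_female = 132 / 300  # 44.0% of the female sample use a PC

se_diff = math.sqrt(proportion_se(107, 200) ** 2 + proportion_se(132, 300) ** 2)
z = (p_male - p_female - 0) / se_diff
p_value = upper_tail(z)

print(f"SE of difference = {se_diff:.3f}, z = {z:.2f}, P-value = {p_value:.3f}")
```

Working with proportions rather than percentages changes the units of the SE but not the value of z.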

Experiments

Suppose a clinical trial to test the effectiveness of vitamin C is conducted. 200 subjects participate in the trial. Half of them are randomized to get 200 mg of vitamin C and the other half get 200 mg of a placebo. The results are that, over the period of the trial, the treatment group averages 2.3 colds, with an SD of 3.1, and the control group averages 2.6 colds, with an SD of 2.9. Is the difference significant?

The difference is -0.3 and the SEs are obtained as $3.1/\sqrt{100} = 0.31$ and $2.9/\sqrt{100} = 0.29$. The standard error of the difference is $\sqrt{0.31^2 + 0.29^2} \approx 0.42$. Thus the z-test is

$$z = \frac{-0.3 - 0.0}{0.42} \approx -0.7$$
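The calculation for the vitamin C trial, exactly as set up here (two independent samples drawn with replacement), can be sketched in Python. The helper name `se_of_average` and the variable names are our own.

```python
import math

def se_of_average(sd: float, n: int) -> float:
    """SE of the average of n draws from a box with the given SD."""
    return sd / math.sqrt(n)

se_treatment = se_of_average(3.1, 100)  # vitamin C group
se_control = se_of_average(2.9, 100)    # placebo group
se_diff = math.sqrt(se_treatment ** 2 + se_control ** 2)

z = ((2.3 - 2.6) - 0.0) / se_diff
# Area under the standard normal curve below z (z is negative here).
lower_tail = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"SE of difference = {se_diff:.2f}, z = {z:.2f}, P-value = {lower_tail:.2f}")
```

The P-value is taken from the lower tail because fewer colds in the treatment group corresponds to a negative difference.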

This z-value corresponds to a P-value of around 24%. Thus, the difference is not significant. Is this answer O.K.? If we think of a box model we realize that this solution involves two mistakes:

- The draws are taken without replacement, but the SEs are computed as if they were taken with replacement.
- The two averages are not independent, but the SEs are combined as if they were.

The consequences of these mistakes are negligible if the number of draws is small relative to the population. But this is seldom the case for clinical trials. We have that:

- The first mistake inflates the SE.
- The second mistake deflates the SE.

So the two mistakes compensate, and usually the result is a small overestimation of the SE.

Two versus one tailed tests

You want to see if a coin is fair. The coin is tossed 100 times and lands heads on 61 of the tosses. The null hypothesis is that the coin is fair, so under the null we have an expected value of 50 heads. The SE is $\sqrt{100} \times 0.5 = 5$, thus the test statistic is

$$z = \frac{61 - 50}{5} = 2.2$$

Consider testing against the alternative hypothesis that the coin is biased towards heads, that is, that the probability of heads is bigger than 1/2. Therefore, big values of z favor the alternative hypothesis. The P-value under this alternative is the area under the normal curve corresponding to values greater than 2.2. This is equal to 1.4%.

Suppose that we consider a different alternative hypothesis, namely that the probability of heads is different from 1/2 in either direction. The values of z that favor the alternative hypothesis are either large negative or large positive values. The P-value is the area under the normal curve corresponding to values less than -2.2 or greater than 2.2. The P-value in this case is 2.8%.

The first test is a one tailed test. The second test is a two tailed test. The P-value of a two tailed test is double that of the corresponding one tailed test. The choice between the two should be made before looking at the data; otherwise you could be manipulating the results.
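The two P-values for the coin example can be compared with a short Python sketch. The helper name `upper_tail` and the variable names are our own.

```python
import math

def upper_tail(z: float) -> float:
    """Area under the standard normal curve above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n_tosses, heads = 100, 61
expected = n_tosses * 0.5        # expected number of heads for a fair coin
se = math.sqrt(n_tosses) * 0.5   # the SD of a fair 0/1 box is 0.5
z = (heads - expected) / se

one_tailed = upper_tail(z)           # alternative: probability of heads > 1/2
two_tailed = 2 * upper_tail(abs(z))  # alternative: probability of heads != 1/2

print(f"z = {z:.1f}, one tailed = {one_tailed:.3f}, two tailed = {two_tailed:.3f}")
```

Using `abs(z)` in the two tailed case gives the same answer whether the observed count falls above or below the expected value.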

Example

One hundred draws are made at random with replacement from box A and 250 are made from box B. The boxes contain numbered tickets. The numbers can be positive or negative.

1. 50 of the draws from box A are positive. 131 of the draws from box B are positive. Is the difference real or due to chance?

The proportion of positive tickets among the draws from box A is 50%; among the draws from box B it is 52.4%. The SEs for the two proportions are given by $\sqrt{0.5 \times 0.5 / 100} = 0.05$ and $\sqrt{0.524 \times 0.476 / 250} \approx 0.032$. The standard error of the difference is $\sqrt{0.05^2 + 0.032^2} \approx 0.059$, that is, 5.9%. Thus the z-test is

$$z = \frac{(52.4 - 50) - 0}{5.9} \approx 0.4$$

The probability that a standard normal will be above 0.4 is about 34%. So for either a one or two tailed test we conclude that the difference is likely due to chance.

2. The draws from box A average 1.4 with an SD of 15.3. The draws from box B average 6.3 with an SD of 16.1. Is the difference between the averages statistically significant?

We can obtain the SEs as $15.3/\sqrt{100} = 1.53$ and $16.1/\sqrt{250} \approx 1.02$. The standard error of the difference is $\sqrt{1.53^2 + 1.02^2} \approx 1.84$. Thus the z-test is

$$z = \frac{(6.3 - 1.4) - 0}{1.84} \approx 2.66$$

The probability that a standard normal will be above 2.66 is about 0.004. So even for a two tailed test the P-value is very small. The conclusion is that the difference between the two boxes seems real.