Hypothesis Testing for a Proportion

Example: We are interested in the probability of developing asthma over a given one-year period for children 0 to 4 years of age whose mothers smoke in the home.

In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%.

We assume that cigarette smoke in the home cannot possibly reduce the incidence of asthma; therefore we conduct a one-sided test:

$H_0: p = p_0 = 0.014$
$H_A: p > 0.014$

We will conduct the test at the 0.05 level of significance.
If 10 cases of asthma are observed over a single year in a sample of 500 children whose mothers smoke, is this compatible with the null hypothesis?

$H_0$ would be rejected if the sample proportion $\hat{p} = x/n$ is too big.

One method of hypothesis testing relies on the normal approximation to the binomial distribution (central limit theorem).

This approximation is reasonable if $n p_0 q_0 \geq 5$, where $q_0 = 1 - p_0$.

Under $H_0$, $\hat{p}$ is approximately normal with mean $p_0$ and standard error $\sqrt{p_0 q_0 / n}$.
Therefore,

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$

has approximately a standard normal distribution under $H_0$. This is the test statistic.

To conduct a one-sided test at the $\alpha$ level of significance, $H_0$ is rejected if $z > z_{1-\alpha}$.

If $\alpha = 0.05$, then $H_0$ would be rejected for $z > 1.645$.

For the asthma data, note that $n p_0 q_0 = 500(0.014)(0.986) = 6.9$, so the normal approximation is reasonable.
Using the normal approximation,

$\hat{p} = \dfrac{10}{500} = 0.02$

and

$z = \dfrac{0.02 - 0.014}{\sqrt{(0.014)(0.986)/500}} = 1.14$

Since $1.14 < 1.645$, we do not reject $H_0$.

We do not have sufficient evidence to conclude that the probability of developing asthma for children whose mothers smoke in the home is higher than the probability in the general population.

This is the critical value method of hypothesis testing.
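The critical value method above can be reproduced numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative choices and not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist

n, x = 500, 10      # sample size and observed asthma cases (from the example)
p0 = 0.014          # proportion under the null hypothesis
alpha = 0.05        # one-sided significance level

p_hat = x / n
se_null = sqrt(p0 * (1 - p0) / n)          # standard error computed under H0
z = (p_hat - p0) / se_null                 # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper critical value z_{1 - alpha}

reject_h0 = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

The comparison `z > z_crit` encodes the one-sided alternative $p > p_0$; a two-sided test would instead compare $|z|$ with $z_{1-\alpha/2}$.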
The p-value method could also be used.

The p-value is the probability of obtaining a sample proportion as extreme as or more extreme than the observed proportion $\hat{p}$ ($\hat{p} = 0.02$), given that $H_0$ is true ($p = 0.014$).

The area under the standard normal curve to the right of 1.14 is 0.1271. Therefore, the p-value is 0.1271.

Since the p-value exceeds 0.05, we again fail to reject $H_0$.

An exact method of hypothesis testing uses the binomial distribution itself, rather than the normal approximation.
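As a rough check of the normal-approximation p-value, the upper-tail area can be computed directly. This sketch assumes the one-sided alternative $p > p_0$ from the example; the function name `upper_tail_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(x, n, p0):
    """One-sided p-value P(Z >= z) under the normal approximation."""
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)

p_value = upper_tail_p_value(10, 500, 0.014)
print(f"p-value = {p_value:.4f}")
```

The result differs from 0.1271 in the fourth decimal place because the table lookup uses $z$ rounded to 1.14, while the code keeps the unrounded statistic.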
If $\hat{p} \leq p_0$, then the p-value for a one-sided test is

$p = P(\leq x \text{ out of } n \mid H_0) = \sum_{k=0}^{x} \binom{n}{k} p_0^{k} q_0^{n-k}$

If $\hat{p} > p_0$, then the p-value is

$p = P(\geq x \text{ out of } n \mid H_0) = \sum_{k=x}^{n} \binom{n}{k} p_0^{k} q_0^{n-k}$

For the asthma example, $\hat{p} = 0.02$ is greater than $p_0 = 0.014$.
Therefore,

$p = P(\geq 10 \text{ out of } 500 \mid p = 0.014) = 1 - P(< 10 \mid p = 0.014) = 1 - \sum_{k=0}^{9} \binom{500}{k} (0.014)^{k} (0.986)^{500-k} = 0.1681$

    . bitesti 500 10 .014

            N   Observed k   Expected k   Assumed p   Observed p
    ------------------------------------------------------------
          500       10            7         0.01400      0.02000

      Pr(k >= 10)            = 0.168070  (one-sided test)
      Pr(k <= 10)            = 0.902981  (one-sided test)
      Pr(k <= 3 or k >= 10)  = 0.248373  (two-sided test)

Again we cannot reject $H_0: p = 0.014$ at the 0.05 level.
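The two one-sided tail probabilities can be cross-checked by summing binomial terms directly. This is a minimal sketch using `math.comb`; the helper names are hypothetical, and it illustrates the formula rather than how Stata computes the result.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lower_tail(x, n, p):
    """P(X <= x)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

def upper_tail(x, n, p):
    """P(X >= x), computed as 1 - P(X <= x - 1)."""
    return 1 - lower_tail(x - 1, n, p)

p_upper = upper_tail(10, 500, 0.014)   # one-sided p-value for H_A: p > 0.014
p_lower = lower_tail(10, 500, 0.014)
print(f"Pr(k >= 10) = {p_upper:.6f}, Pr(k <= 10) = {p_lower:.6f}")
```

Computing the upper tail as a complement means only 10 terms are summed instead of 491.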
We could also explore this question using a confidence interval.

Note that $n \hat{p} \hat{q} = 500(0.02)(0.98) = 9.8$.

Using the normal approximation, a 95% confidence interval for the proportion of children developing asthma among those whose mothers smoke in the home is

$\left( 0.02 - 1.96 \sqrt{\dfrac{(0.02)(0.98)}{500}},\ 0.02 + 1.96 \sqrt{\dfrac{(0.02)(0.98)}{500}} \right)$

or (0.008, 0.032).

Since the value 0.014 lies in this confidence interval, it is a plausible value for the population proportion.
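The normal-approximation interval is simple to compute. The sketch below assumes the interval $\hat{p} \pm z \sqrt{\hat{p}\hat{q}/n}$ used on this slide; the name `wald_interval` is an illustrative label, not terminology from the slides.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)    # about 1.96 for 95%
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)   # uses the estimated SE
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(10, 500)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```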
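Note: When making inference about a population proportion, the hypothesis test and the confidence interval are not mathematically equivalent.

Using the normal approximation, a 95% confidence interval for a proportion $p$ is

$\left( \hat{p} - 1.96 \sqrt{\dfrac{\hat{p}\hat{q}}{n}},\ \hat{p} + 1.96 \sqrt{\dfrac{\hat{p}\hat{q}}{n}} \right)$

The test statistic of the corresponding hypothesis test is

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$

Note that the standard errors are not the same: the interval uses $\sqrt{\hat{p}\hat{q}/n}$, while the test uses $\sqrt{p_0 q_0 / n}$.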
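To make the difference concrete, the two standard errors can be evaluated for the asthma data. This is a small illustrative calculation; the variable names are chosen here for clarity.

```python
from math import sqrt

n, x, p0 = 500, 10, 0.014
p_hat = x / n

se_test = sqrt(p0 * (1 - p0) / n)        # standard error used by the z test
se_ci = sqrt(p_hat * (1 - p_hat) / n)    # standard error used by the interval

print(f"test SE = {se_test:.6f}, interval SE = {se_ci:.6f}")
```

Because the two quantities differ, a value of $p_0$ can in principle fall inside the interval while the test rejects, or the reverse.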
We could also use the exact binomial confidence interval.

    . cii 500 10, binomial

                                             -- Binomial Exact --
    Variable |   Obs     Mean   Std. Err.   [95% Conf. Interval]
    ---------+------------------------------------------------
             |   500      .02    .006261     .0096314   .0364724

Again the value 0.014 lies within the interval.

We do not have sufficient evidence to conclude that the probability of developing asthma for children whose mothers smoke in the home is different from the probability in the general population.
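One way to reproduce an exact binomial interval is to solve the two tail equations numerically. The sketch below assumes the Clopper-Pearson definition (lower limit solves $P(X \geq x \mid p) = \alpha/2$, upper limit solves $P(X \leq x \mid p) = \alpha/2$) and uses plain bisection; the helper names are hypothetical.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def bisect_root(f, lo, hi, iters=100):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2

def exact_interval(x, n, level=0.95):
    """Clopper-Pearson interval found by solving the tail equations."""
    half_alpha = (1 - level) / 2
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - half_alpha, 0.0, 1.0)
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - half_alpha, 0.0, 1.0)
    return lower, upper

low, high = exact_interval(10, 500)
print(f"exact 95% CI: ({low:.7f}, {high:.7f})")
```

Bisection is used here only to stay within the standard library; a statistics package would typically obtain the same limits from beta distribution quantiles.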
Example: We are interested in studying the cognitive abilities of children weighing less than 1500 grams at birth who experience perinatal growth failure, a condition preventing proper development.

In the general population of children exhibiting normal growth in the perinatal period, 3.2% have an IQ score below 70 when they reach the age of 8 years.

Is this also true for children who experience perinatal growth failure?

$H_0: p = 0.032$
$H_A: p \neq 0.032$
We wish to conduct the two-sided test at the 0.05 level of significance.

A random sample of 33 children with perinatal growth failure is selected. At the age of 8 years, 8 of the children have an IQ score below 70.

The sample proportion is

$\hat{p} = \dfrac{8}{33} = 0.242$

Note that $n p_0 q_0 = 33(0.032)(0.968) = 1.0$, which is less than 5.

Therefore, the normal approximation to the binomial distribution cannot be applied.
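The rule of thumb used in both examples can be written as a one-line check. This sketch takes the threshold of 5 from the slides; the function name `normal_approx_ok` is hypothetical.

```python
def normal_approx_ok(n, p0, threshold=5):
    """Rule of thumb: the normal approximation is reasonable if n*p0*q0 >= 5."""
    return n * p0 * (1 - p0) >= threshold

asthma_ok = normal_approx_ok(500, 0.014)   # n*p0*q0 is about 6.9
iq_ok = normal_approx_ok(33, 0.032)        # n*p0*q0 is about 1.0
print(asthma_ok, iq_ok)
```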
We must use the exact binomial test.

    . bitesti 33 8 .032

            N   Observed k   Expected k   Assumed p   Observed p
    ------------------------------------------------------------
           33        8         1.056       0.03200      0.24242

      Pr(k >= 8)  = 0.000007  (one-sided test)
      Pr(k <= 8)  = 0.999999  (one-sided test)
      Pr(k >= 8)  = 0.000007  (two-sided test)

Since the p-value is 0.000007, which is much less than $\alpha = 0.05$, we reject the null hypothesis.

For children who suffer from perinatal growth failure, the proportion who have an IQ score below 70 at the age of 8 years is not equal to 0.032. It is in fact higher.
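The two-sided line of the output can be cross-checked with one common convention for exact two-sided tests: sum the probabilities of all outcomes that are no more likely than the observed one. That this convention matches the software output is an assumption here, although it is consistent with the numbers shown; the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(x, n, p0, rel_tol=1e-7):
    """Sum P(X = k) over all k whose probability does not exceed P(X = x)."""
    observed = binom_pmf(x, n, p0)
    cutoff = observed * (1 + rel_tol)   # small tolerance for floating-point ties
    return sum(pk for pk in (binom_pmf(k, n, p0) for k in range(n + 1))
               if pk <= cutoff)

p_iq = two_sided_p_value(8, 33, 0.032)
p_asthma = two_sided_p_value(10, 500, 0.014)
print(f"IQ example: {p_iq:.6f}, asthma example: {p_asthma:.6f}")
```

In the IQ example every outcome below 8 is more likely than the observed count, so the two-sided p-value equals the upper tail $P(k \geq 8)$. In the asthma example the same rule picks up the outcomes $k \leq 3$ in addition to $k \geq 10$.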