Math 140: Introductory Statistics
Instructor: Julio C. Herrera
Exam 3
January 30, 2015

Name:
Exam Score:

Instructions: This exam covers the material from chapters 7 through 9. Please read each question carefully before you attempt to solve it. Remember that you have to show all of your work clearly if you want to get credit. Also, unless otherwise stated, feel free to use a significance level \( \alpha = 0.05 \). The exam is closed book. Good luck!

Problem 1: In a hotly contested U.S. election, two candidates for president, a Democrat and a Republican, are running neck and neck; each candidate has 50% of the vote. Suppose a random sample of 1000 voters is asked whether they will vote for the Republican candidate. What percentage of the sample should be expected to express support for the Republican? What is the standard error for this sample proportion? Does the Central Limit Theorem apply? If so, what is the approximate probability that the sample proportion will fall within two standard errors of the population value of \( p = 0.50 \)?

We took a random sample, so the expected value of our estimator \( \hat{p} \) is equal to the true population proportion \( p \). In other words, there is no bias in the estimation procedure. Therefore, we expect that 50% of our sample supports the Republican candidate.

The standard error for the sample proportion can be obtained as follows:

\[ SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.5(0.5)}{1000}} = 0.0158 \]

Simply put, we expect our sample proportion to be 50%, give or take 1.58 percentage points.

To check whether the CLT applies, we verify that \( n\hat{p} \ge 10 \), \( n(1-\hat{p}) \ge 10 \), and \( N \ge 10n \). The first two quantities equal 500, and \( N \) (all U.S. voters) is certainly greater than \( 10n = 10{,}000 \). Hence, the CLT does apply.

The CLT tells us that the sampling distribution of \( \hat{p} \) is approximately Normal with mean 0.50 and standard deviation 0.0158, that is, \( N(0.50, 0.0158) \). To calculate the probability that the sample proportion will fall within two standard errors of the population value of \( p = 0.50 \), you can use the computer or the empirical rule. The empirical rule states that the probability we are looking for is approximately 0.95.
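If you choose the computer over the empirical rule in Problem 1, the calculation can be sketched as follows. This is a minimal illustration using only Python's standard library; the helper name `proportion_se` is made up for this example and is not part of the exam.

```python
from math import sqrt
from statistics import NormalDist


def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


p, n = 0.50, 1000          # values given in Problem 1
se = proportion_se(p, n)

# Sampling distribution of p-hat under the CLT: Normal(mean=p, sd=SE)
sampling_dist = NormalDist(mu=p, sigma=se)

# Probability that p-hat lands within two standard errors of p
prob_within_2se = sampling_dist.cdf(p + 2 * se) - sampling_dist.cdf(p - 2 * se)

print(round(se, 4))               # 0.0158
print(round(prob_within_2se, 4))  # 0.9545
```

The exact Normal probability is slightly above 0.95, which is why the empirical rule's "approximately 0.95" is a fine answer here.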

Problem 2: The Excel file named Tee-Times has data on a national survey of 900 women golfers. The survey was conducted to learn how women golfers view their treatment at golf courses in the United States. The survey found that 396 of the women golfers were satisfied with the availability of Tee-Times. Estimate the proportion of the population of women golfers who are satisfied with the availability of Tee-Times. Find the margin of error and a 95% confidence interval estimate of the population proportion. INTERPRET your results within the context of the given problem.

The sample proportion is \( \hat{p} = \frac{396}{900} = 0.44 \). This is the point estimate for our population proportion. The margin of error (\( z^* \cdot SE \)) you should have calculated is

\[ z^* \cdot SE = 1.96 \sqrt{\frac{0.44(1-0.44)}{900}} = 0.0324 \]

The 95% confidence interval estimate for the population proportion is

\[ 95\%\ \text{CI} = 0.44 \pm 0.0324 = (0.4076,\ 0.4724). \]

The survey results enable us to state with 95% confidence that between 40.76% and 47.24% of all women golfers in the nation are satisfied with the availability of Tee-Times.
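The Problem 2 interval can be reproduced with a short script. This is a sketch using only the counts quoted in the problem (396 out of 900), not the Excel file itself; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Point estimate, margin of error, and z-interval for a proportion."""
    p_hat = successes / n
    # Critical value z* that leaves (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


p_hat, margin, (low, high) = proportion_ci(396, 900)
print(f"p_hat = {p_hat:.2f}, margin = {margin:.4f}")
print(f"95% CI = ({low:.4f}, {high:.4f})")
```

The script uses the unrounded critical value (about 1.96), so its output agrees with the hand calculation to four decimal places.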

Problem 3: A polling agency is deciding how many voters to poll. The agency wants to estimate the percentage of voters in favor of extending tax cuts, and it wants to provide a margin of error of no more than 1.8 percentage points.

a. Using 95% confidence, how many respondents must the agency poll?
b. If the margin of error is to be no more than 1.7%, with 95% confidence, should the sample be larger or smaller than that determined in part a? Explain your reasoning.

a. To solve this problem you simply needed to use the formula \( n = \frac{1}{m^2} \), where \( m \) is the margin of error. We are given that \( m = 0.018 \), so

\[ n = \frac{1}{(0.018)^2} \approx 3086 \]

is the sample size we need for the given conditions. (The exact value is 3086.4; rounding up to 3087 guarantees that the margin of error does not exceed 1.8 points.)

b. We use the same formula as in part a. However, this time we use \( m = 0.017 \) as the margin of error. The sample size that we need is \( n = \frac{1}{(0.017)^2} \approx 3460 \), which is a larger sample than that of part a. This makes sense: a smaller margin of error requires more precision, and more precision requires more data.
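The sample-size rule used in Problem 3 is easy to check numerically. The sketch below assumes the rule comes from the usual conservative setup (critical value of about 2 and \( p = 0.5 \), so that \( m = 2\sqrt{0.25/n} = 1/\sqrt{n} \)); the function name is invented for this illustration.

```python
from math import ceil


def conservative_sample_size(margin: float) -> float:
    """Sample size from n = 1 / m^2 (assumes z* is about 2 and p = 0.5)."""
    return 1.0 / margin ** 2


n_a = conservative_sample_size(0.018)  # part a
n_b = conservative_sample_size(0.017)  # part b

# Print the raw value and the value rounded up to a whole respondent
print(f"part a: {n_a:.1f} -> {ceil(n_a)}")
print(f"part b: {n_b:.1f} -> {ceil(n_b)}")
```

Because \( n \) grows like \( 1/m^2 \), even a small reduction in the margin of error costs several hundred additional respondents.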

Problem 4: Over the past year, 20% of the players at Pine Creek were women. In an effort to increase the proportion of women players, Pine Creek implemented a special promotion designed to attract women golfers. One month after the promotion was implemented, the course manager requested a statistical study to determine whether the proportion of women players at Pine Creek had increased. You are the statistician hired to investigate the situation. Using the data in the Excel file named Pine Creek and assuming that the data come from an approximately normal distribution, carry out the following procedures.

a. What type of tail test is required: upper tail test, lower tail test, or two-tail test? Explain your answer.
b. State and interpret the appropriate null and alternative hypotheses for the Pine Creek hypothesis test.
c. Specify the level of significance that you see fit to carry out the hypothesis test stated in part b. What does your analysis (test) reveal about Pine Creek's promotion program? That is, what news will you give the Pine Creek manager? Justify your answer with statistical procedures (i.e., a hypothesis test).

a. The objective of the study is to determine whether the proportion of women golfers increased. Hence, an upper tail test is appropriate (\( H_a: p > 0.20 \)).

b. The null and alternative hypotheses for the Pine Creek hypothesis test are as follows:

\[ H_0: p \le 0.20 \]
\[ H_a: p > 0.20 \]

The null hypothesis says that the proportion of women players has not increased; the alternative says that it has.

c. We can use a level of significance of \( \alpha = 0.05 \) in this case. We know that the distribution is approximately normal, so you could have used the critical value (critical z) approach or calculated a p-value. Using \( \hat{p} = 0.25 \) and \( p_0 = 0.20 \) to calculate z, we get an observed \( z = 2.50 \) and a p-value of 0.0062. Since p-value \( = 0.0062 < \alpha = 0.05 \), we reject the null hypothesis. Hence, the test provides statistical support for the conclusion that the special promotion increased the proportion of women players at the Pine Creek golf course.
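The upper-tail test in Problem 4 can be sketched in a few lines. Note one assumption: the sample size lives in the Excel file and is not quoted in this solution, so the code uses n = 400, a value chosen only because it reproduces the reported z = 2.50. The function name is also made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_proportion_test(p_hat: float, p0: float, n: int):
    """One-proportion z-test of H0: p <= p0 against Ha: p > p0."""
    se = sqrt(p0 * (1 - p0) / n)       # SE is computed under the null value p0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


# n = 400 is an ASSUMPTION (not stated in the solution text)
z, p_value = upper_tail_proportion_test(p_hat=0.25, p0=0.20, n=400)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

With your own copy of the Pine Creek file, replace 0.25 and 400 with the observed proportion and the actual number of players sampled.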

Problem 5: Researchers are wondering whether a greater proportion of people now dream in color than did so before color television and movies became as prominent as they are today. The hypotheses they are checking are

\[ H_0: p = 0.29 \]
\[ H_1: p > 0.29 \]

The researchers took a random sample of 113 people. Of these 113 people, 92 reported dreaming in color. Find the value of the sample proportion, \( \hat{p} \). Find the observed value of the test statistic, and then find the p-value associated with this observed value. In the conclusion (rejecting or failing to reject \( H_0 \)), interpret the p-value in context. Assume that the conditions that must be met in order for us to use the \( N(0, 1) \) distribution as the sampling distribution are satisfied.

The sample proportion is \( \hat{p} = \frac{92}{113} = 0.8142 \). We can now compute the observed value of the test statistic, using the null value \( p_0 = 0.29 \) in the standard error:

\[ SE = \sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.29(0.71)}{113}} = 0.042686 \]

\[ z = \frac{\hat{p} - p_0}{SE} = \frac{0.8142 - 0.29}{0.042686} = 12.28 \]

The p-value is the area under the \( N(0, 1) \) curve to the right of 12.28. This area is essentially zero (p-value \( < 0.001 \)), and thus we reject the null hypothesis. In context: if the proportion of people who now dream in color were the same as it was before color television (0.29), then the probability of getting a test statistic as large as or larger than 12.28 would be extremely small. The data therefore support the claim that a greater proportion of people now dream in color.
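For a test statistic as extreme as the one in Problem 5, computing the tail area as 1 minus a CDF can round to exactly zero in floating point. The sketch below computes the upper-tail area directly with the complementary error function instead; the function names are hypothetical.

```python
from math import erfc, sqrt


def upper_tail_area(z: float) -> float:
    """P(Z > z) for standard Normal Z, computed without 1 - cdf cancellation."""
    return 0.5 * erfc(z / sqrt(2))


def z_statistic(successes: int, n: int, p0: float) -> float:
    """One-proportion z statistic with the SE evaluated at the null value p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


z = z_statistic(successes=92, n=113, p0=0.29)
p_value = upper_tail_area(z)
print(f"z = {z:.2f}")
print(f"p-value < 0.001: {p_value < 0.001}")
```

The exact p-value is far smaller than anything a z-table lists, which is why reporting it as "less than 0.001" is the sensible convention.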

Problem 6: Apnea of prematurity occurs when premature babies have shallow breathing or stop breathing for more than 20 seconds. One therapy for this condition is to give caffeine to the premature infants. Medical researchers conducted an international study in which one sample of premature infants was randomly assigned to receive caffeine therapy, and another sample received a placebo therapy. Researchers compared the rate of severely negative outcomes (death and severe disabilities) in the two groups to determine whether the caffeine therapy would lower the rate of such bad events. The caffeine therapy group included 937 infants. Of these 937 infants, 377 suffered from death or disability. The placebo group had 932 infants, and of these, 431 suffered from death or disability. Perform a hypothesis test (the book calls it a four-step hypothesis test) to test whether the caffeine therapy was effective (that is, whether it succeeded in lowering the death or disability proportion). Use the following as a guide:

a. Determine what you will call \( p_1 \) and \( p_2 \). Calculate \( \hat{p}_1 \), \( \hat{p}_2 \), and \( \hat{p} \).
b. State your null and alternative hypotheses. (Hint: the null hypothesis is neutral, as it should say that there is no difference between your proportions.)
c. Assume the data come from an approximately normal distribution and that they meet the criteria to carry out a two-proportion z-test. Go ahead and conduct such a test; feel free to estimate your p-value or to use the computer.
d. Based on your results, what can you conclude about the helpfulness of the caffeine therapy? Does it help?

This is example 13 on pg. 361, but I will briefly list the answer here.

a. Let \( p_1 \) be the proportion of death or disability in all infants who could receive caffeine therapy, and let \( p_2 \) be the proportion of death or disability in all infants who could receive the placebo therapy. Then the sample proportions are \( \hat{p}_1 = \frac{377}{937} = 0.4023 \) and \( \hat{p}_2 = \frac{431}{932} = 0.4624 \). Furthermore, the pooled estimate of the sample proportion is \( \hat{p} = \frac{377 + 431}{937 + 932} = 0.4323 \).

b. The null and alternative hypotheses are:

\[ H_0: p_1 = p_2 \]
\[ H_a: p_1 < p_2 \]

c. Using the standard normal distribution, we can calculate an observed \( z = -2.62 \). From inspection alone you should notice that the p-value (the area to the left of \( z = -2.62 \), since this is a lower tail test) is relatively small. Using your software, you should have obtained a p-value of 0.004. This p-value is less than the significance level of 0.05, so we reject the null hypothesis.

d. We conclude that the caffeine therapy does help: a lower proportion of babies will die or suffer disability with this therapy.
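If you used the computer for part c of Problem 6, the pooled two-proportion z-test can be sketched as follows. The counts come straight from the problem statement; the function name is invented for this illustration, and the code assumes a lower tail alternative (\( p_1 < p_2 \)).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_lower_tail_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test of H0: p1 = p2 against Ha: p1 < p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = NormalDist().cdf(z)   # area to the LEFT of z (lower tail)
    return p1_hat, p2_hat, pooled, z, p_value


# Group 1 = caffeine therapy, group 2 = placebo
p1_hat, p2_hat, pooled, z, p_value = two_proportion_lower_tail_test(377, 937, 431, 932)
print(f"p1_hat = {p1_hat:.4f}, p2_hat = {p2_hat:.4f}, pooled = {pooled:.4f}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

The sign of z depends on which group you label as group 1; with caffeine as group 1, a helpful therapy shows up as a negative z.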

Problem 7: Data were collected on the amount spent by 64 customers for lunch at a major Houston restaurant. These data are contained in the Excel file named Houston. Based upon past studies, the population standard deviation is known, with \( \sigma \) = $6.

a. At 99% confidence, what is the margin of error?
b. Develop a 99% confidence interval estimate of the mean amount spent for lunch. Interpret your result.

a. For 99% confidence, the margin of error is

\[ z^* \cdot SE = 2.576 \cdot \frac{\sigma}{\sqrt{n}} = 2.576 \cdot \frac{6}{\sqrt{64}} = 1.932 \]

b. The sample mean amount spent for lunch is \( \bar{x} \) = $21.52. With the margin of error equal to $1.932, the 99% confidence interval estimate of the mean amount spent for lunch is

\[ \bar{x} \pm \text{Margin of error} = 21.52 \pm 1.932 = (19.59,\ 23.45) \]

Hence, we are 99% confident that the population mean amount spent for lunch is between $19.59 and $23.45.
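The Problem 7 interval can be checked with the summary numbers quoted in the solution (mean $21.52, \( \sigma \) = $6, n = 64) rather than the Houston file itself. This is a sketch; `z_interval` is a name made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for a mean when the population sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return z_star, margin, (x_bar - margin, x_bar + margin)


z_star, margin, (low, high) = z_interval(x_bar=21.52, sigma=6.0, n=64, confidence=0.99)
print(f"z* = {z_star:.3f}, margin = {margin:.3f}")
print(f"99% CI = ({low:.2f}, {high:.2f})")
```

The script uses the unrounded critical value (about 2.576); a table value of 2.57 or 2.58 changes the margin only in the third decimal place.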

Problem 8: Repeat Problem 7, but this time assume that you do not know the population standard deviation.

In reality, we hardly ever know the population standard deviation, so instead we use the sample standard deviation (\( s \)) as an estimate. However, using the sample standard deviation requires us to use the t-distribution, here with \( n - 1 = 63 \) degrees of freedom.

a. For 99% confidence, the margin of error is

\[ t^* \cdot SE_{est} = 2.66 \cdot \frac{s}{\sqrt{n}} = 2.66 \cdot \frac{6.89}{\sqrt{64}} = 2.29 \]

b. The sample mean amount spent for lunch is \( \bar{x} \) = $21.52. With the margin of error equal to $2.29, the 99% confidence interval estimate of the mean amount spent for lunch is

\[ \bar{x} \pm \text{Margin of error} = 21.52 \pm 2.29 = (19.23,\ 23.81) \]

Hence, we are 99% confident that the population mean amount spent for lunch is between $19.23 and $23.81. Notice that using the t-distribution resulted in a wider confidence interval. Why do you think that is?
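The Problem 8 arithmetic can be verified with the sketch below. Python's standard library has no t-distribution, so the critical value 2.66 is taken from the solution as given rather than computed; if SciPy is available, `scipy.stats.t.ppf(0.995, df=63)` is one way to obtain it. The variable names are chosen for this illustration only.

```python
from math import sqrt

# Summary values quoted in the solution to Problem 8
x_bar = 21.52   # sample mean, in dollars
s = 6.89        # sample standard deviation, in dollars
n = 64          # number of customers
t_star = 2.66   # critical t for 99% confidence, df = 63 (taken from the text)

se_est = s / sqrt(n)        # estimated standard error of the mean
margin = t_star * se_est    # margin of error
low, high = x_bar - margin, x_bar + margin

print(f"SE_est = {se_est:.4f}, margin = {margin:.2f}")
print(f"99% CI = ({low:.2f}, {high:.2f})")
```

Comparing this margin with the one from Problem 7 shows that two inputs changed at once: the critical value and the standard deviation.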