UNIVERSITY OF SASKATCHEWAN
Department of Mathematics and Statistics

April 22, 1995

STATISTICS 245
FINAL EXAMINATION

Instructor: M.J. Miket
Time: 3 hrs

A textbook, formulae sheets and a calculator are allowed.
Note: NoT shall stand for "none of these".

I. For the following sample of measurements, given as a stem-and-leaf display,

0 | 69
1 | 117
2 | 04
3 | 1

Select the correct value of the statistic in questions (1) - (4).

(a) 11 (b) 17 (c) 25 (d) 8.493 (e) 8.333
(f) 14 (g) 10 (h) 16.125 (i) 72.125 (j) 8.110

1. Mean
2. Median
3. Q1
4. Standard deviation

II. Given that E and F are independent events with probabilities P(E) = .2 and P(F) = .4, determine:

5. P(E F)
6. P(F E)
7. P(F E)
8. P(F E)

Choices for questions (5) - (8) are:
(a) 0.08 (b) 0.14 (c) 0.26 (d) 0.33 (e) 0.20
(f) 0.42 (g) 0.52 (h) 0.64 (i) 0.77 (j) 0.40

III. A completely unprepared student decides to guess the answer to each of the 50 questions on a STATS 245 supplementary exam.

9. The first 25 questions are of a true-false type. Find the probability that the student will pass this half of the exam (for a pass, 60% of questions must be answered correctly).

10. The last 25 questions are of a multiple choice type and each question has five possible responses. For a pass, 50% of these questions must be answered correctly. Find the probability of passing this half.

Choices for questions (9) - (10) are:
(a) 0.000 (b) 0.115 (c) 0.120 (d) 0.008 (e) 0.115
(f) 0.146 (g) 0.054 (h) 0.002 (i) 0.212 (j) NoT

IV. In the Framingham Study, serum cholesterol levels were measured for a large number of healthy males. The population was then followed for 16 years. At the end of this time, the men were divided into two groups - those who had developed coronary heart disease and those who had not. The distributions of the initial serum cholesterol levels for each group were found to be approximately normal. Among individuals who eventually developed coronary heart disease, the mean serum cholesterol level was µd = 244 mg/100ml, and the standard deviation was σd = 51 mg/100ml; for those who did not develop the disease, the mean serum cholesterol level was µnd = 219 mg/100ml, and the standard deviation was σnd = 41 mg/100ml.

11. Suppose that an initial serum cholesterol level of 260 mg/100ml or higher is used to predict future coronary heart disease. What is the probability of predicting the disease for a man who will never develop it (false positive error)?

12. What is the probability of failing to predict coronary heart disease for a man who will develop it (false negative error)?

13. If repeated samples of size 10 are selected from the population of males that do not develop the coronary heart disease, what proportion of the samples will have a mean serum cholesterol level greater than 260 mg/100ml?

Choices for questions (11) - (13) are:
(a) 0.0008 (b) 0.1587 (c) 0.2821 (d) 0.3148 (e) 0.6217
(f) 0.6852 (g) 0.7179 (h) 0.8531 (i) 0.9664 (j) NoT
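
The three probabilities asked for in Section IV can be checked numerically. The following is a minimal sketch, assuming Python's standard-library statistics.NormalDist and exact normal models for both groups; the variable names are illustrative and not part of the exam. Answers read from a z table may differ in the third or fourth decimal because z is rounded to two places there.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Section IV (Framingham Study), in mg/100ml.
diseased = NormalDist(mu=244, sigma=51)
not_diseased = NormalDist(mu=219, sigma=41)
cutoff = 260

# Question 11: a man who never develops the disease is flagged (level >= cutoff).
false_positive = 1 - not_diseased.cdf(cutoff)

# Question 12: a man who does develop the disease is missed (level < cutoff).
false_negative = diseased.cdf(cutoff)

# Question 13: the mean of n = 10 disease-free men has standard error sigma / sqrt(n).
n = 10
mean_of_sample = NormalDist(mu=219, sigma=41 / sqrt(n))
mean_above_cutoff = 1 - mean_of_sample.cdf(cutoff)

print(round(false_positive, 4), round(false_negative, 4), round(mean_above_cutoff, 4))
```

The sampling distribution in the last step is treated as normal because the population itself is stated to be approximately normal, so no appeal to a large sample size is needed.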
V. Questions (14) - (29) are based on the following scenarios. You might find it useful to consider questions (14) - (18) as you read the scenarios.

Scenario 1: Returning to the Framingham Study, it is believed that the mean serum cholesterol level of the men who do not develop heart disease must be less than the mean level of men who do. A sample of size 15 from the population of men who do not go on to develop coronary heart disease shows x̄ = 219 mg/100ml and s = 41 mg/100ml. Can it be concluded that the true population mean for this group of men is 244 mg/100ml at the α = .05 level of significance?

Scenario 2: A STATS 245 final examination, when set up by a certain instructor, consists of 75 multiple choice questions, each question with five possible responses. You want to establish that Alexander Joseph performs better on the exam than a person who guesses on every question. If Alexander Joseph obtains 22 correct, what is your conclusion at α = .05?

Scenario 3: A commercial farmer harvests his entire field of beans at one time. Therefore he would like to plant a variety of green beans that mature all at one time (i.e. small standard deviation between maturity times of individual plants). A seed company has developed a new hybrid strain of green beans that it believes to be better for the commercial farmer. The maturity time of the standard variety has a mean of 50 days and a standard deviation of 2.1 days. A random sample of 30 plants of the new hybrid showed a standard deviation of 1.65 days. Is the new variety better at the 0.05 level of significance?

Scenario 4: It would be interesting to determine whether the advice given by a physician during a routine physical examination is effective in encouraging patients to stop smoking. In a study of current smokers, one group of patients was given a brief talk about the hazards of smoking and was encouraged to quit. A second group received no advice pertaining to smoking. All patients were given a follow-up exam. In a sample of 114 patients who had received the advice, 11 reported that they had quit smoking; in a sample of 96 patients who had not, 7 quit smoking.

Scenario 5: A study was conducted to investigate whether oat bran cereal helps to lower serum cholesterol in hypercholesterolemic males. A random sample of such individuals was placed on a diet which included either oat bran or corn flakes; after two weeks, their low-density lipoprotein (LDL) cholesterol levels were recorded. Each man was then switched to the alternative diet. After a second two-week period, the LDL cholesterol level of each individual was again recorded. The data from this study are provided below (population 1 = corn flakes, population 2 = oat bran).

        corn flakes   oat bran   difference
  1        4.61         3.84        0.77
  2        6.42         5.57        0.85
  3        5.40         5.85       -0.45
  4        4.54         4.80       -0.26
  5        3.98         3.68        0.30
  6        3.82         2.96        0.86
  7        5.01         4.41        0.60
  8        4.34         3.72        0.62
  9        3.80         3.49        0.31
 10        4.56         3.84        0.72
 11        5.35         5.26        0.09
 12        3.89         3.73        0.16
 13        2.25         1.84        0.41
 14        4.24         4.14        0.10
 Σx       62.21        57.13        5.08
 Σx²     288.64       247.65        3.99

For the five scenarios described above, choose the appropriate hypotheses to be tested from the list given below:

14. Hypotheses to be tested for Scenario 1.
15. Hypotheses to be tested for Scenario 2.
16. Hypotheses to be tested for Scenario 3.
17. Hypotheses to be tested for Scenario 4.
18. Hypotheses to be tested for Scenario 5.

Choices for (14) - (18) are:
(a) H0: σ² = 4.41, H1: σ² ≠ 4.41
(b) H0: p1 = p2, H1: p1 < p2
(c) H0: µ = 244, H1: µ < 244
(d) H0: µD = 0, H1: µD ≠ 0
(e) H0: p1 − p2 = 0, H1: p1 − p2 > 0
(f) H0: p = .2, H1: p ≠ .2
(g) H0: σ² = 4.41, H1: σ² < 4.41
(h) H0: p = .2, H1: p > .2
(i) H0: µ = 244, H1: µ ≠ 244
(j) H0: µD = 0, H1: µD > 0

19. For which of the above scenarios would you apply the z-test?
20. For which of the above scenarios would you apply the t-test?
21. For which of the above scenarios would you apply the χ²-test?

Choices for questions (19), (20) and (21):
(a) for scenario 4 only
(b) for scenarios 1 and 3
(c) for scenarios 1 and 5
(d) for scenarios 2 and 4
(e) for scenarios 2 and 5
(f) for scenarios 3, 4 and 5
(g) for scenarios 1, 2 and 5
(h) for scenario 2 only
(i) for scenario 3 only
(j) NoT

22. For Scenario 1, which of the following statements are correct or must be assumed?
(i) the population is normal
(ii) the Central Limit Theorem holds
(iii) the standard deviation σ is known

(a) (i) only, (b) (ii) only, (c) (iii) only, (d) (i) + (ii) only, (e) (i) + (iii) only, (f) (ii) + (iii) only, (g) all three, (h) NoT.

23. If you carry out the hypothesis test at the 0.05 significance level, your conclusion for Scenario 1 is
(b) the test is not significant at 0.05 level
(c) the sample is too small to conclude anything
(d) the test is significant at 0.05 level
(e) NoT

24. For Scenario 2, the rejection region at the 5% level of significance is: (Note: TS denotes test statistic)
(a) TS 1.96 (b) TS 1.96 (c) TS 1.96 or TS 1.96
(d) TS 1.330 (e) TS 1.330 (f) TS 1.734 or TS 1.734
(g) TS 1.645 (h) TS 1.645 (i) TS 1.645 or TS 1.645
(j) NoT

25. For Scenario 2, the p-value is
(a) 0.217 (b) 0.365 (c) 0.250 (d) 0.223 (e) 0.022
(f) 0.694 (g) 0.688 (h) 0.063 (i) 0.775 (j) NoT

26. For Scenario 3, the numerical value of the test statistic is:
(a) 3.26 (b) 9.65 (c) 14.50 (d) 14.22 (e) 17.71
(f) 87.64 (g) 10.88 (h) 17.90 (i) 3.75 (j) NoT

27. Based on the observed test statistic, the correct conclusions for Scenario 3 are:
(a) reject at 5% level, reject at 10% level
(b) retain at 5% level, retain at 10% level
(c) retain at 5% level, reject at 10% level
(d) reject at 5% level, retain at 10% level
(e) NoT
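
The test statistics for Scenarios 2 and 3 follow from standard one-sample formulas. The sketch below is illustrative only and rests on stated assumptions: a normal approximation to the binomial with an upper-tail alternative for Scenario 2 (the scenario asks whether the student does better than guessing), and the statistic (n − 1)s²/σ0² for Scenario 3. Variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Scenario 2: 22 correct out of 75 questions; pure guessing gives p0 = 1/5.
n, correct, p0 = 75, 22, 0.2
p_hat = correct / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # one-sample z statistic for a proportion
p_value = 1 - NormalDist().cdf(z)            # upper tail, assuming H1: p > 0.2

# Scenario 3: 30 hybrid plants with s = 1.65 days; standard variety has sigma0 = 2.1 days.
m, s, sigma0 = 30, 1.65, 2.1
chi_sq = (m - 1) * s**2 / sigma0**2          # compared against chi-square with m - 1 = 29 d.f.

print(round(z, 3), round(p_value, 4), round(chi_sq, 2))
```

The chi-square critical values themselves are not computed here, since the standard library has no chi-square quantile function; they would be read from the table on the formulae sheet.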

28. For Scenario 4, a 98% confidence interval for the difference of proportions is:
(a) (-0.0142, 0.1218) (b) (-0.0112, 0.1182) (c) (-0.0657, 0.1129) (d) (-0.0104, 0.1256)
(e) (-0.0118, 0.1182) (f) (-0.0148, 0.1212) (g) (-0.0088, 0.1152) (h) (-0.0082, 0.1158)

29. The numerical value of the test statistic for Scenario 5 is:
(a) 1.207 (b) 2.432 (c) 3.285 (d) 4.237 (e) 5.331
(f) 6.178 (g) 7.317 (h) 8.024 (i) 9.381 (j) NoT

VI. Three different methods of class evaluation for STATS 245 were investigated to determine whether they influence learning. The methods differed in the number of tests, homework and computer assignments. The same text and instructor were used in all three methods. The response variable was the percentage of test points obtained by each student on the final exam. The actual data got erased by accident, but some summary quantities are

n1 = 29, x̄1 = 74.7, s1 = 12.5
n2 = 18, x̄2 = 78.5, s2 = 12.6
n3 = 15, x̄3 = 79.5, s3 = 8.0

Parts of the ANOVA table also got erased, but it is possible to fill in the blank spots. The missing entries are denoted by question numbers in parentheses.

Source      D.F.   S.S.     M.S.    F
Treatment   (30)   (31)     143.2   1.06
Error       (32)   7970.6   (33)
Total       (34)   8257

Work out the missing entries (30) - (34), and then select your answers from the following (rounded) choices.
(a) 61 (b) 5 (c) 286 (d) 59 (e) 135 (f) 2 (g) 80 (h) 30 (i) NoT

35. At the 0.05 significance level, the rejection region is given by
(a) TS 2.61 (b) TS 5.70 (c) TS 3.15 (d) TS 2.49
(e) TS 3.49 (f) TS 8.74 (g) TS 3.29 (h) TS 4.16

36. Based on the observed F statistic, your conclusion is best described as
(a) reject H0: µ1 = µ2 = µ3
(b) reject H0: µ1 = µ2
(c) the test is not significant
(d) the test is significant
(e) both (a) and (d)
(f) both (b) and (d)
(g) NoT

VII. The tumor-producing potential of a new drug was tested. One hundred rats were used as a control group, 100 were exposed to a low dose of a new drug, and 100 were exposed to a high dose. The results were

             0 tumors   1 or more   Total
control         93          7        100
low dose        89         11        100
high dose       86         14        100
Total          268         32        300

Is there sufficient evidence to conclude that the dosage does, in fact, affect the occurrence of tumors using α = .05?

37. What is the number of degrees of freedom associated with this contingency table?
(a) 12 (b) 9 (c) 16 (d) 6 (e) 11
(f) 20 (g) 3 (h) 13 (i) 2 (j) NoT

38. At the .05 level of significance, what is the critical region for the test statistic?
(a) TS 2.59 (b) TS 5.99 (c) TS 4.45 (d) TS 0.10
(e) TS 2.59 (f) TS 5.99 (g) TS 4.45 (h) TS 0.10

39. What is the expected number of rats with no tumors after having taken a high dose?

40. What is the contribution of the rats from the preceding question to the overall measure of discrepancy between the observed and expected frequencies (that is, to the numerical value of the test statistic)?

Choices for questions (39) and (40) are:
(a) 11.07 (b) 89.33 (c) 0.23 (d) 14.79 (e) 4.45
(f) 7.25 (g) 0.12 (h) 0.48 (i) 0.33 (j) NoT
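
The quantities in questions (37), (39) and (40) come from the usual chi-square test of independence, where each expected count is (row total × column total) / grand total. A minimal sketch, using only the counts in the table above; the variable names are illustrative and not part of the exam:

```python
# Observed counts: rows are control, low dose, high dose;
# columns are "0 tumors" and "1 or more".
observed = [[93, 7], [89, 11], [86, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Per-cell contribution (O - E)^2 / E to the chi-square statistic.
contribution = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi_sq = sum(sum(row) for row in contribution)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# High-dose rats with no tumors sit in row index 2, column index 0.
print(round(expected[2][0], 2), round(contribution[2][0], 2), degrees_of_freedom)
```

Keeping the per-cell contributions in their own table makes it easy to see which cells drive the statistic before summing them.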

VIII. Crickets make a chirping sound with their wing covers. Scientists have recognized that there is a linear relationship between the frequency of chirps and the temperature. The table below contains measurements for the striped ground cricket:

  y      x
20.0   88.6
16.0   71.6
19.8   93.3
18.4   84.3
17.1   80.6
15.5   75.2
14.7   69.7
17.1   82.0
15.4   69.4
16.3   83.3
15.0   79.6
17.2   82.6
16.0   80.6
17.0   83.5
14.4   76.3

Here, y is chirps per second and x is the temperature in degrees Fahrenheit. The data are shown on the following scatter plot.
Summary statistics are (all sums over i = 1, ..., 15):

n = 15
Σ yi = 249.90
Σ xi = 1200.60
Σ yi² = 4203.81
Σ xi² = 96,725.86
Σ xi yi = 20,135.80

MINITAB output for this set of data is also enclosed to help answer questions.

41. What proportion of the total variability in y is explained by the linear regression?
(a) 0.895 (b) 0.702 (c) 0.781 (d) 0.682 (e) 1.000
(f) 0.873 (g) 0.590 (h) 0.465 (i) 0.911 (j) NoT

42. What is the value of the correlation coefficient?
(a) 0.465 (b) -0.781 (c) -0.682 (d) 0.838 (e) -0.884
(f) -0.465 (g) 0.781 (h) 0.682 (i) -0.838 (j) 0.884

43. If you test H0: β0 = 0 against H1: β0 ≠ 0 at the 0.05 significance level, your conclusion will be best described as:
(a) reject H0
(b) do not reject H0
(c) the test is not significant
(d) the test is significant
(e) both (a) and (d)
(f) both (b) and (c)
(g) NoT

44. What is the mean predicted frequency when the temperature is 77 degrees?
(a) 14.07 (b) 15.98 (c) 5.02 (d) 15.42 (e) 15.11
(f) 13.31 (g) 6.13 (h) 1.57 (i) NoT

45. Find the 95% confidence interval for this prediction. (Pick the closest answer.)
(a) (13, 18) (b) (11, 19) (c) (6, 17) (d) (1, 14)
(e) (15, 17) (f) (14, 18) (g) (4, 16) (h) (6, 12)

THE END
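
The regression quantities in Section VIII can be recovered from the summary statistics alone. The sketch below is illustrative, assuming the ordinary least-squares formulas Sxx = Σx² − (Σx)²/n, Syy = Σy² − (Σy)²/n and Sxy = Σxy − (Σx)(Σy)/n; variable names are hypothetical. Because the sums are rounded to two decimals, results may differ slightly from the enclosed MINITAB output, which is not reproduced here.

```python
from math import sqrt

# Summary statistics from Section VIII (x = temperature in deg F, y = chirps per second).
n = 15
sum_y, sum_x = 249.90, 1200.60
sum_y2, sum_x2, sum_xy = 4203.81, 96725.86, 20135.80

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares slope and intercept.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

# Coefficient of determination and correlation (sign follows the slope).
r_squared = s_xy**2 / (s_xx * s_yy)
r = s_xy / sqrt(s_xx * s_yy)

# Fitted mean chirp frequency at 77 degrees Fahrenheit.
predicted_at_77 = intercept + slope * 77

print(round(slope, 4), round(intercept, 4), round(r_squared, 3), round(r, 3))
```

An interval for the prediction would additionally need a t quantile with n − 2 = 13 degrees of freedom, which would come from the table on the formulae sheet.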