Elementary Statistics Sample Exam #3

Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points.

1. A chi-square goodness-of-fit test is considered to be valid if each of the expected cell frequencies is
   A. greater than 0.
   B. less than 5.
   C. between 0 and 5.
   D. at most 1.
   E. at least 5.

Use the following to answer the next three questions:

A physiologist is interested in determining the proportion of algae samples from a local rivulet that belong to a particular phylum, and he believes they should be uniformly distributed. A random sample of 60 algae was obtained, and each was categorized as being Rhodophyta, Chlorophyta, or Heterokontophyta. The observed counts were 25, 25, and 10, respectively.

2. The chi-square statistic is
   A. 0.
   B. 7.50.
   C. 20.
   D. 150.

3. When determining the significance of the chi-square statistic, the physiologist would use
   A. 1 degree of freedom.
   B. 2 degrees of freedom.
   C. 3 degrees of freedom.
   D. 4 degrees of freedom.

4. The p-value of the chi-square goodness-of-fit test in this case is
   A. greater than 0.10.
   B. below 0.10 but above 0.05.
   C. below 0.05 but above 0.01.
   D. below 0.01.
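For checking work on questions 2 through 4, the goodness-of-fit calculation can be sketched in Python. This is a minimal sketch and not part of the exam: the function name chi_square_gof is hypothetical, the expected counts assume the uniform distribution stated in the problem, and the p-value shortcut exp(-x/2) is valid only when there are exactly 2 degrees of freedom.

```python
import math


def chi_square_gof(observed, expected):
    """Return the chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts for Rhodophyta, Chlorophyta, Heterokontophyta.
observed = [25, 25, 10]
n = sum(observed)

# Assumption from the problem statement: a uniform distribution over the categories.
expected = [n / len(observed)] * len(observed)

statistic = chi_square_gof(observed, expected)
df = len(observed) - 1  # number of categories minus one

# Special case: with exactly 2 degrees of freedom the chi-square
# upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-statistic / 2)

print("statistic:", statistic)
print("degrees of freedom:", df)
print("p-value:", round(p_value, 4))
```

For other degrees of freedom, the closed form above does not apply; a chi-square table or a statistics library would be needed instead.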

5. In a χ² test for independence, the statistic based on a contingency table with 6 rows and 5 columns will have how many degrees of freedom?
   A. 30
   B. 24
   C. 5
   D. 20
   E. 25

Use the following to answer the next two questions:

A fisheries biologist is interested in studying the relationship between width and weight in horseshoe crabs. She collects a random sample of such crabs and cross-classifies them based on these variables as given below.

                    Width (in cm)
   Weight      0-5    5-10    10-15    15-20
   < 1.8 kg     39      36       29       18
   > 1.8 kg     11      14       21       32

6. Suppose we wish to test the null hypothesis that there is no association between their width and weight. Under the null hypothesis, what is the expected number of crabs in the low-weight class and widest width class?
   A. 18.0
   B. 25.0
   C. 30.5
   D. 50.0

7. Which hypotheses are being tested by the chi-square test?
   A. The null hypothesis is that width and weight are independent, and the alternative is that they are dependent.
   B. The null hypothesis is that the mean number of crabs that are in the low-weight class is the same for each of the four width classes, and the alternative is that these means are different.
   C. The null hypothesis is that the distributions of the number of crabs that are in the low- and high-weight classes are the same for the four widths. The alternative says the distributions are different.
   D. The null hypothesis is that the distributions of the total number of crabs sampled in each of the four widths are the same. The alternative is that these distributions are different.
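For checking work on questions 5 and 6, the expected counts under independence (row total times column total, divided by the grand total) can be sketched in Python. This is a minimal sketch and not part of the exam: the helper names expected_counts and independence_df are hypothetical, and the table layout (rows are weight classes, columns are width classes) is an assumption taken from the crab data above.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_df(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)


# Rows: weight < 1.8 kg, weight > 1.8 kg. Columns: the four width classes.
crabs = [
    [39, 36, 29, 18],
    [11, 14, 21, 32],
]

expected = expected_counts(crabs)

# Low-weight class (row 0) and widest width class (last column).
print("expected low-weight, widest:", expected[0][-1])

# Degrees of freedom for the crab table and for a 6-by-5 table.
print("df, crab table:", independence_df(len(crabs), len(crabs[0])))
print("df, 6 x 5 table:", independence_df(6, 5))
```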

8. The line described by the regression equation attempts to
   A. pass through as many points as possible.
   B. pass through as few points as possible.
   C. minimize the number of points it touches.
   D. minimize the squared distance from the points.

9. The fraction of the variation in the values of a response y that is explained by the least-squares regression of y on x is
   A. the correlation coefficient.
   B. the slope of the least-squares regression line.
   C. the square of the correlation coefficient.
   D. the intercept of the least-squares regression line.

10. Given the bivariate sample (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), suppose x̄ = 2, ȳ = 3, s_x = 1, r = 1/3, and s_y = 6. Which of the following is the regression line?
   A. y = 2x - 4
   B. y = x - 2
   C. y = 2x - 1
   D. y = x - 6

11. The regression equation for predicting number of speeding tickets (Y) from information about driver age (X) is Y = -0.065(X) + 5.57. How many tickets would you predict for a twenty-year-old?
   A. 6
   B. 4.27
   C. 5.57
   D. 1

12. A clinical psychologist finds the relationship between the number of weeks spent in a therapy hospital (X = HOSPITAL) and number of seizures per week (Y = SEIZURES) is described by the following equation: Ŷ = 14.09 - 0.91X. This is based on a sample size of 50 patients and is associated with r = -0.93. The proportion of variance in SEIZURES accounted for by HOSPITAL (i.e., the coefficient of determination) is
   A. 0.93
   B. -0.93
   C. 0.86
   D. -0.86
   E. 14.09

13. Suppose a straight line is fit to data having response variable y and explanatory variable x. Predicting values of y for values of x outside the spread of the observed data is called
   A. contingency.
   B. extrapolation.
   C. causation.
   D. correlation.

14. Changing the units of measurement on the Y variable will affect all but which one of the following?
   A. The estimated intercept parameter.
   B. The estimated slope parameter.
   C. The total sum of squares for the regression.
   D. R squared for the regression.
   E. The estimated standard errors.

15. Which of the following statistical techniques is used when values of more than one variable are used to predict the value of another variable?
   A. Multiple regression
   B. ANOVA
   C. ANCOVA
   D. MANCOVA

16. A portion of an ANOVA summary table is shown below.

   Source            Sum of Squares    Degrees of Freedom
   Between                 19                  3
   Within (error)          37                  4
   Total                   56

The Mean Square Error, MSE, is
   A. 9.25
   B. 12.33
   C. 18.50
   D. 37.0

Use the following information for the next six questions:

How much corn should be planted per acre for a farmer to get the highest yield? Too few plants will give a low yield, while too many plants will compete with each other for moisture and nutrients, resulting in a lower yield. Four levels of planting density are to be studied: 12,000, 16,000, 20,000, and 24,000 plants per acre. The experimenters had 12 acres available for the study, and three acres were assigned at random to each of the planting densities. The data follow:

   Plants (per acre)    Yield (bushels per acre)
   12,000               150.1    113.0    118.4
   16,000               166.9    120.7    135.2
   20,000               165.3    130.1    139.6
   24,000               134.7    138.4    156.1

Assume the data are four independent SRSs, one from each of the four populations of planting densities, and that the distribution of the yields is normal. A partial ANOVA table produced by MINITAB follows, along with the means and standard deviations of the yields for the four groups.

   One-way ANOVA: yield versus density

   Source     DF     SS     MS     F     P
   Density          589
   Error                   356
   Total

   Density    N      Mean    StDev
   12,000     3    127.17    20.04
   16,000     3    140.93    23.63
   20,000     3    145.00    18.21
   24,000     3    143.07    11.44

17. The degrees of freedom for density (group) are
   A. 2.
   B. 3.
   C. 8.
   D. 11.

18. The degrees of freedom for error are
   A. 2.
   B. 3.
   C. 8.
   D. 11.

19. The null hypothesis for the ANOVA is that the population mean yield
   A. is the same for all four planting densities.
   B. is increasing as the planting density gets larger.
   C. is decreasing as the planting density gets larger.
   D. first increases and then decreases as the planting density gets larger.

20. The sum of squares for error is
   A. 196.
   B. 356.
   C. 589.
   D. 2848.

21. The pooled standard deviation is
   A. 11.44.
   B. 18.87.
   C. 23.63.
   D. 22.48.

22. The value of the F statistic is
   A. 0.55.
   B. 4.73.
   C. 1.82.
   D. 4.83.

23. A researcher is performing a two-way ANOVA using two factors. The first factor has 6 levels and the second factor has 9 levels. In the ANOVA summary table, the degrees of freedom for the interaction between the first and second factors will be
   A. 40
   B. 45
   C. 48
   D. 54
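For checking work on the corn-yield questions (17 through 22), the one-way ANOVA quantities can be recomputed from the raw yields. This is a minimal sketch and not part of the exam: the function name one_way_anova and the dictionary keys are hypothetical, and it uses only the Python standard library, so it stops at the F statistic and does not compute a p-value.

```python
import math
from statistics import mean


def one_way_anova(groups):
    """Compute the main entries of a one-way ANOVA table from raw group data."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)

    # Between-group (density) and within-group (error) sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "df_between": df_between,
        "df_within": df_within,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ms_within": ms_within,
        "pooled_sd": math.sqrt(ms_within),  # square root of the mean square error
        "f": ms_between / ms_within,
    }


# Yields (bushels per acre) for 12,000, 16,000, 20,000, and 24,000 plants per acre.
yields = [
    [150.1, 113.0, 118.4],
    [166.9, 120.7, 135.2],
    [165.3, 130.1, 139.6],
    [134.7, 138.4, 156.1],
]

result = one_way_anova(yields)
for name, value in result.items():
    print(name, round(value, 2))
```

Values computed from the raw yields may differ slightly from the rounded figures in the MINITAB table.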