Practice Final Exam

Multiple-Choice and True-False Questions

1) We are using a regression model to predict the height of a child of a specified age. If we want an interval around our regression line that covers the height of the next child we pick with 95% certainty, we would use a:
a) confidence interval
b) prediction interval
c) neither
d) either
Answer: b

2) T or F: For a given sample, a 50% confidence interval for the mean is narrower than a 90% confidence interval for the mean.
Answer: T

3) T or F: In the context of linear regression, R² tells us the proportion of total variability that can be explained by our model.
Answer: T

4) T or F: The p-value is the probability of rejecting your null hypothesis.
Answer: F

5) T or F: If we randomly select 5 dorms in which to set up security cameras, and we see that crime has decreased in all 5 dorms, we can conclude that the security cameras caused the decrease in crime.
Answer: F

6) Consider our studies of confidence intervals. Fill in the blanks with random or fixed.
Answer: The parameter is fixed. The statistic is random. The interval is random.

7) Which of the following would be expected to result in a larger standard error of the mean?
(A) a larger sample size
(B) a smaller sample size
(C) a smaller population standard deviation
(D) a larger population standard deviation
(E) Choices (B) and (D)
Answer: (E) Choices (B) and (D) (a smaller sample size; a larger population standard deviation)
Given the formula for the standard error of the mean, SE = σ/√n, increasing the numerator or decreasing the denominator will both result in a larger standard error. A more variable population will result in more variable sample means, and a smaller sample size will also result in more variable sample means, in both cases resulting in a larger standard error.

8) A nutritionist has conducted a multiple linear regression predicting the number of calories in breakfast cereals based on the amount of fat, sugar, and fiber in grams. Unfortunately her printer is broken and some of the R output has been blocked out. Which of the following is a correct statement the nutritionist can conclude based on the visible R output (and without looking at any tables, etc.)?
(A) The coefficient of determination is approximately 19.8
(B) The overall F-test is not statistically significant at the α = 0.10 level
(C) The adjusted R-squared is between 0.71 and 1.00
(D) The coefficient for the intercept has a p-value greater than 0.05
(E) The coefficient for the variable Fiber has a p-value less than 0.10
Answer: (E) The coefficient for the variable Fiber has a p-value less than 0.10. We can conclude this by noting that the t value for the Fiber variable is 6.030/1.992 = 3.03, which gives us a p-value well below 0.10. The other responses are based on misinterpretations of the output.

9) Which of the following is/are true about the p-value? (Circle all that apply)
A. Indicates the probability of seeing the observed result, and results more extreme, by chance alone (given that the null hypothesis is true).
B. Indicates the probability that the null hypothesis is true.
C. Rules out the role of bias and/or confounding.
D. Indicates that the results observed are of medical or public health significance.
Answer: A

10) If you observe a significant association in a study, which of the following is the least likely alternative explanation of the association? (Choose only one)
A. Bias
B. Confounding
C. Lack of power
D. A and B
E. B and C
Answer: C

11) In a hypothesis test about a population mean, the p-value is found to be 0.04. Which of the following is/are true about the population mean? Assume that the population mean under the null hypothesis is µ₀. Circle all that apply.
A. The 95% confidence interval includes µ₀
B. The 99% confidence interval includes µ₀
C. The 90% confidence interval includes µ₀
D. All of the above are true
E. None of the above is true
Answer: B

12) Which of the following is/are assumptions of linear models?
A. The response variable is normally distributed
B. The residuals are normally distributed
C. All the observed units are independent of each other
D. The relationship between the response variable and the predictors is linear
E. All of the above
F. None of the above
Answer: B, C, D
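
The reasoning behind question 11 can be checked numerically. The sketch below assumes a two-sided z test (the question does not say which test was used), in which a confidence interval at level C contains µ₀ exactly when the p-value exceeds 1 - C. The function name `ci_includes_null` is made up for this illustration.

```python
from statistics import NormalDist


def ci_includes_null(p_value: float, confidence: float) -> bool:
    """Return True if a two-sided CI at `confidence` would contain mu_0.

    Assumption: two-sided z test, so the CI contains mu_0 exactly when
    the observed |z| is smaller than the critical z for that level.
    """
    std_normal = NormalDist()
    # |z| implied by the two-sided p-value
    z_observed = std_normal.inv_cdf(1 - p_value / 2)
    # Critical value that sets the half-width of the interval
    z_critical = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    return z_observed < z_critical


results = {level: ci_includes_null(0.04, level) for level in (0.90, 0.95, 0.99)}
print(results)
```

With p = 0.04, only the 99% level passes the check, which matches answer B.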
13) The confidence interval at the 95% level of confidence for the true population proportion was reported to be (0.750, 0.950). Which of the following is a possible 90% confidence interval from the same sample?
a) (0.766, 0.934)
b) (0.777, 0.900)
c) (0.731, 0.969)
d) (0.050, 0.250)
Answer: a). Since we are decreasing the level of confidence, the interval must get narrower while staying centered on the same point estimate (0.850). Option a) is the only choice that is both narrower than the reported interval and centered at 0.850.

14) Which of the following statements about the Central Limit Theorem (CLT) is correct?
a) The CLT states that the sample mean x̄ is always equal to the population mean, µ.
b) The CLT states that the sampling distribution of the sample mean x̄ is approximately normal for large sample sizes (n > 30).
c) The CLT states that the sample mean x̄ is equal to the population mean µ, provided that n > 30.
d) The CLT states that the sampling distribution of the population mean µ is approximately normal, provided that n > 30.
Answer: b)

15) You have measured the systolic blood pressure of a random sample of 30 employees of a company. A 95% confidence interval for the mean systolic blood pressure for the employees is computed to be (122, 138). Which of the following statements gives a valid interpretation of this interval?
a) 95% of the sample of employees has a systolic blood pressure between 122 and 138.
b) 95% of the employees in the company have a systolic blood pressure between 122 and 138.
c) If the sampling procedure were repeated 100 times, then approximately 95 of the sample means would be between 122 and 138.
d) If the sampling procedure were repeated 100 times, then approximately 95 of the resulting 100 confidence intervals would contain the true mean systolic blood pressure for all employees of the company.
e) We are 95% confident the sample mean is between 122 and 138.
Answer: d)
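
For question 13, the 90% interval can be reconstructed from the reported 95% interval. This is a minimal sketch that assumes the interval was built with the usual normal approximation (estimate ± z × standard error); the helper name `rescale_ci` is hypothetical.

```python
from statistics import NormalDist


def rescale_ci(lower, upper, old_level, new_level):
    """Convert a symmetric normal-approximation CI to another confidence level."""
    z = NormalDist().inv_cdf
    center = (lower + upper) / 2                      # point estimate
    old_margin = (upper - lower) / 2                  # half-width at the old level
    std_error = old_margin / z(1 - (1 - old_level) / 2)
    new_margin = z(1 - (1 - new_level) / 2) * std_error
    return center - new_margin, center + new_margin


low90, high90 = rescale_ci(0.750, 0.950, old_level=0.95, new_level=0.90)
print(round(low90, 3), round(high90, 3))  # 0.766 0.934
```

The result agrees with option a): the center stays at 0.850 and the margin shrinks from 0.100 to about 0.084.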
16) Sixty-five percent of all divorce cases cite incompatibility as the underlying reason. If four couples file for a divorce, what is the probability that no couples will state incompatibility as the reason?
a) 0.015
b) 0.05
c) 0.18
d) 0.31
e) 0.35
Answer: a). P(none incompatible) = (1 - 0.65)^4 = 0.015

17) A house cleaning service claims that it can clean a four-bedroom house in less than 2 hours. A sample of n = 36 houses is taken and the sample mean is found to be 1.97 hours and the sample standard deviation is found to be 0.1 hours. Using a 0.05 level of significance the correct conclusion is:
a) reject the null because the test statistic (-1.8) is < the critical value (-1.64).
b) do not reject the null because the test statistic (-1.8) is < the critical value (-1.64).
c) reject the null because the test statistic (-1.8) is > the critical value (-1.96).
d) do not reject the null because the test statistic (-1.8) is > the critical value (-1.96).
Answer: a). This is a one-sided test, so the critical value is -1.64 instead of -1.96.

18) A hypothesis test is done in which the alternative hypothesis is that more than 10% of a population is left-handed. The p-value for the test is calculated to be 0.25. Which statement is correct?
a) We can conclude that more than 10% of the population is left-handed.
b) We can conclude that more than 25% of the population is left-handed.
c) We can conclude that exactly 25% of the population is left-handed.
d) We cannot conclude that more than 10% of the population is left-handed.
Answer: d) Since the p-value is large, we cannot reject the null hypothesis.

19) As the degrees of freedom for the t distribution increase, the distribution approaches
a) a value of zero for the mean.
b) the t distribution.
c) the normal distribution.
d) the binomial distribution.
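
The arithmetic behind questions 16 and 17 can be reproduced in a few lines. This is a sketch under the same assumptions the answers make: the four couples are independent, and the test in question 17 uses a normal (z) critical value. The variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

# Question 16: probability that none of four independent couples cite incompatibility
p_incompatible = 0.65
p_none_of_four = (1 - p_incompatible) ** 4

# Question 17: one-sided (lower-tail) test of H0: mean = 2 vs. Ha: mean < 2
sample_mean, claimed_mean = 1.97, 2.0
sample_sd, n = 0.1, 36
test_statistic = (sample_mean - claimed_mean) / (sample_sd / sqrt(n))
critical_value = NormalDist().inv_cdf(0.05)   # lower-tail cutoff at alpha = 0.05
reject_null = test_statistic < critical_value

print(round(p_none_of_four, 3))               # 0.015
print(round(test_statistic, 2), round(critical_value, 2), reject_null)
```

The test statistic of about -1.8 falls below the one-sided cutoff of about -1.64, so the null hypothesis is rejected, as in answer a).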
20) Which statement is NOT true about hypothesis tests?
a) Hypothesis tests are only valid when the sample is representative of the population for the question of interest.
b) Hypotheses are statements about the population represented by the samples.
c) Hypotheses are statements about the sample (or samples) from the population.
d) Conclusions are statements about the population represented by the samples.

21) In regression analysis, if the coefficient of determination (R²) is 1.0, then:
a) SSE (error sum of squares) must be 1.0
b) SSR (regression sum of squares) must be 1.0
c) SSE must be 0.0
d) SSR must be 0.0

22) What do residuals represent in the simple linear regression model?
a) The difference between the actual Y values and the mean of Y.
b) The difference between the actual Y values and the predicted Y values.
c) The square root of the slope.
d) The predicted value of Y for the average X value.
e) None of the above.
Answer: b)

23) The probability that a region prone to hurricanes will be hit by a hurricane in any single year is 0.1. What is the expected number of hurricanes to hit the area in the next 90 years?
a) 9
b) 3
c) 8.1
d) 2.85
e) None of the above
Answer: a)
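
To make the quantities in questions 21 and 22 concrete, the sketch below fits a least-squares line to a small made-up data set (the numbers are invented for illustration and do not come from the exam) and computes the residuals, SSE, SSR, and R² from their definitions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data, chosen only so the arithmetic is easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]

# Residual = actual Y minus predicted Y
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

y_bar = sum(ys) / len(ys)
sse = sum(r ** 2 for r in residuals)          # error sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)       # total sum of squares
ssr = sst - sse                               # regression sum of squares
r_squared = 1 - sse / sst

print(round(sse, 2), round(ssr, 2), round(r_squared, 2))
```

Because R² = 1 - SSE/SST, the only way to move R² toward 1 for a fixed SST is to shrink SSE.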
24) For which of the hypothesis tests above would the p-value be the same whether the sample mean is 44 or 46?
a) I.
b) I. and IV.
c) II. and III.
d) IV.
Answer: a) I.

25) We are told that a 95% prediction interval for a response variable, y, is (23.2, 35.6) from a simple regression on a sample of n = 100 observations at x* = 10. Which of the following is a reasonable estimate for the confidence interval for µ_y at x* = 10?
a) (13.2, 45.6)
b) (24.2, 27.8)
c) (28.6, 30.2)
d) (33.2, 45.6)
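
One way to screen the options in question 25 is to use two standard facts about simple regression: at a given x*, the confidence interval for the mean response is centered at the same fitted value as the prediction interval, and it is narrower. The sketch below applies only those two checks; the helper names are hypothetical.

```python
from math import isclose

prediction_interval = (23.2, 35.6)
candidates = {
    "a": (13.2, 45.6),
    "b": (24.2, 27.8),
    "c": (28.6, 30.2),
    "d": (33.2, 45.6),
}


def center(interval):
    return (interval[0] + interval[1]) / 2


def width(interval):
    return interval[1] - interval[0]


# Keep options that share the prediction interval's center and are narrower than it
plausible = [
    label
    for label, interval in candidates.items()
    if isclose(center(interval), center(prediction_interval), abs_tol=1e-9)
    and width(interval) < width(prediction_interval)
]
print(plausible)
```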