STATISTICS E100 FALL 2013 MIDTERM I EXAM SOLUTIONS PAGE 1 OF 7
Statistics E100 Fall 2013 Midterm I Exam Solutions

1. (9 points) Forbes magazine published data on the annual salary of the chief executive officer for 50 of the best small firms in 2012. The annual salary was reported in $1,000s. The salaries ranged from $44,000 to $2,105,000 and are depicted in the two plots below.

a) (3 points) The mean of these salaries is $730,000. What is your guess of the median: $200,000, $400,000, $600,000, or $1,000,000?
$600,000 (it's the line in the middle of the box in the boxplot).

b) (3 points) What is your guess of the standard deviation for this data: $200,000, $400,000, $600,000, or $1,000,000?
$400,000 (the whole range is about $2 million; dividing it by 5 gives a rough guess of the SD).

c) (3 points) What is your guess of the interquartile range for this data: $200,000, $400,000, $600,000, or $1,000,000?
$600,000 (it's the width of the box in the boxplot).
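The "range divided by 5" reasoning in part b) can be checked with a few lines of Python. This is only a sketch: it uses the reported minimum and maximum (the raw salaries are not listed here), and dividing by 5 is a rough rule of thumb rather than an exact relationship between range and SD.

```python
# Rough SD guess from the range, using the "range / 5" rule of thumb from part b).
# Values are in $1,000s, as in the Forbes data.
low, high = 44, 2105
choices = [200, 400, 600, 1000]  # the answer options, in $1,000s

sd_guess = (high - low) / 5  # 2061 / 5
closest = min(choices, key=lambda c: abs(c - sd_guess))

print(f"range/5 = {sd_guess:.1f}, closest option = ${closest},000")
```

The guess of about 412 (that is, $412,000) lands nearest the $400,000 option.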
2. (26 points) The General Social Survey measured the annual income (income: measured in dollars) and education (educ: measured in number of years) for n = 1758 adults in the US. The results of a regression for this data are shown below:

a) (4 points) What is the correlation between income and educ?
r = +√R² = +√0.154 ≈ 0.392. We know r must be positive since the slope is positive.

b) (5 points) What is the value for the slope in this model? What is its interpretation?
The slope in this model is $5168.32. This means that for every extra year of school completed, a person's income is predicted to increase by $5168.32, on average.

c) (4 points) Interpret the value of R² for this regression.
R² for this model is reported to be 0.154. This means that 15.4% of the variability in income can be explained by its linear relationship with the number of years of school completed.
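Parts a) and c) rest on the fact that, in simple linear regression, r² equals R² and r has the same sign as the slope. A minimal Python sketch of that step, using only the R² and slope quoted above:

```python
import math

r_squared = 0.154  # R^2 reported for the regression of income on educ
slope = 5168.32    # dollars per additional year of education

# In simple linear regression r^2 = R^2, and r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"r = {r:.3f}")
print(f"share of variability in income explained: {r_squared:.1%}")
```

`math.copysign` attaches the slope's sign to the square root, so the same code would give a negative r for a negative slope.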
d) (4 points) College graduates will typically have 16 years of education after receiving their Bachelor's degree. Based on this model, what is the predicted annual income for a college graduate?
ŷ = a + b(x) = -36133 + 5168(x) = -36133 + 5168(16) = $46,555 (using the slope rounded to 5168)

e) (4 points) A recently graduated friend of yours (with 16 years of education) gets a job and doesn't want to tell you her income, but you know her residual is $25,000. What is your friend's actual income?
e = y - ŷ. Solving for y, we get: y = ŷ + e = 46555 + 25000 = $71,555

f) (5 points) This friend of yours interprets this model to mean that more education causes increased income later in life. Briefly comment on your friend's interpretation.
This may not be a correct conclusion; this relationship may not be causal. Since this is based on a survey (an observational study), there may be confounding factors in this association. For example, people who are hard-working or more driven may end up getting more education and also earning more income. If you took away this extra education, they might earn the same income anyway.
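Parts d) and e) chain two formulas: the prediction ŷ = a + bx and the residual e = y - ŷ. The sketch below reproduces both with the intercept and the rounded slope used above; the helper names `predict_income` and `actual_from_residual` are hypothetical, chosen for illustration.

```python
INTERCEPT = -36133  # a, from the regression output
SLOPE = 5168        # b, rounded from 5168.32 as in part d)

def predict_income(years_of_education: float) -> float:
    """Predicted annual income: y-hat = a + b * x."""
    return INTERCEPT + SLOPE * years_of_education

def actual_from_residual(years_of_education: float, residual: float) -> float:
    """Recover the actual value from e = y - y-hat, i.e. y = y-hat + e."""
    return predict_income(years_of_education) + residual

y_hat = predict_income(16)            # college graduate
y = actual_from_residual(16, 25_000)  # friend with a $25,000 residual

print(y_hat, y)
```

A positive residual means the friend earns more than the model predicts for someone with 16 years of education.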
3. (15 points total, 3 points each) Multiple Choice

a) A survey of 122 families with epileptic children explored the behavior of the family dog in connection with epileptic seizures. Many families claimed that their dog was able to anticipate an upcoming seizure, and demonstrated its concern in a variety of ways. It was reported that anticipation time ranged from 10 seconds to 5 hours, with an average of 2.5 minutes. The shape of the distribution of anticipation times is likely
   i) skewed right
   ii) skewed left
   iii) symmetric
   iv) categorical
Answer: i) skewed right. Most anticipation times are short, but a few very long times (up to 5 hours) lie far above the average and form a long right tail.

b) Which of the following is not a property of r, the correlation coefficient?
   i) r is always between 0 and 1.
   ii) r does not depend on the units of y or x.
   iii) r measures the strength of the linear relationship between x and y.
   iv) r does not depend on which of the two variables is labeled as x.
Answer: i). The correlation r is always between -1 and 1, and it is negative for a negative association.

c) Mankiw was concerned that the highest score on the first Economics exam was only 99 (instead of 100). He decided to add one point to everyone's score. The effect of this would be:
   i) The standard deviation would increase by 1.
   ii) The median would change but the mean would not.
   iii) The standard deviation would not change but the mean and median would increase.
   iv) none of these
Answer: iii). Adding a constant shifts the mean and median up by 1 but leaves the spread, and hence the SD, unchanged.

d) On a statistics exam with a mean of 76 and SD of 12, Tom scored one standard deviation above the mean, Mary had a score of x = 79, and Bill had a z-score of z = -0.5. Place these three students in order from lowest to highest score.
   i) Bill, Mary, Tom
   ii) Mary, Tom, Bill
   iii) Tom, Bill, Mary
   iv) Tom, Mary, Bill
Answer: i). Tom scored 76 + 12 = 88, Mary scored 79, and Bill scored 76 - 0.5(12) = 70.

e) A simple random sample of 1200 adult Americans was selected and each person was asked the following question: "In light of the huge national deficit, should the government at this time spend additional money to establish a national system of health insurance?" 39% of those responding answered yes. Which of the following results is most likely?
   i) Accurate and unbiased
   ii) An understatement of the true percentage
   iii) An overstatement of the true percentage
Answer: ii). Mentioning the deficit in the question wording pushes respondents toward answering no, so the 39% likely understates true support.
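Parts c) and d) of the multiple choice are both about how scores transform. The sketch below checks c) on a small made-up list of scores (hypothetical, not exam data) and answers d) by converting each student's information to a raw score before sorting.

```python
import math
import statistics

# Part c): hypothetical exam scores, made up purely for illustration.
scores = [62, 71, 78, 85, 99]
curved = [s + 1 for s in scores]  # add one point to everyone's score

mean_shift = statistics.mean(curved) - statistics.mean(scores)
median_shift = statistics.median(curved) - statistics.median(scores)
sd_unchanged = math.isclose(statistics.stdev(curved), statistics.stdev(scores))

# Part d): turn every description into a raw score, then sort.
exam_mean, exam_sd = 76, 12
raw_scores = {
    "Tom": exam_mean + 1.0 * exam_sd,   # z = +1
    "Mary": 79,                         # raw score given directly
    "Bill": exam_mean - 0.5 * exam_sd,  # z = -0.5
}
order = sorted(raw_scores, key=raw_scores.get)

print(mean_shift, median_shift, sd_unchanged)
print(order)  # ['Bill', 'Mary', 'Tom']
```

Any list of scores would show the same pattern in part c): both centers move by exactly 1 while the SD stays put.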
4. (18 total points) A recent study examined the effectiveness of bicycle safety helmets in reducing head injuries. The data consist of a random sample of 837 cyclists who were involved in bicycle accidents in a one-year period; the data are summarized in the following two-way table.

                        Wearing Helmet (H)
   Head Injury (I)      Yes     No    Total
     Yes                101     90      191
     No                 418    228      646
     Total              519    318      837

a) (4 points) What proportion of all cyclists in accidents wore a helmet AND suffered a head injury?
Let H = helmet and I = injury. Then: P(H and I) = 101/837 ≈ 0.121

b) (4 points) Among those cyclists wearing a helmet, what is the proportion of cyclists who suffered a head injury?
P(I | H) = 101/519 ≈ 0.195

c) (5 points) Ignoring any issues with study design, does this table suggest that helmets might be effective at decreasing head injuries? Justify your answer in 1-2 sentences.
We should compare P(I | H) = 101/519 ≈ 0.195 with P(I | H^C) = 90/318 ≈ 0.283. Since these proportions are quite different, it looks like wearing a helmet may in fact decrease the chance of a head injury: the rate of head injury is lower for those wearing a helmet.

d) (5 points) The cyclists were randomly selected from accident victims, but were not randomized to helmet use vs. no helmet use. Give one possible confounding variable here, and explain briefly why it may be a confounder (in 1-2 sentences).
Any variable that is related both to whether or not someone wears a helmet and to the chance of getting a head injury could be considered a possible confounder. For example: the riding speed of the cyclist at the time of the accident. If bikers who choose not to wear helmets tend to ride faster, it may be the speed at which they have their accident that causes the head injury, and not the fact that they are not wearing a helmet. Another possibility: the general carelessness with which the biker rides.
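The difference between a joint proportion (part a) and a conditional proportion (parts b and c) is just a change of denominator: the grand total versus a helmet-group total. A minimal Python sketch using the counts from the table; the dictionary layout and the helper `group_total` are illustrative choices.

```python
# Counts from the two-way table: (helmet status, injury status) -> number of cyclists.
counts = {
    ("helmet", "injury"): 101,
    ("helmet", "no injury"): 418,
    ("no helmet", "injury"): 90,
    ("no helmet", "no injury"): 228,
}
total = sum(counts.values())

def group_total(helmet_status: str) -> int:
    """Column total for one helmet group (e.g. 519 helmet wearers)."""
    return counts[(helmet_status, "injury")] + counts[(helmet_status, "no injury")]

# Joint proportion: divide by the grand total.
p_h_and_i = counts[("helmet", "injury")] / total
# Conditional proportions: divide by the size of the helmet group.
p_i_given_h = counts[("helmet", "injury")] / group_total("helmet")
p_i_given_not_h = counts[("no helmet", "injury")] / group_total("no helmet")

print(f"P(H and I) = {p_h_and_i:.3f}")
print(f"P(I | H)   = {p_i_given_h:.3f}")
print(f"P(I | H^C) = {p_i_given_not_h:.3f}")
```

Comparing the last two lines is the comparison made in part c).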
5. (22 points) For men, binge drinking is defined as having five or more drinks in a row, and for women as having four or more drinks in a row. According to a study by the Harvard School of Public Health, 44% of college students engage in binge drinking and 56% do not binge drink (they either drink moderately or abstain entirely). Another study has found that among young adult binge drinkers, 17% have been involved in an alcohol-related automobile accident. Among adults of the same age who are not binge drinkers, 9% have been involved in such accidents.

a) (5 points) Are the events "binge drinking" and "being involved in an alcohol-related automobile accident" independent? Support your statement numerically.
Let's define the events: A = auto accident, B = binge drinker.
We were given P(B) = 0.44, P(A | B) = 0.17 and P(A | B^C) = 0.09. Since P(A | B) = 0.17 ≠ P(A | B^C) = 0.09, we know that A and B are dependent.

b) (5 points) What is the probability that a randomly selected college student will be both a binge drinker and will have been involved in an alcohol-related automobile accident?
P(A and B) = P(A | B)*P(B) = 0.17(0.44) = 0.0748

c) (5 points) What is the probability that a randomly selected college student will be a binge drinker or will have been involved in an alcohol-related automobile accident?
P(A or B) = P(A) + P(B) - P(A and B) = 0.1252 + 0.44 - 0.0748 = 0.4904
Note: P(A) = P(A and B) + P(A and B^C) = 0.44(0.17) + 0.56(0.09) = 0.0748 + 0.0504 = 0.1252
Or it can be solved by: P(A or B) = P(B) + P(A and B^C) = 0.44 + 0.56(0.09) = 0.4904
One more way to do it: P(A or B) = 1 - P(A^C and B^C) = 1 - 0.56(0.91) = 1 - 0.5096 = 0.4904
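The three routes to P(A or B) in part c) should agree exactly, which makes them a useful self-check. The Python sketch below recomputes parts a) through c) from the three given probabilities; the variable names are illustrative.

```python
import math

p_b = 0.44              # P(B): binge drinker
p_a_given_b = 0.17      # P(A | B)
p_a_given_not_b = 0.09  # P(A | B^C)
p_not_b = 1 - p_b       # P(B^C) = 0.56

# a) A and B are independent only if P(A | B) equals P(A | B^C).
independent = math.isclose(p_a_given_b, p_a_given_not_b)

# b) Multiplication rule.
p_a_and_b = p_a_given_b * p_b
p_a_and_not_b = p_a_given_not_b * p_not_b

# Law of total probability: P(A) = P(A and B) + P(A and B^C).
p_a = p_a_and_b + p_a_and_not_b

# c) Three equivalent ways to get P(A or B).
via_addition = p_a + p_b - p_a_and_b
via_partition = p_b + p_a_and_not_b
via_complement = 1 - (1 - p_a_given_not_b) * p_not_b

print(independent, round(p_a_and_b, 4), round(p_a, 4))
print(round(via_addition, 4), round(via_partition, 4), round(via_complement, 4))
```

The complement route uses P(A^C and B^C) = P(A^C | B^C)·P(B^C) = 0.91(0.56).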
d) (5 points) Given a college student has been involved in an alcohol-related automobile accident, what is the probability that he or she is a binge drinker?
Based on previous work: P(B | A) = P(A and B) / P(A) = 0.0748 / 0.1252 = 0.597
Note: this can be calculated based on the 2x2 table or directly by Bayes' theorem.

6. (12 points) The National Collegiate Athletic Association (NCAA) requires a Division I athlete (one that has an average high school GPA) to score at least 820 on the combined math and verbal parts of the SAT exam to compete in their first college year. In 2012, the scores of all students nationwide taking the SATs were approximately normally distributed with mean μ = 1012 and standard deviation σ = 219.

a) (6 points) What proportion of all students nationwide had scores less than 820?
P(X < 820) = P(Z < (820-1012)/219) = P(Z < -0.88) = 0.1894

b) (6 points) What value did a student have to receive on the SAT in order to be in the top 1% of all students nationwide?
Here, we first need to find the z-value that puts 1% of the distribution to its right (equivalently, 99% to its left, so we know it should be positive). Looking it up in the standard normal z-table, we see that z* = 2.33 is the value we want. Thus, in terms of SAT scores in the general population: X = μ + z*·σ = 1012 + 2.33(219) = 1522
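Both the Bayes step in Question 5 d) and the normal calculations in Question 6 can be reproduced with Python's standard library. This sketch uses `statistics.NormalDist` (Python 3.8+) in place of a printed z-table, so it shows both the table-style answer (z rounded to two decimals) and the unrounded one; the small gaps between them come only from rounding z.

```python
from statistics import NormalDist

# Question 5 d): Bayes' theorem, rebuilt from the given probabilities.
p_b, p_a_given_b, p_a_given_not_b = 0.44, 0.17, 0.09
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)  # total probability
p_b_given_a = p_a_given_b * p_b / p_a

# Question 6: SAT scores modeled as Normal(mu = 1012, sigma = 219).
sat = NormalDist(mu=1012, sigma=219)

# a) Table-style answer: round z to two decimals first, as with a z-table.
z = round((820 - 1012) / 219, 2)
below_820_table = NormalDist().cdf(z)
# Same question without rounding z.
below_820_exact = sat.cdf(820)

# b) Score with 99% of the distribution to its left.
top_1_percent_cutoff = sat.inv_cdf(0.99)

print(f"P(B | A) = {p_b_given_a:.3f}")
print(f"z = {z}, P(Z < z) = {below_820_table:.4f}, unrounded = {below_820_exact:.4f}")
print(f"top 1% cutoff = {top_1_percent_cutoff:.1f}")
```

The unrounded cutoff comes out near 1521.5 because the exact z* is about 2.326 rather than the table's 2.33.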