Statistics 101 Fall 2013 Midterm I Exam Solutions

1. (9 points) Multiple choice. No explanations needed. Forbes magazine published data on the annual salary of the chief executive officer for 50 of the best small firms in 2012. The annual salary was reported in $1,000s. The salaries ranged from $44,000 to $2,105,000 and are depicted in the two plots below.

a) (3 points) The mean of these salaries is $730,000. What is your guess of the median: $200,000, $400,000, $600,000, or $1,000,000?
$600,000 (it's the line in the middle of the box in the boxplot).

b) (3 points) What is your guess of the standard deviation for this data: $200,000, $400,000, $600,000, or $1,000,000?
$400,000 (the whole range is about $2 million; divide that by 5 to get a guess of the SD).

c) (3 points) What is your guess of the interquartile range for this data: $200,000, $400,000, $600,000, or $1,000,000?
$600,000 (it's the width of the box in the boxplot).
2. (21 points) The General Social Survey, an observational study, measured the annual income (income: measured in dollars) and education (educ: measured in number of years) for n = 1758 adults in the US. The results of a regression for this data are shown below:

a) (5 points) What is the value for the slope in this model? What is its interpretation?
The slope in this model is $5168.32. This means that for every extra year of school completed, a person's family income is predicted to increase by $5168.32 on average.

b) (4 points) Interpret the value of R² for this regression.
R² for this model is reported to be 0.154. This means that 15.4% of the variability in income can be explained by the number of years of school completed.

c) (4 points) College graduates will typically have 16 years of education after receiving their Bachelor's degree. Based on this model, what is the predicted annual income for a college graduate?
ŷ = a + b(x) = -36133 + 5168(x) = -36133 + 5168(16) = $46,555
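The prediction in part (c) can be checked with a few lines of Python. This is only a sketch: the function name predict_income is made up for illustration, and the coefficients are the rounded values used in the solution above (intercept -36133, slope 5168), not the full regression output.

```python
# Rounded coefficients taken from the solution to 2(c).
INTERCEPT = -36133  # predicted income (dollars) at 0 years of education
SLOPE = 5168        # predicted change in income (dollars) per extra year


def predict_income(educ_years):
    """Return the fitted value y-hat = a + b*x for a given number of years of education."""
    return INTERCEPT + SLOPE * educ_years


college_grad = predict_income(16)
print(college_grad)  # 46555
```

Using the unrounded slope 5168.32 would shift the prediction by about $5, which does not change the interpretation.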
d) (4 points) A recently graduated friend of yours (with 16 years of education) gets a job and doesn't want to tell you her income, but you know her residual is $25,000. What is your friend's actual income?
e = y - ŷ. Solving for y we get: y = ŷ + e = 46555 + 25000 = $71,555

e) (4 points) This friend of yours interprets this model to mean that more education causes increased income later in life. Briefly comment on your friend's interpretation. Provide a specific explanation, if appropriate.
This may not be a correct conclusion; this relationship may not be causal. Since this is based on a survey (an observational study), there may be confounding factors in this association. For example, people who are hard-working or more driven may end up getting more education and have more income. If you took away this extra education, they may make the same amount of income anyway.

3. (24 points) The National Collegiate Athletic Association (NCAA) requires a Division I athlete (one that has an average high school GPA) to score at least 820 on the combined math and verbal parts of the SAT exam to compete in their first college year. In 2012, the scores of all students nationwide taking the SATs were approximately normally distributed with mean μ = 1012 and standard deviation σ = 219.

a) (6 points) What proportion of all students nationwide had scores less than 820?
P(X < 820) = P(Z < (820 - 1012)/219) = P(Z < -0.88) = P(Z > 0.88) = 0.1894
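As a check on part (a), the same probability can be computed without a z-table. The sketch below uses NormalDist from Python's standard statistics module; the variable names are illustrative. It shows both the table-style answer (z rounded to two decimals, as in the solution) and the value from the unrounded z-score.

```python
from statistics import NormalDist

# Nationwide SAT distribution given in the problem.
sat = NormalDist(mu=1012, sigma=219)

# Table-style calculation: round z to two decimals, as a printed z-table requires.
z = round((820 - 1012) / 219, 2)   # -0.88
p_table = NormalDist().cdf(z)      # area to the left of z under the standard normal

# Same probability without rounding the z-score.
p_exact = sat.cdf(820)

print(z, round(p_table, 4), round(p_exact, 4))
```

The two values differ slightly in the third decimal place because the z-score is rounded before the table lookup.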
b) (6 points) The NCAA is considering raising this minimal SAT level to 900. What percent of all students nationwide would this new policy affect and were not affected by the old policy?
P(820 < X < 900) = P(X < 900) - P(X < 820) = P(Z < (900 - 1012)/219) - P(Z < (820 - 1012)/219) = P(Z < -0.51) - P(Z < -0.88) = 0.3050 - 0.1894 = 0.1156

c) (6 points) Harvard reports that 56% of Harvard's population scored 1400 or higher on the combined math and verbal parts of the SATs. What is the corresponding SAT score for all students nationwide? That is, what score on the SAT is needed to be below 56% of all students nationwide (not just Harvard)?
Here, we need to first find the z-value that puts 56% of the distribution to the right of it (so we know it should be negative). Looking it up on the standard normal z-table, we see that a z-value of z* = -0.15 is the value we want. Thus in terms of SAT scores in the general population:
X = μ + z*(σ) = 1012 - 0.15(219) = 979

d) (6 points) There are 29 players on Harvard's men's soccer team. Assuming they are a random sample from Harvard's population, what is the approximate probability that fewer than half of them scored 1400 or higher on their combined math and verbal parts of the SATs?
Let X = # soccer players with an SAT score of 1400 or higher. It is safe to assume that X follows a Binomial distribution. More specifically, X ~ Bin(n = 29, π = 0.56). First we need to calculate:
μ = nπ = 29(0.56) = 16.24 and σ = √(nπ(1 - π)) = √(29(0.56)(0.44)) = 2.67
Fewer than half of the 29 players means X ≤ 14, so:
P(X < 14.5) ≈ P(Z < (14.5 - 16.24)/2.67) = P(Z < -0.65) = 0.2578
Note: we used the normal approximation to the Binomial here, and that's OK since nπ = 29(0.56) = 16.24 ≥ 10 and n(1 - π) = 29(0.44) = 12.76 ≥ 10.
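The calculations in parts (b) through (d) can be sketched in Python using only the standard library. The variable names are illustrative, and the exact binomial sum at the end is an extra comparison that is not part of the exam solution, which relies on the normal approximation.

```python
from math import comb, sqrt
from statistics import NormalDist

sat = NormalDist(mu=1012, sigma=219)  # nationwide SAT scores, from the problem

# 3(b): share of students between the old cutoff (820) and the proposed one (900).
p_newly_affected = sat.cdf(900) - sat.cdf(820)

# 3(c): score with 56% of students above it, i.e. the 44th percentile.
score_c = sat.inv_cdf(1 - 0.56)

# 3(d): X ~ Bin(29, 0.56); "fewer than half" of 29 players means X <= 14.
n, pi = 29, 0.56
mean = n * pi
sd = sqrt(n * pi * (1 - pi))
p_normal = NormalDist(mean, sd).cdf(14.5)  # normal approximation, cutoff at 14.5

# Exact binomial probability P(X <= 14), for comparison only.
p_exact = sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(15))

print(round(p_newly_affected, 4), round(score_c), round(p_normal, 4), round(p_exact, 4))
```

Because this code does not round z-scores to two decimals, its results can differ from table-based answers in the third decimal place.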
4. (20 points) According to a study by the Harvard School of Public Health, 44% of college students engage in binge drinking, and 56% are not binge drinkers (they either drink moderately or abstain entirely). Another study has found that among student binge drinkers, 17% have been involved in an alcohol-related automobile accident. Among students who are not binge drinkers, 9% have been involved in such accidents.

a) (5 points) Are the events binge drinking and being involved in an alcohol-related automobile accident independent? Support your statement numerically.
Let's define the events:
A = auto accident
B = binge drinker
We were given P(B) = 0.44, P(A | B) = 0.17 and P(A | B^C) = 0.09. Since P(A | B) = 0.17 ≠ P(A | B^C) = 0.09, we know that A and B are dependent.

b) (5 points) What is the probability that a randomly selected college student will be both a binge drinker and will have been involved in an alcohol-related automobile accident?
P(A and B) = P(A | B) × P(B) = 0.17(0.44) = 0.0748

c) (5 points) What is the probability that a randomly selected college student will be a binge drinker or will have been involved in an alcohol-related automobile accident?
P(A or B) = P(A) + P(B) - P(A and B) = 0.1252 + 0.44 - 0.0748 = 0.4904
Note: P(A) = P(A and B) + P(A and B^C) = 0.44(0.17) + 0.56(0.09) = 0.0748 + 0.0504 = 0.1252
Or it can be solved by: P(A or B) = P(B) + P(A and B^C) = 0.44 + 0.56(0.09) = 0.4904
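The arithmetic in parts (a) through (c) can be reproduced with a short Python sketch. The variable names are illustrative; the three input probabilities are the ones given in the problem.

```python
from math import isclose

# Given in the problem.
p_b = 0.44              # P(B): student is a binge drinker
p_a_given_b = 0.17      # P(A | B): accident, given binge drinker
p_a_given_not_b = 0.09  # P(A | B^C): accident, given not a binge drinker

# (b) Multiplication rule.
p_a_and_b = p_a_given_b * p_b

# Law of total probability for P(A), needed in (a) and (c).
p_a = p_a_and_b + p_a_given_not_b * (1 - p_b)

# (a) A and B are independent only if P(A | B) equals P(A).
independent = isclose(p_a_given_b, p_a)

# (c) Addition rule, plus the shortcut P(B) + P(A and B^C).
p_a_or_b = p_a + p_b - p_a_and_b
p_a_or_b_alt = p_b + p_a_given_not_b * (1 - p_b)

print(round(p_a_and_b, 4), round(p_a, 4), independent, round(p_a_or_b, 4))
```

Both routes to P(A or B) agree, which is a useful sanity check on the addition rule.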
d) (5 points) Given a college student has been involved in an alcohol-related automobile accident, what is the probability that he or she is a binge drinker?
Based on previous work: P(B | A) = P(A and B) / P(A) = 0.0748 / 0.1252 = 0.597
Note, this can be calculated based on the 2x2 table or directly by Bayes' theorem.

5. (9 points) Multiple Choice: no explanation is needed. Note, these problems are not related.

a) (3 points) Which of the following is NOT a property of r, the correlation coefficient?
i) r is always between 0 and 1.
ii) r does not depend on the units of y or x.
iii) r measures the strength of the linear relationship between x and y.
iv) r does not depend on which of the two variables is labeled as x.

b) (3 points) On a statistics exam with a mean of 76 and SD of 12, Tom scored one standard deviation above the mean, Mary had an exam score of 79, and Bill had a z-score of z = -0.5. Place these three students in order from the lowest to the highest score.
i) Bill, Mary, Tom
ii) Mary, Tom, Bill
iii) Tom, Bill, Mary
iv) Tom, Mary, Bill
v) Cannot be determined from the information given.

c) (3 points) A researcher calculated the values and probabilities for a random variable X as shown below. Unfortunately, he erased the last value and needs to figure out what it was. If the mean of X was 4, then what was the last value?

x          0     1     5     ??
P(X = x)   0.4   0.2   0.2   0.2

i) 6
ii) 10
iii) 14
iv) 18
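Parts (b) and (c) of question 5 each reduce to a short calculation, sketched below in Python. The variable names are illustrative. The sketch converts each student's information to a raw score for (b), and solves the mean equation E(X) = sum of x × P(X = x) = 4 for the erased value in (c).

```python
# 5(b): convert everything to raw exam scores (mean 76, SD 12).
mean, sd = 76, 12
scores = {
    "Tom": mean + 1 * sd,     # one SD above the mean
    "Mary": 79,               # raw score given directly
    "Bill": mean - 0.5 * sd,  # z = -0.5
}
order = sorted(scores, key=scores.get)  # lowest to highest
print(order)  # ['Bill', 'Mary', 'Tom']

# 5(c): solve 0(0.4) + 1(0.2) + 5(0.2) + x(0.2) = 4 for the erased value x.
known_values = [0, 1, 5]
known_probs = [0.4, 0.2, 0.2]
p_last = 0.2
target_mean = 4

known_part = sum(x * p for x, p in zip(known_values, known_probs))
missing_value = (target_mean - known_part) / p_last
print(round(missing_value))  # 14
```

Under this calculation the ordering in (b) matches choice i) and the erased value in (c) matches choice iii).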
6. (18 points) A study was conducted to determine the GPA of Harvard college students who have experienced a case of mononucleosis infection (mono). A sample of 49 students who experienced a case of mono had an average GPA of 3.02 points in the same semester they had mono, with a standard deviation of 0.56 points. It is known that the population distribution of GPA at Harvard has a mean of 3.25 points.

a) (6 points) Calculate the 95% confidence interval for the average GPA for Harvard students who experienced a case of mono.
x̄ ± t*(s/√n) = 3.02 ± 2.021(0.56/√49) = 3.02 ± 0.16 = (2.86, 3.18)

b) (4 points) Ignoring any issue regarding study design, consider the 95% confidence limits you calculated in part (a) and comment on whether or not this appears to be evidence that experiencing mono affects GPA. Support your statement numerically.
Since μ = 3.25 is not inside the confidence interval, this value could be rejected as a null hypothesis. This is the mean GPA for all of Harvard, so it appears that the mean for our group, the mono-infected students, is different (in fact it is lower than that of all of Harvard).

c) (3 points) If the true population mean GPA for all students with mono was 3.10, what would be the correct type of conclusion from a two-sided hypothesis test based on this data?
i) Type I error
ii) Type II error
iii) Correct conclusion

d) (4 points) Comment on the quality of this study design and whether or not this study has shown that experiencing mono affects GPA. Provide a specific explanation, if appropriate.
This is also an observational study (we cannot ethically randomize people to have mono or not), so there may be a confounding variable at the root of the lowered GPA for these students. One such example: these students may be partiers, which puts them at a higher risk of getting mono, and the partying may be why they have a lower GPA (not the mono).

- END OF EXAM - (remaining pages are tables)