Statistics E100 Fall 2013 Practice Midterm I - A Solutions

1. (16 points total) Below is the histogram for the number of medals won by the n = 203 countries that participated in the 2008 Summer Olympics in Beijing, along with the detailed summary statistics for this variable:

a. (5 points) Is this distribution symmetric, left-skewed, or right-skewed? How do you know?
The distribution is right-skewed (definitely not symmetric), as the long right tail shows: most countries won few or no medals, while a handful won very many (the maximum is 110).

b. (5 points) The mean for these data is 4.70. Give a reasonable guess of the median for these data.
0, 1, 2, or 3. The median must lie between the first quartile (0) and the third quartile (3). Because n = 203 is odd, the median is the 102nd ordered value, so it must also be a whole number of medals. (A median below the mean of 4.70 is what we expect for a right-skewed distribution.)

c. (6 points) Based on the rule we used in class and in your text, are there any potential low or high outliers in the dataset? Show your work.
IQR = Q3 − Q1 = 3 − 0 = 3
Upper fence = Q3 + 1.5 × IQR = 3 + 4.5 = 7.5
Lower fence = Q1 − 1.5 × IQR = 0 − 4.5 = −4.5
There are high outliers (any country with 8 or more medals; the maximum of 110 medals is far above the upper fence), but no low outliers, since no country can win fewer than 0 medals.
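The 1.5 × IQR rule from part c can be written as a small function. This is a minimal sketch that takes the quartiles as given (Q1 = 0 and Q3 = 3, as used in part c) rather than computing them from raw data, because software packages differ slightly in how they compute quartiles; the function names `iqr_fences` and `is_outlier` are illustrative, not from any library.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value: float, lower: float, upper: float) -> bool:
    """Flag a value as a potential outlier if it lies outside the fences."""
    return value < lower or value > upper


# Quartiles of the 2008 medal counts, as used in part c.
lower, upper = iqr_fences(q1=0, q3=3)
print(lower, upper)                   # -4.5 7.5
print(is_outlier(110, lower, upper))  # True: the maximum of 110 medals
print(is_outlier(0, lower, upper))    # False: counts cannot fall below 0
```

Because medal counts are whole numbers, any count of 8 or more falls above the 7.5 fence.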
2. (16 points total) The following questions are multiple choice and DO NOT require any explanation or work. Note: they are unrelated to each other.

a. (4 points) If the coefficient of determination (R²) is 0.975 in a simple regression, which of the following is true regarding the slope of the regression line?
a) All we can tell is that it must be positive.
b) It must be 0.975.
c) It must be 0.987.
d) Cannot tell the sign or the value.
Answer: d). R² = 0.975 gives only the size of the correlation, |r| = √0.975 ≈ 0.987. It says nothing about the sign of r, and the slope b = r × (s_y / s_x) also depends on the spreads (and units) of x and y.

b. (4 points) Heights of college women have a distribution that can be approximated by a normal curve with a mean of 65 inches and a standard deviation of 3 inches. About what proportion of college women are between 65 and 67 inches tall?
a) 0.75 b) 0.50 c) 0.25 d) 0.17
Answer: c) 0.25. Standardizing, z = (67 − 65)/3 ≈ 0.67, so the proportion is P(0 < Z < 0.67) ≈ 0.7486 − 0.5 ≈ 0.25.

c. (4 points) Consider the annual salaries of mutual fund managers in the Boston area. The mean salary is $450,000 and the median salary is $380,000. The probability that the salary of a randomly selected mutual fund manager from the Boston area is larger than the mean of $450,000 is (circle the appropriate answer):
a) < 0.5 b) = 0.5 c) > 0.5 d) Cannot be determined
Answer: < 0.5. Half of the salaries are above the median of $380,000, and the mean of $450,000 is larger than the median, so fewer than half of the salaries are above the mean.
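Both of the first two multiple-choice answers can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` for part b, and for part a fits a least-squares line to a small made-up dataset (the `xs`/`ys` values are purely illustrative, not from the exam) to show that flipping or rescaling y changes the slope while leaving R² unchanged. The helper `slope_and_r2` is a hypothetical name written for this example.

```python
from statistics import NormalDist

# Part b: heights ~ Normal(mean = 65 in, SD = 3 in).
heights = NormalDist(mu=65, sigma=3)
p_65_to_67 = heights.cdf(67) - heights.cdf(65)
print(round(p_65_to_67, 2))  # 0.25


# Part a: R^2 does not determine the sign or the size of the slope.
def slope_and_r2(xs, ys):
    """Least-squares slope and coefficient of determination for simple regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy ** 2 / (sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # made-up illustrative data
for label, y_values in [("original", ys),
                        ("flipped sign", [-y for y in ys]),
                        ("rescaled x10", [10 * y for y in ys])]:
    slope, r2 = slope_and_r2(xs, y_values)
    print(f"{label}: slope = {slope:.2f}, R^2 = {r2:.2f}")
```

All three fits report R² = 0.60, while the slopes are 0.60, −0.60, and 6.00, which is why answer d) is the only safe choice.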
3. (20 points total) An observational study collected the monthly unemployment rate for the entire US (unemployment: in percent, ranging from 4.4% to 10%) along with the monthly inflation rate for the entire US (inflation: in percent change per month, ranging from −1.92% to 1.22%). These data cover January 2003 through May 2012 (n = 113). The result of the regression is shown below:

a. (4 points) What is the correlation between inflation and unemployment?
r = −√0.010 = −0.10. The correlation must be negative because the slope of the regression line is negative.

b. (4 points) What is the formula for the regression line to predict inflation from unemployment?
Predicted inflation = 0.363 − 0.022 × unemployment

c. (4 points) June had an unemployment rate of 8.2%. What is the predicted inflation rate for June using this model?
0.363 − 0.022 × 8.2 = 0.363 − 0.1804 ≈ 0.183 (percent change per month)

d. (4 points) June had an inflation rate of 0.31%. What is June's residual value?
Residual = y − ŷ = 0.31 − 0.183 = 0.127

e. (4 points) A government official sees the results of this regression and states that a good way to lower the inflation rate is to increase the unemployment rate. In one or two sentences, please comment on this official's statement.
Correlation is not causation. The regression shows only a linear association between the inflation rate and the unemployment rate, and because this is an observational study we cannot conclude that raising unemployment would lower inflation. The association is also very weak: R² = 0.010, so unemployment explains only about 1% of the variation in inflation.
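The calculations in parts a, c, and d follow directly from the fitted coefficients. A minimal sketch, assuming the intercept 0.363, slope −0.022, and R² = 0.010 used in the answers above; the constant and function names are illustrative.

```python
import math

# Coefficients and R^2 used in the Problem 3 answers.
INTERCEPT = 0.363
SLOPE = -0.022
R_SQUARED = 0.010


def predict_inflation(unemployment: float) -> float:
    """Predicted monthly inflation (%) from the unemployment rate (%)."""
    return INTERCEPT + SLOPE * unemployment


# Part a: r = +/- sqrt(R^2), taking the sign of the slope.
r = math.copysign(math.sqrt(R_SQUARED), SLOPE)

# Parts c and d: June prediction and residual (observed minus predicted).
june_pred = predict_inflation(8.2)
june_resid = 0.31 - june_pred

print(f"r = {r:.2f}")                  # r = -0.10
print(f"predicted = {june_pred:.3f}")  # predicted = 0.183
print(f"residual = {june_resid:.3f}")  # residual = 0.127
```

`math.copysign` is used so that the sign of r is taken from the slope, which is exactly the reasoning in part a.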
4. (12 points total) The mean length of stay in a hospital is useful for planning purposes. Suppose that the following is the distribution of the length of stay in a hospital after a minor operation.

Number of days   1     2     3     4
Probability      0.4   0.3   0.2   0.1

a. (4 points) Calculate the mean (i.e., the expected value) of the length of stay.
E(X) = 1(0.4) + 2(0.3) + 3(0.2) + 4(0.1) = 0.4 + 0.6 + 0.6 + 0.4 = 2 days

b. (4 points) Calculate the standard deviation of the length of stay.
Var(X) = 0.4(1 − 2)² + 0.3(2 − 2)² + 0.2(3 − 2)² + 0.1(4 − 2)² = 0.4 + 0 + 0.2 + 0.4 = 1
SD(X) = √Var(X) = √1 = 1 day

c. (4 points) A new policy in the hospital will add exactly one day to the length of stay for this operation for every patient. What will be the new mean and new standard deviation after this new policy is put in place?
New mean: E(X + 1) = E(X) + 1 = 3 days. New standard deviation: SD(X + 1) = SD(X) = 1 day. The standard deviation does not change because adding the same constant to every value shifts the distribution without changing its spread.

5. (21 points total) Michael Phelps and Ryan Lochte are two of the US's top swimmers, and both will swim the 400 IM in the Olympics. Overall, Michael Phelps has a 60% chance of winning the gold medal in the 400 IM. If Michael Phelps does not win the gold medal, Ryan Lochte has a 75% chance of winning it. Overall, Ryan Lochte has a 30% chance of winning the gold medal in the 400 IM. Define:
MP: the event that Michael Phelps wins the gold medal in the 400 IM
RL: the event that Ryan Lochte wins the gold medal in the 400 IM

a. (3 points) Express the event "Michael Phelps wins the gold medal and Ryan Lochte does not win the gold medal" in terms of the events defined above.
MP and RL^c (that is, MP ∩ RL^c)

b. (5 points) What is the overall probability that neither Michael Phelps nor Ryan Lochte wins the gold medal (someone else wins it)?
P(MP^c and RL^c) = 1 − P(MP) − P(RL) + P(MP and RL) = 1 − 0.6 − 0.3 + 0 = 0.1
(P(MP and RL) = 0 because only one swimmer can win the gold medal; see part e.)
Alternatively, P(MP^c and RL^c) = P(RL^c | MP^c) × P(MP^c) = 0.25 × 0.4 = 0.1

c. (5 points) Given that Ryan Lochte does not win the gold medal, what is the probability that Michael Phelps wins it?
P(MP | RL^c) = P(MP and RL^c) / P(RL^c) = P(MP) / (1 − P(RL)) = 0.6 / 0.7 ≈ 0.857
(Because MP and RL cannot both occur, the event "MP and RL^c" is the same as MP.)

d. (4 points) Are events MP and RL independent? How do you know?
P(MP and RL) = 0, whereas P(MP) × P(RL) = 0.6 × 0.3 = 0.18. Since P(MP and RL) ≠ P(MP) × P(RL), MP and RL are dependent (not independent).
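The arithmetic in Problem 4 and the probability bookkeeping in Problem 5 can be double-checked with a short script. This sketch uses Python's `fractions.Fraction` so that every probability is exact (floating-point rounding would otherwise turn P(MP and RL) into a tiny nonzero number). It derives P(MP and RL) from the three given probabilities instead of assuming it is 0; the helper `mean_and_sd` is an illustrative name.

```python
import math
from fractions import Fraction as F

# Problem 4: length-of-stay distribution, days -> probability.
stay = {1: F("0.4"), 2: F("0.3"), 3: F("0.2"), 4: F("0.1")}


def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(p * (x - mu) ** 2 for x, p in dist.items())
    return mu, math.sqrt(var)


mu, sd = mean_and_sd(stay)
# Part c: the policy adds exactly one day to every stay.
mu_new, sd_new = mean_and_sd({x + 1: p for x, p in stay.items()})

# Problem 5: the three probabilities given in the problem statement.
p_mp = F("0.6")                # P(MP)
p_rl = F("0.3")                # P(RL)
p_rl_given_not_mp = F("0.75")  # P(RL | MP^c)

p_rl_and_not_mp = p_rl_given_not_mp * (1 - p_mp)       # P(RL and MP^c)
p_mp_and_rl = p_rl - p_rl_and_not_mp                   # part e
p_neither = 1 - p_mp - p_rl + p_mp_and_rl              # part b, inclusion-exclusion
p_neither_alt = (1 - p_rl_given_not_mp) * (1 - p_mp)   # part b, multiplication rule
p_mp_given_not_rl = (p_mp - p_mp_and_rl) / (1 - p_rl)  # part c
independent = p_mp_and_rl == p_mp * p_rl               # part d

print(f"Problem 4: mean = {mu}, SD = {sd}; with policy: mean = {mu_new}, SD = {sd_new}")
print(f"P(neither) = {p_neither} = {p_neither_alt}, "
      f"P(MP | RL^c) = {float(p_mp_given_not_rl):.3f}, "
      f"P(MP and RL) = {p_mp_and_rl}, independent = {independent}")
```

The script reproduces the values above: mean 2 and SD 1 (mean 3 and SD 1 after the policy), P(neither) = 1/10 by both routes, P(MP | RL^c) = 6/7 ≈ 0.857, and P(MP and RL) = 0, so the events are not independent.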
Equivalently, P(RL | MP^c) = 0.75 is not equal to P(RL) = 0.3, so knowing whether Phelps wins changes the probability that Lochte wins.

e. (4 points) Are events MP and RL disjoint? How do you know?
P(RL and MP) = P(RL) − P(RL and MP^c) = P(RL) − P(RL | MP^c) × P(MP^c) = 0.3 − (0.75)(0.4) = 0
Since P(RL and MP) = 0, MP and RL are disjoint.

6. (12 points) At the 2008 Summer Olympics, many of the top swimmers wore Speedo's LZR Racer swimsuit, believing it helped reduce their race times. You have been asked to design a high-quality study to determine whether the LZR Racer suit actually reduces race times relative to the classic swimsuit used by swimmers. Thirty world-class swimmers, 18 men and 12 women, have agreed to participate in your study. Describe the important elements of your study design in outline or bulleted-list format (you may include a figure or schema outlining your study design if that helps explain your approach). Note: you will receive a higher grade on this question if you design a higher-quality study.

The best design is a matched-pairs study (each swimmer serves as their own control), stratified by gender (since the classic suits for women are quite different from the classic suits for men). Each swimmer swims a fixed distance in one suit and then swims the same distance the next day in the other suit, with the order (LZR Racer first or classic first) randomly assigned. Each swimmer's time in the LZR Racer suit is subtracted from their time in the classic suit, and these differences are compared.
- Subjects (individuals selected): the world-class swimmers who volunteered.
- Study design: matched pairs (each swimmer wears both the LZR Racer suit and the classic suit), stratified by gender. Each swimmer swims a fixed course one day in one suit and the same course the next day in the other suit, in a randomly assigned order.
- Sample size: all 30 volunteers (18 men and 12 women).
- Response variable: each swimmer's classic-suit time minus their LZR Racer time (the improvement). The average improvement is compared within each gender to see whether it is greater than zero.
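One way to carry out the random assignment of suit order is sketched below. It assumes, beyond the outline above, that within each gender exactly half of the swimmers wear the LZR Racer suit first (9 of 18 men and 6 of 12 women), so that any day-to-day effect is balanced within each stratum; the swimmer IDs (`M01`, `W01`, ...), the seed, and the helper names are hypothetical placeholders. The `mean_improvement` helper encodes the response variable (classic time minus LZR time) and would be applied separately to each gender.

```python
import random
from statistics import mean


def assign_orders(swimmers, rng):
    """Randomly pick half of one gender stratum to wear the LZR Racer suit on day 1.

    Returns {swimmer_id: (day1_suit, day2_suit)}; the other half gets the reverse order.
    """
    shuffled = list(swimmers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        swimmer: ("LZR", "classic") if i < half else ("classic", "LZR")
        for i, swimmer in enumerate(shuffled)
    }


def mean_improvement(times):
    """Mean of (classic time - LZR time) within one stratum; positive means LZR is faster.

    `times` maps swimmer_id -> {"LZR": seconds, "classic": seconds}.
    """
    return mean(t["classic"] - t["LZR"] for t in times.values())


# Hypothetical IDs for the 18 men and 12 women who volunteered.
men = [f"M{i:02d}" for i in range(1, 19)]
women = [f"W{i:02d}" for i in range(1, 13)]

rng = random.Random(2013)  # fixed seed only so the assignment can be reproduced
plan = {"men": assign_orders(men, rng), "women": assign_orders(women, rng)}

for stratum, orders in plan.items():
    lzr_first = sum(1 for day1, _ in orders.values() if day1 == "LZR")
    print(f"{stratum}: {lzr_first} of {len(orders)} swim in the LZR Racer suit first")
```

Randomizing within each gender separately, rather than over all 30 swimmers at once, keeps the order balanced inside both strata regardless of which seed is used.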