Applied Data Analysis Fall 2015
Course information: Labs
Anna Walsdorff (anna.walsdorff@rochester.edu), Tues. 9-11 AM
Mary Clare Roche (maryclare.roche@rochester.edu), Mon. 2-4 PM
Lecture outline
1. Practice questions
2. Inference and regression
Question 1
For women aged 25-45 in the U.S. in 2005 with full-time jobs, the relationship between education (years of schooling completed) and personal income (dollars) can be summarized as follows:
              Education (years)   Income (dollars)
Mean          14.0                32,000
St. Dev.      2.4                 26,000
The correlation is 0.34. Estimate the average income of the women who finished high school but did not go on to college (12 years of education).
[Slide: standard normal table (z-table)]
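Question 1 answer
Convert 12 years of education to standard units: $z = \frac{12 - 14}{2.4} = -0.8333$
Multiply by r to get the predicted income in standard units: $0.34 \times (-0.8333) = -0.28$
Convert back to dollars: $-0.28 \times \$26{,}000 + \$32{,}000 = \$24{,}720$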
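The three steps above can be written as a small Python helper. This is only a sketch. The function name `regression_estimate` is a hypothetical choice of ours. Carrying full precision gives about $24,633. The $24,720 on the slide comes from rounding the predicted z-score to -0.28 before converting back to dollars.

```python
def regression_estimate(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict the average y for a given x with the regression method."""
    z_x = (x - mean_x) / sd_x        # x in standard units
    z_y_hat = r * z_x                # predicted y in standard units
    return mean_y + z_y_hat * sd_y   # convert back to the original units of y

# Question 1: women with 12 years of education
estimate = regression_estimate(12, 14.0, 2.4, 32_000, 26_000, 0.34)
print(round(estimate))
```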
Question 2
For the first-year students at a certain university, the correlation between SAT scores and first-year GPA was 0.60. The scatter diagram is football-shaped. Predict the percentile rank on first-year GPA for a student whose percentile rank on the SAT was:
1. 90%
2. 30%
3. 50%
4. unknown
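Question 2 answer
1. On the z-table, the 90th percentile corresponds to $z = 1.28$. Then $1.28 \times 0.6 = 0.768$. Going back to the z-table, $z = 0.768$ corresponds to about the 78th percentile.
2. The 30th percentile corresponds to $z = -0.52$. Then $-0.52 \times 0.6 = -0.312$. Going back to the z-table, $z = -0.312$ corresponds to about the 38th percentile.
3. The 50th percentile corresponds to $z = 0$, and $0 \times 0.6 = 0$, so the prediction is the 50th percentile.
4. With no information about the SAT score, the best prediction is the average, which is the 50th percentile.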
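The percentile-to-percentile procedure above can be sketched in Python. Here `statistics.NormalDist` stands in for the printed z-table. The function name `predicted_percentile` is our own. As on the slide, the sketch assumes the football-shaped scatter makes both variables roughly normal.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def predicted_percentile(x_percentile, r):
    """Predict the percentile rank on y from the percentile rank on x."""
    z_x = STD_NORMAL.inv_cdf(x_percentile / 100)  # percentile -> z-score
    z_y = r * z_x                                  # regression toward the mean
    return 100 * STD_NORMAL.cdf(z_y)               # z-score -> percentile

for pct in (90, 30, 50):
    print(pct, round(predicted_percentile(pct, 0.60)))
```

Case 4 (unknown SAT percentile) needs no code. Without information, the prediction is the average, the 50th percentile.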
Question 3
As part of their training, air force pilots make two practice landings with instructors and are rated on performance. The instructors discuss the ratings with the pilots after each landing. Statistical analysis shows that pilots who make poor landings the first time tend to do better the second time. Conversely, pilots who make good landings the first time tend to do worse the second time. The conclusion: criticism helps the pilots while praise makes them perform worse. As a result, instructors were ordered to criticize all landings, good or bad. Was this policy warranted by the facts?
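Question 3 answer
No. The air force has committed the regression fallacy. Ratings on the two landings are imperfectly correlated. So pilots with unusually poor first landings tend to score closer to the average the second time, and pilots with unusually good first landings tend to score lower, whatever the instructor says. The pattern is what the regression effect predicts. It is not evidence that criticism helps or that praise hurts.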
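A small simulation can show the regression effect with no feedback at all. It rests on an assumed model, not on the air force data. Each pilot has a fixed skill, and each landing rating is skill plus independent luck, both drawn as standard normals. Pilots with the worst first landings still improve on average, and pilots with the best first landings still get worse.

```python
import random
from statistics import mean

def simulate_landings(n_pilots=10_000, seed=0):
    """Return (first, second) ratings for hypothetical pilots.

    Assumed model: rating = skill + luck, with skill and luck independent
    standard normal draws. No criticism or praise is modeled.
    """
    rng = random.Random(seed)
    pilots = []
    for _ in range(n_pilots):
        skill = rng.gauss(0, 1)
        first = skill + rng.gauss(0, 1)
        second = skill + rng.gauss(0, 1)
        pilots.append((first, second))
    return pilots

pilots = sorted(simulate_landings(), key=lambda p: p[0])  # order by first rating
k = len(pilots) // 10
worst, best = pilots[:k], pilots[-k:]  # bottom and top 10% on the first landing

worst_first, worst_second = mean(p[0] for p in worst), mean(p[1] for p in worst)
best_first, best_second = mean(p[0] for p in best), mean(p[1] for p in best)
print(f"poor first landings: {worst_first:.2f} -> {worst_second:.2f}")
print(f"good first landings: {best_first:.2f} -> {best_second:.2f}")
```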
Question 4
An admissions officer is trying to choose between two methods of predicting first-year scores. One method has an r.m.s. error of 12. The other has an r.m.s. error of 7. Other things being equal, which should she choose? Why?
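Question 4 answer
The method with the r.m.s. error of 7. The r.m.s. error measures the typical size of a prediction error, so the method with the smaller one makes more accurate predictions.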
Question 5
At a certain college, the first-year GPAs average about 3.0, with an SD of about 0.5; they are correlated at about 0.6 with high-school GPA. Person A predicts first-year GPAs just using the average. Person B predicts first-year GPAs by regression, using the high-school GPAs. Which person makes the smaller r.m.s. error? Smaller by what factor?
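Question 5 answer
Person B, who uses more information. Person A's r.m.s. error is the SD of first-year GPA, about 0.5. Person B's r.m.s. error is smaller by a factor of $\sqrt{1 - r^2} = \sqrt{1 - 0.6^2} = 0.8$, so it is about $0.8 \times 0.5 = 0.4$.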
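The comparison above takes only a few lines of Python. Predicting with the average gives an r.m.s. error equal to the SD. Predicting by regression multiplies that error by $\sqrt{1 - r^2}$.

```python
import math

sd_y = 0.5  # SD of first-year GPA
r = 0.6     # correlation with high-school GPA

rms_a = sd_y                        # Person A: predict the average, error = SD
rms_b = math.sqrt(1 - r**2) * sd_y  # Person B: regression, error = sqrt(1 - r^2) * SD
print(round(rms_a, 3), round(rms_b, 3), round(rms_b / rms_a, 3))
```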
Question 6
Pearson and Lee obtained the following results for about 1,000 families (heights in inches), with r = 0.25:
              Husband height   Wife height
Mean          68.0             63.0
St. Dev.      2.7              2.5
1. What percentage of the women were over 5'8"?
2. Of the women who were married to men 6 feet tall, what percentage were over 5'8"?
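Question 6 answer
1. $z = \frac{68 - 63}{2.5} = 2$, and the area above $z = 2$ is about 2.28%.
2. In standard units, the husbands' height is $\frac{72 - 68}{2.7} = 1.48$. The predicted wives' height is $1.48 \times 0.25 = 0.37$ SDs above average, so $0.37 \times 2.5 + 63 \approx 63.9$ inches. Within this group, the wives' heights spread around 63.9 with the r.m.s. error, $\sqrt{1 - 0.25^2} \times 2.5 \approx 2.42$ inches, not the overall SD of 2.5. So $z = \frac{68 - 63.9}{2.42} \approx 1.69$, and about 4.5% of these women were over 5'8".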
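Both parts of Question 6 can be checked in Python, with `statistics.NormalDist` in place of the z-table. As in the answer above, part 2 treats the wives' heights within the 72-inch strip as roughly normal. Their center is the regression estimate, and their spread is the r.m.s. error. Full precision gives $z \approx 1.68$ and about 4.6%, which matches the hand calculation up to rounding.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Summary statistics from Pearson and Lee (heights in inches)
mean_h, sd_h = 68.0, 2.7  # husbands
mean_w, sd_w = 63.0, 2.5  # wives
r = 0.25
cutoff = 68               # 5'8"

# Part 1: all women
pct_all = 100 * (1 - STD_NORMAL.cdf((cutoff - mean_w) / sd_w))

# Part 2: women married to 6-foot (72-inch) men
z_h = (72 - mean_h) / sd_h
pred_w = mean_w + r * z_h * sd_w       # regression estimate of the strip average
strip_sd = math.sqrt(1 - r**2) * sd_w  # r.m.s. error = spread within the strip
pct_strip = 100 * (1 - STD_NORMAL.cdf((cutoff - pred_w) / strip_sd))

print(f"{pct_all:.2f}%  {pred_w:.2f}  {strip_sd:.2f}  {pct_strip:.1f}%")
```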
Inference
Up to this point, I have not been clear about the difference between the true regression line and the estimated regression line. This is just like the difference between $\mu_x$ and $\bar{x}$.
The true regression line is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
The estimated regression line is $\hat{y}_i = a + b x_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
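The distinction between the two lines can be shown by simulation. The sketch below picks an arbitrary "true" line, $y_i = 2 + 0.5 x_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, 1)$. These values are hypothetical and chosen only for illustration. It then estimates $a$ and $b$ from one sample by least squares. The estimates land close to $\beta_0$ and $\beta_1$ but not on them, and a different seed gives a different sample and slightly different estimates.

```python
import random
from statistics import mean

# Hypothetical "true" line, assumed for illustration only
BETA0, BETA1 = 2.0, 0.5

def fit_line(xs, ys):
    """Least-squares estimates (a, b) of the intercept and slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [BETA0 + BETA1 * x + rng.gauss(0, 1) for x in xs]  # epsilon_i ~ N(0, 1)
a, b = fit_line(xs, ys)
print(f"true: beta0={BETA0}, beta1={BETA1}; estimated: a={a:.3f}, b={b:.3f}")
```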
Return of the hypothesis test
b is an estimator, so it has a sampling distribution and a standard error. With those, we can perform hypothesis tests:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Now we just need to calculate a standard error.
Have to take my word for it
The standard error of the regression coefficient is
$s_b = \frac{b\sqrt{1 - r^2}}{r\sqrt{n - 2}} = \frac{s_y\sqrt{1 - r^2}}{s_x\sqrt{n - 2}}$
Under $H_0$, the test statistic has a t distribution with $n - 2$ degrees of freedom:
$\frac{b}{s_b} \sim t_{n-2}$
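If you would rather not take it on faith, the formula can be checked numerically. The sketch below computes $s_b$ from summary statistics as on the slide. It also computes $s_b$ with the usual residual-based formula, $\sqrt{SSE/(n-2)} / \sqrt{S_{xx}}$, and the two agree. The eight data points are made up for illustration only. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def slope_test(xs, ys):
    """Return (b, s_b, t) using the summary-statistic formula for s_b."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs; the ratio s_y / s_x is the same with divisor n
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    s_b = s_y * math.sqrt(1 - r**2) / (s_x * math.sqrt(n - 2))
    return b, s_b, b / s_b

def slope_se_from_residuals(xs, ys):
    """Residual-based standard error: sqrt(SSE / (n - 2)) / sqrt(Sxx)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

# Small hypothetical dataset, made up for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.2, 4.8, 6.3, 6.1, 7.9, 8.4]
b, s_b, t = slope_test(xs, ys)
print(f"b = {b:.3f}, s_b = {s_b:.4f}, residual-based s_b = {slope_se_from_residuals(xs, ys):.4f}, t = {t:.2f}")
```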
Is the effect of income on contacts real?
              Contacts   Income
Mean          3.60       4.230
St. Dev.      2.27       3.328
With r = 0.78 and n = 10:
$b = 0.78 \times \frac{2.27}{3.328} = 0.532$
$a = 3.60 - 0.532(4.23) \approx 1.35$
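The slope and intercept above follow from the summary statistics alone. The slope is $b = r \cdot s_y / s_x$. The intercept makes the line pass through the point of averages. Here contacts is $y$ and income is $x$. The sketch uses the table's mean income of 4.230.

```python
r = 0.78
mean_contacts, sd_contacts = 3.60, 2.27  # y
mean_income, sd_income = 4.230, 3.328    # x

b = r * sd_contacts / sd_income          # slope: r * SD_y / SD_x
a = mean_contacts - b * mean_income      # intercept: line passes through (x-bar, y-bar)
print(f"b = {b:.3f}, a = {a:.2f}")
```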
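The test
$s_b = \frac{b\sqrt{1 - r^2}}{r\sqrt{n - 2}} = \frac{0.532\sqrt{1 - 0.78^2}}{0.78\sqrt{10 - 2}} = 0.151$
$t = \frac{0.532}{0.151} \approx 3.52$
With $10 - 2 = 8$ degrees of freedom, the two-sided p-value is below 0.01, so we reject the null hypothesis.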
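The same test can be sketched in Python. The standard library has no t distribution, so the sketch compares $t$ with a tabled critical value instead of computing an exact p-value. That value is 3.355, the two-sided 1% cutoff for 8 degrees of freedom in a standard t-table. Without rounding $s_b$ first, $t \approx 3.53$.

```python
import math

r, n = 0.78, 10
b = 0.532

s_b = b * math.sqrt(1 - r**2) / (r * math.sqrt(n - 2))  # standard error of the slope
t = b / s_b                                             # test statistic for H0: beta_1 = 0

# Two-sided 1% critical value for t with 8 degrees of freedom (standard t-table)
T_CRIT_1PCT_DF8 = 3.355
print(f"s_b = {s_b:.3f}, t = {t:.2f}, reject H0 at 1%: {abs(t) > T_CRIT_1PCT_DF8}")
```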
What did we learn?
If you had majored in international relations, you would not need this class.