ST 311 Evening Problem Session Solutions Week 11

1. p. 175, Question 32 (Modules 10.1-10.4) [Learning Objectives J1, J3, J9, J11-14, J17]

Since 1980, average mortgage rates have fluctuated from a low of under 6% to a high of over 14%. Is there a relationship between the amount of money people borrow and the interest rate that is offered? Here is a scatterplot of Total Mortgages in the United States (in millions of 2005 dollars) versus Interest Rate at various times over the past 26 years.

R-Squared: 0.706
Reg Equation: \(\text{TotalMortgages} = 220.89 - 7.78\,\text{InterestRate}\)

a) Identify the dependent and independent variables.
b) Interpret the meaning of \(R^2\) in this context.
c) Find the correlation coefficient. If we were to measure Total Mortgages in thousands of dollars instead of millions of dollars, how would the correlation coefficient change?
d) Do these data provide proof that if mortgage rates are lowered, the total mortgage amount that people will take out will increase? Explain.
e) Interpret the meaning of the slope of the regression line in this context.
f) Suppose we discovered a missing measurement that recorded the Total Mortgages as $180 million for an interest rate of 14%. What would its predicted value and residual value be? How will this value affect the slope of our line? What would you expect to happen to the \(R^2\) value? Explain.

Solutions:
a) The dependent variable is Total Mortgages and the independent variable is Interest Rate.
b) The \(R^2\) of 0.706 means that Interest Rate accounts for 70.6% of the variation in Total Mortgages.

c) First make a note of the sign of the slope in this problem. Our slope here is -7.78, so we need to take the negative square root of \(R^2\). The correlation coefficient is \(r = -\sqrt{0.706} = -0.840\). Measuring Total Mortgages in thousands of dollars instead of millions would not change the correlation coefficient, because correlation does not depend on the units of measurement.
d) These data do not indicate causation. Instead, the relationship here is one of correlation: interest rates are associated with the total amount of mortgages. To establish causation, we would have needed to do an experiment.
e) The slope of -7.78 indicates that when interest rates increase by 1 percentage point, we would expect an average decrease in total mortgages of $7.78 million.
f) The predicted value would be \(\hat{y} = 220.89 - 7.78(14) = 111.97\) and the associated residual value would be Residual = Observed - Predicted = 180 - 111.97 = 68.03, that is, $68.03 million. This value will make our regression line flatter, so the new slope would be closer to 0. The \(R^2\) value should decrease.

2. p. 204, Question 30 (Modules 10.1-10.4) [Learning Objectives J1, J3, J9, J11-14]

Here is a scatterplot of the number of wins by American League baseball teams and the average attendance at their home games for the 2006 season, and part of the regression analysis.

R-Squared: 0.485
Reg Equation: \(\text{HomeAttendance} = -14364.5 + 538.915\,\text{Wins}\)

a) Identify the dependent and independent variables.
b) Interpret the meaning of \(R^2\) in this context.
c) Find the correlation coefficient.
d) Estimate the Average Attendance for a team with 72 Wins.
e) Interpret the meaning of the slope of the regression line in this context.
f) The St. Louis Cardinals, the 2006 World Champions, are not included in these data because they are a National League team. During the 2006 regular season, the Cardinals won 83 games and averaged 42,588 fans at their home games. Calculate the residual for this team, and explain what it means. How will this value impact the slope of the line? Explain.

Solutions:
a) The dependent variable is Home Attendance and the independent variable is Wins.
b) The \(R^2\) of 0.485 means that Wins account for 48.5% of the variation in Home Attendance.
c) First make a note of the sign of the slope in this problem. Our slope here is 538.915, so we need to take the positive square root of \(R^2\). The correlation coefficient is \(r = \sqrt{0.485} = 0.696\).
d) \(\hat{y} = -14364.5 + 538.915(72) = 24437.4\)
e) The slope of 538.915 indicates that for every additional win, we would expect the average home attendance to increase by 538.915 people.
f) The predicted value would be \(\hat{y} = -14364.5 + 538.915(83) = 30365.4\) and the associated residual value would be Residual = Observed - Predicted = 42588 - 30365.4 = 12222.6. This residual value means that the Cardinals had an average attendance 12,222.6 people higher than we would expect given the number of wins in their season. If we included this value in our data set, it would make the slope increase.

3. p. 235, Question 30 (Modules 10.1-10.4) [Learning Objectives J1, J9, J11-12, J14]

Information was gathered about the condition and ages of bridges in Tompkins County, NY built since 1880. Below you can find the corresponding scatterplot and some simple linear regression output from StatCrunch.

R-Squared: 0.518
Reg. Equation: \(\text{Condition} = -44.991 + 0.0256\,\text{Year}\)

a) Identify the dependent and independent variables.
b) Interpret the meaning of the slope of the regression line in this context.
c) Tompkins County is the home of the oldest covered bridge in daily use in New York. Built in 1853, it is judged to have a condition of 4.523. If we use this regression to predict the condition of the covered bridge, what would its predicted value and residual value be?
d) How do you think this will impact the regression slope? Explain.
e) If we add the covered bridge (from c) to the data, what would you expect to happen to the \(R^2\) value? Explain.
f) The Tompkins County bridge (from c) was extensively restored in 1972. If we use that date instead of 1853, do you find the condition of the bridge remarkable?

Solutions:
(a) The dependent variable is Condition and the independent variable is Year.
(b) The slope of 0.0256 indicates that as we move later by one year (e.g., 1940 to 1941), we would expect the average condition to increase by 0.0256.
(c) The predicted value would be \(\hat{y} = -44.991 + 0.0256(1853) = 2.4458\) and the associated residual value would be Residual = Observed - Predicted = 4.523 - 2.4458 = 2.0772.

(d) This value will make our regression line flatter, so the new slope would be closer to 0.
(e) If we add the bridge to the data, we would expect the \(R^2\) value to decrease because we would be adding an outlier into the data set, so the line would fit worse.
(f) No. If we consider the year as 1972, then we would predict the condition to be \(\hat{y} = -44.991 + 0.0256(1972) = 5.4922\), which gives a residual of 4.523 - 5.4922 = -0.9692, a little less than 1 in magnitude. While the observed value (4.523) is somewhat different from the predicted value (5.4922), if you look at the other bridges built in 1972, it is not unusual to see bridges with condition numbers around 4.5.

4. Additional Question 1 (Modules 10.1-10.4) [Learning Objectives J2, J4, J8, 16]

The following partial regression output explores the relationship between shoe size and height (in inches).

Simple linear regression results:
Dependent Variable: Height
Independent Variable: shoe
Sample size: 389
R (correlation coefficient) = 0.8869
R-sq = 0.78655535
Estimate of error standard deviation: 1.9304528

Parameter Estimates:

Parameter   Estimate    Std. Err.    Alternative   DF    T-Stat      P-Value
Intercept   50.711956   0.45652923   ≠ 0           387   111.08151   < 0.0001
Slope       1.8122975   0.04799014   ≠ 0           387   37.763954   < 0.0001

(a) Describe the relationship between shoe size and height shown in the scatterplot. When commenting on the strength of the relationship, include a specific number from the output that is used to determine if the relationship is strong or weak.
(b) What is the equation of the regression model (report values to 2 decimal places)?
(c) Would it be appropriate to use the model from part (b) to predict the height of a person with a size 4 shoe? Explain why or why not.

Solutions:
(a) Overall, there is a strong, positive linear relationship with no obvious outliers. The relationship is strong since r = 0.8869, which is close to 1. (Note: you could also report that \(R^2\) is close to 1: \(R^2 = 0.7866\).)
(b) \(\hat{y} = 50.71 + 1.81x\), where \(x\) is shoe size and \(\hat{y}\) is the predicted height in inches.
(c) No, since that is beyond the range of the data we have, and we don't know if the relationship remains the same as what we see here.
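
The predicted values, residuals, and correlation coefficients in Problems 1-3 all follow the same three steps, so they can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original solutions. The helper names (`predict`, `residual`, `correlation_from_r_squared`) are hypothetical, and the signs of the coefficients (negative slope for mortgages, negative intercepts for attendance and bridge condition) are assumed from the worked arithmetic in the solutions.

```python
import math


def predict(intercept, slope, x):
    """Predicted response from a fitted line: y-hat = intercept + slope * x."""
    return intercept + slope * x


def residual(observed, predicted):
    """Residual = Observed - Predicted."""
    return observed - predicted


def correlation_from_r_squared(r_squared, slope):
    """Square root of R^2, carrying the same sign as the slope."""
    r = math.sqrt(r_squared)
    return -r if slope < 0 else r


# Problem 1: mortgages (slope assumed to be -7.78, as the arithmetic implies)
mortgage_pred = predict(220.89, -7.78, 14)
mortgage_resid = residual(180, mortgage_pred)
mortgage_r = correlation_from_r_squared(0.706, -7.78)

# Problem 2: home attendance (intercept assumed to be -14364.5)
attendance_72 = predict(-14364.5, 538.915, 72)
cardinals_pred = predict(-14364.5, 538.915, 83)
cardinals_resid = residual(42588, cardinals_pred)
attendance_r = correlation_from_r_squared(0.485, 538.915)

# Problem 3: bridge condition (intercept assumed to be -44.991)
bridge_1853_pred = predict(-44.991, 0.0256, 1853)
bridge_1853_resid = residual(4.523, bridge_1853_pred)
bridge_1972_pred = predict(-44.991, 0.0256, 1972)
bridge_1972_resid = residual(4.523, bridge_1972_pred)

# Problem 4: coefficients from the output, rounded to 2 decimal places
height_intercept = round(50.711956, 2)
height_slope = round(1.8122975, 2)

print(f"P1: y-hat={mortgage_pred:.2f}, residual={mortgage_resid:.2f}, r={mortgage_r:.3f}")
print(f"P2: y-hat(72)={attendance_72:.1f}, y-hat(83)={cardinals_pred:.1f}, "
      f"residual={cardinals_resid:.1f}, r={attendance_r:.3f}")
print(f"P3: 1853 y-hat={bridge_1853_pred:.4f}, residual={bridge_1853_resid:.4f}; "
      f"1972 y-hat={bridge_1972_pred:.4f}, residual={bridge_1972_resid:.4f}")
print(f"P4: y-hat = {height_intercept} + {height_slope}x")
```

A positive residual (Problems 1-3, parts using the original dates and the Cardinals) means the observed value sits above the fitted line, while the negative residual for the bridge dated 1972 means the observed condition sits below it.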