Discussion Section 4 ECON 139/ Summer Term II

1. Let's use the CollegeDistance.csv data again.

(a) An education advocacy group argues that, on average, a person's educational attainment would increase by approximately 0.15 years if distance to the nearest college decreased by 20 miles. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). Is the advocacy group's claim consistent with the estimated regression? Explain.

Solution: The regression model is $ED = \beta_0 + \beta_1 Dist + u$. The predicted change in ED when Dist changes by $\Delta Dist$ is $\Delta ED = \beta_1 \Delta Dist$. Dist is measured in units of 10 miles, so a 20-mile decrease is $\Delta Dist = -2$, and the claim to test is $0.15 = \beta_1 \times (-2)$. The null hypothesis is therefore $H_0: \beta_1 = -0.075$.

    . use "D:\econ139\collegedistance.dta", clear
    . reg ed dist, robust
    [regression output not preserved in this transcription]
    . test dist = -0.075
     ( 1)  dist = -0.075
           F(  1,  3794) =    0.01
           Prob > F = [not preserved]

We cannot reject $H_0$. The advocacy group's claim is consistent with the estimated regression.
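The arithmetic behind part (a) can be checked outside Stata. The following is a minimal Python sketch, not part of the handout: the variable names are illustrative, and the 1.96 critical value assumes the usual large-sample normal approximation. It recovers the slope implied by the claim and uses the fact that, for a single restriction, the F-statistic is the square of the t-statistic.

```python
import math

# Claim: ED rises by 0.15 years when distance falls by 20 miles.
claimed_change_ed = 0.15
change_in_dist = -20 / 10   # Dist is in units of 10 miles, so -20 miles = -2 units

# Delta ED = beta_1 * Delta Dist, so the slope under the null is:
beta1_null = claimed_change_ed / change_in_dist

# With one restriction, F = t^2, so |t| is the square root of the reported F.
f_stat = 0.01               # F(1, 3794) reported by the test command in the handout
t_abs = math.sqrt(f_stat)
critical_5pct = 1.96        # assumed two-sided 5% critical value (normal approximation)

reject_at_5pct = t_abs > critical_5pct
print(f"H0: beta_1 = {beta1_null:.3f}")
print(f"|t| = {t_abs:.2f}, reject at 5%: {reject_at_5pct}")
```

An implied |t| of about 0.1 is far below any conventional critical value, which matches the handout's conclusion that the null cannot be rejected.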

(b) Other factors also affect how much college a person completes. Does controlling for these other factors change the estimated effect of distance on college years completed? For example, run a regression of ED on Dist, Female, Black, Hispanic, Bytest, DadColl, MomColl, Ownhome, Cue80, Stwmfg80, Tuition and IncomeHi.

Solution:

    . reg ed dist female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust
    [regression output not preserved in this transcription]

(c) It has been argued that, controlling for other factors, blacks and Hispanics complete more college than whites. Is this consistent with the regressions that you constructed in part (b)?

Solution:

    . test black hispanic
     ( 1)  black = 0
     ( 2)  hispanic = 0
           F(  2,  3783) = [not preserved]
    . test black = hispanic
     ( 1)  black - hispanic = 0
           F(  1,  3783) =    0.02
           Prob > F = [not preserved]

The coefficients on Black and Hispanic are individually and jointly significant. They are also positive, so blacks and Hispanics complete more college than whites, holding other factors constant. We can also test whether the two effects are equal: we cannot reject the null hypothesis that the two coefficients are equal.

(d) Test whether $\beta_{tuition} = \beta_{ownhome} = 0$.

Solution:

    . test tuition ownhome
     ( 1)  tuition = 0
     ( 2)  ownhome = 0
           F(  2,  3783) =    4.42
           Prob > F = [not preserved]

We can reject the null at the 5% significance level, but not at the 1% significance level.

(e) If Dist increases from 20 miles to 30 miles, how are years of education expected to change? If Dist increases from 60 to 70 miles, how are years of education expected to change?

Solution: Since the model is linear in Dist, the marginal effect of Dist on ED is constant. Because Dist is measured in tens of miles, each 10-mile increase changes predicted ED by the estimated coefficient on Dist from part (b) (the numerical value is not preserved in this transcription). The expected decrease in ED is therefore the same whether Dist increases from 20 to 30 miles or from 60 to 70 miles.

(f) Run a regression of ED on Dist, $Dist^2$, Female, Black, Hispanic, Bytest, DadColl, MomColl, Ownhome, Cue80, Stwmfg80, Tuition and IncomeHi. If Dist increases from 20 miles to 30 miles, how are years of education expected to change? If Dist increases from 60 to 70 miles, how are years of education expected to change?

Solution:

    . gen dist2 = dist^2
    . reg ed dist dist2 female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust
    [regression output not preserved in this transcription]

The estimated coefficient on dist in the quadratic regression is -.081; the coefficient on dist2 is not preserved in this transcription. With Dist measured in tens of miles, the predicted change in ED is $\hat\beta_1(3 - 2) + \hat\beta_2(3^2 - 2^2)$ for a move from 20 to 30 miles and $\hat\beta_1(7 - 6) + \hat\beta_2(7^2 - 6^2)$ for a move from 60 to 70 miles:

    . dis -.081*3 + [coef. on dist2]*3^2 - (-.081*2 + [coef. on dist2]*2^2)
    . dis -.081*7 + [coef. on dist2]*7^2 - (-.081*6 + [coef. on dist2]*6^2)

(g) Do you prefer the regression that is linear in Dist or the one that is quadratic in Dist?

(h) Consider a Hispanic female with Tuition = $950, Bytest = 58, Incomehi = 0, Ownhome = 0, DadColl = 1, MomColl = 1, Cue80 = 7.1, and Stwmfg80 = $[value not preserved]. Plot the regression relation between Dist and ED for Dist in the range of 0 to 100 miles. Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for a white male with the same characteristics?

Solution: Generate one more observation:

    . edit
    - preserve
    - set obs 3797
    - replace female = 1 in 3797
    - replace black = 0 in 3797
    - replace hispanic = 1 in 3797
    - replace bytest = 58 in 3797
    - replace dadcoll = 1 in 3797
    - replace momcoll = 1 in 3797
    - replace ownhome = 0 in 3797
    - replace cue80 = 7.1 in 3797
    - replace stwmfg80 = [value not preserved] in 3797
    - replace dist = 0 in 3797
    - replace dist2 = 0 in 3797
    - replace tuition = .950 in 3797
    - replace incomehi = 0 in 3797

Then, predict the value for the new observation when Dist = 0:

    . reg ed dist female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust
    [regression output not preserved in this transcription]
    . predict ed_hat_linear
    (option xb assumed; fitted values)
    . reg ed dist dist2 female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust
    [regression output not preserved in this transcription]

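    . predict ed_hat_quad
    (option xb assumed; fitted values)

Have a look at the predicted value for the new observation:

    . count
    . list if _n==3797

The listed values of ed_hat_linear and ed_hat_quad are not preserved in this transcription.

Plot the regression relation between Dist and ED:

    . twoway (function y_quad = [...] + [...]*x + [...]*x^2, range(0 10)) (function y_linear = [...] + [...]*x, range(0 10))

The numeric constants in this command are not preserved. In each function they are the predicted value for the new observation at Dist = 0 (which serves as the intercept), followed by the estimated coefficient on dist and, for the quadratic function, the coefficient on dist2. The range 0 to 10 corresponds to 0 to 100 miles because Dist is measured in tens of miles.

For a white male with the same characteristics, only the intercept changes; the slopes remain the same.

(i) Add the interaction term DadColl × MomColl to the regression. What does the coefficient on the interaction term measure?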

Solution:

    . gen dadmom = dadcoll*momcoll
    . reg ed dist dist2 female black hispanic bytest dadcoll dadmom momcoll ownhome cue80 stwmfg80 tuition incomehi, robust
    [regression output not preserved in this transcription]

(j) Is there any evidence that the effect of Dist on ED depends on the family's income?

Solution:

    . gen incdist = incomehi*dist

    . gen incdist2 = incomehi*dist2
    . reg ed dist dist2 female black hispanic bytest dadcoll dadmom momcoll ownhome cue80 stwmfg80 tuition incomehi incdist incdist2, robust
    [regression output not preserved in this transcription]
    . test incdist incdist2
     ( 1)  incdist = 0
     ( 2)  incdist2 = 0
           F(  2,  3779) =    2.34
           Prob > F = [not preserved]

An F-statistic of 2.34 with 2 numerator degrees of freedom exceeds the large-sample 10% critical value (2.30) but not the 5% critical value (3.00), so the evidence that the effect of Dist depends on family income is weak.

2. On the course website you will find a dataset (pntsprd.csv) containing data on the Las Vegas point spreads for 553 men's college basketball games from the season. The variable favwin is a binary variable that equals 1 if the team favored by the Las Vegas spread wins. The variable spread measures the amount by which the favored team is expected to win.

(a) A linear probability model to estimate the probability that the favored team wins is

$P(favwin = 1 \mid spread) = \beta_0 + \beta_1 spread$.

Explain why, if the spread incorporates all relevant information, we expect $\beta_0 = .5$. (Hint: if we think that the predicted point spread in the game is zero, spread = 0, then what should that say about the chances that our team is going to win?)

Solution: If spread is zero, there is no favorite, and the team we (arbitrarily) label the favorite should have a 50% chance of winning.

(b) Estimate the model from part (a) by OLS. Test $H_0: \beta_0 = .5$ against a two-sided alternative.

Solution: The linear probability model estimated by OLS yields:

    . reg favwin spread, robust
    Linear regression                 Number of obs = 553
    [remaining regression output not preserved in this transcription]

Using the robust standard error leads to a strong rejection of $H_0$ at the 2% level against a two-sided alternative (the reported t-statistic is not preserved in this transcription).

(c) Is the spread statistically significant? What is the estimated probability that the favored team wins when spread = 10?

Solution: As we expect, spread is very statistically significant (its t-statistic is not preserved in this transcription). If spread = 10, the estimated probability that the favored team wins is $\hat\beta_0 + \hat\beta_1(10) = .771$.

(d) Now estimate a probit model for $P(favwin = 1 \mid spread)$. Interpret and test the null hypothesis that the intercept is zero.

Solution: The probit results are:

    . probit favwin spread
    Probit estimates                  Number of obs = 553
    [iteration log and coefficient table not preserved in this transcription]

In the probit model,

$P(favwin = 1 \mid spread) = \Phi(\beta_0 + \beta_1 spread)$,

where $\Phi(\cdot)$ denotes the standard normal cdf. If $\beta_0 = 0$, then $P(favwin = 1 \mid spread) = \Phi(\beta_1 spread)$ and, in particular, $P(favwin = 1 \mid spread = 0) = \Phi(0) = .5$. This is the analog of testing whether the intercept is .5 in the LPM. The t-statistic for testing $H_0: \beta_0 = 0$ is only about .102, so we do not reject $H_0$.

(e) Use the probit model to estimate the probability that the favored team wins when spread = 10. Compare this with the LPM estimate from part (c).

Solution: When spread = 10, the predicted response probability from the estimated probit model is $\Phi(\hat\beta_0 + \hat\beta_1(10)) = \Phi(.9144) = .820$. This is somewhat above the estimate for the LPM.

(f) Repeat only part (e) using a logit model.

Solution: The logit results are:

    . logit favwin spread
    Logit estimates                   Number of obs = 553
    [iteration log and coefficient table not preserved in this transcription]

When spread = 10, the predicted response probability from the estimated logit model is $F(\hat\beta_0 + \hat\beta_1(10)) = \frac{e^{1.56}}{1 + e^{1.56}} \approx .826$, where $F$ is the logistic cdf. This is somewhat above both the LPM and the probit estimates.
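The three fitted probabilities at spread = 10 can be reproduced from the index values quoted in the solutions. The following is a minimal Python sketch, not part of the handout: the function and variable names are illustrative, and the inputs (.771, .9144, 1.56) are taken from parts (c), (e) and (f) rather than re-estimated from the data.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z: float) -> float:
    """Logistic cdf: e^z / (1 + e^z), written in a numerically convenient form."""
    return 1.0 / (1.0 + math.exp(-z))

lpm_prob = 0.771        # LPM fitted probability at spread = 10, from part (c)
probit_index = 0.9144   # probit index at spread = 10, from part (e)
logit_index = 1.56      # logit index at spread = 10, from part (f)

probit_prob = normal_cdf(probit_index)
logit_prob = logistic_cdf(logit_index)

print(f"LPM:    {lpm_prob:.3f}")
print(f"Probit: {probit_prob:.3f}")
print(f"Logit:  {logit_prob:.3f}")
```

The ordering LPM < probit < logit matches the comparison made in the solutions. In Stata, the same two values would come from `display normal(.9144)` and `display invlogit(1.56)`.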