Discussion Section 4
ECON 139/239
2010 Summer Term II

1. Let's use the CollegeDistance.csv data again.

(a) An education advocacy group argues that, on average, a person's educational attainment would increase by approximately 0.15 years if distance to the nearest college decreased by 20 miles. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). Is the advocacy group's claim consistent with the estimated regression? Explain.

Solution:

The regression model: $ED = \beta_0 + \beta_1 Dist + u$

The predicted change in ED when Dist changes by $\Delta Dist$: $\Delta ED = \beta_1 \Delta Dist$

The claim we want to test: $0.15 = \beta_1 \times (-2)$ (Note: Dist is measured in units of 10 miles.)

The null hypothesis: $H_0: \beta_1 = -0.075$

    . use "D:\econ139\collegedistance.dta", clear
    . reg ed dist, robust

    F(1, 3794) = 29.83
    R-squared  = 0.0074
    Root MSE   = 1.8074

    -------------+--------------------------------
              ed |      Coef.   Std. Err.        t
    -------------+--------------------------------
            dist |  -.0733727    .0134334    -5.46
           _cons |   13.95586    .0378112   369.09
    -------------+--------------------------------

    . test dist = -0.075

     ( 1)  dist = -.075

    F(1, 3794) = 0.01
    Prob > F   = 0.9036

We cannot reject $H_0$. The advocacy group's claim is consistent with the estimated regression.
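As a cross-check on the test in part (a), the F statistic can be reproduced by hand from the reported coefficient and robust standard error. The following is a minimal Python sketch rather than part of the original Stata session; the variable names are illustrative, and the p-value uses a normal approximation to the t distribution (an assumption that should matter little with 3794 degrees of freedom).

```python
from statistics import NormalDist

# Estimates copied from the robust regression output in part (a).
b1_hat = -0.0733727   # coefficient on dist (dist is in units of 10 miles)
se_b1 = 0.0134334     # robust standard error of that coefficient
b1_null = -0.075      # value of beta_1 implied by the advocacy group's claim

# t-statistic for H0: beta_1 = -0.075
t_stat = (b1_hat - b1_null) / se_b1

# With a single restriction, the F statistic is the square of the t-statistic.
f_stat = t_stat ** 2

# Assumption: normal approximation to the t distribution for the two-sided p-value.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.3f}, F = {f_stat:.3f}, p = {p_value:.3f}")
```

The squared t-statistic rounds to the F = 0.01 that Stata reports, and the approximate p-value is close to the reported 0.9036.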

(b) Other factors also affect how much college a person completes. Does controlling for these other factors change the estimated effect of distance on college years completed? For example, run a regression of ED on Dist, Female, Black, Hispanic, Bytest, DadColl, Momcoll, Ownhome, Cue80, Stwmfg80, Tuition and IncomeHi.

Solution:

    . reg ed dist female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust

    F(12, 3783) = 168.48
    R-squared   = 0.2836
    Root MSE    = 1.5378

    -------------+--------------------------------
              ed |      Coef.   Std. Err.        t
    -------------+--------------------------------
            dist |  -.0366613    .0120749    -3.04
          female |   .1429742    .0502718     2.84
           black |   .3506095    .0674301     5.20
        hispanic |   .3617649    .0764184     4.73
          bytest |   .0930377     .003014    30.87
         dadcoll |   .5709712    .0763028     7.48
         momcoll |   .3778102    .0834999     4.52
         ownhome |   .1385475    .0649795     2.13
           cue80 |   .0286753    .0095229     3.01
        stwmfg80 |  -.0425003    .0199355    -2.13
         tuition |  -.1910519    .0985259    -1.94
        incomehi |   .3718305    .0622177     5.98
           _cons |   8.920823    .2434585    36.64
    -------------+--------------------------------

(c) It has been argued that, controlling for other factors, blacks and Hispanics complete more college than whites. Is this consistent with the regressions that you constructed in part (b)?

Solution:

    . test black hispanic

     ( 1)  black = 0
     ( 2)  hispanic = 0

    F(2, 3783) = 19.23

    . test black = hispanic

     ( 1)  black - hispanic = 0

    F(1, 3783) = 0.02
    Prob > F   = 0.8969

The coefficients on black and hispanic are individually and jointly significant. They are also positive, so blacks and Hispanics complete more college than whites, holding other factors constant. We can also test whether these two effects are equal: we cannot reject the null hypothesis that the two coefficients are equal.

(d) Test whether $\beta_{tuition} = \beta_{ownhome} = 0$.

Solution:

    . test tuition ownhome

     ( 1)  tuition = 0
     ( 2)  ownhome = 0

    F(2, 3783) = 4.42
    Prob > F   = 0.0120

We can reject the null at the 5% significance level, but cannot reject it at the 1% significance level.

(e) If Dist increases from 20 miles to 30 miles, how are years of education expected to change? If Dist increases from 60 to 70 miles, how are years of education expected to change?

Solution: Since the model is linear in Dist, the marginal effect of Dist on ED is constant at -0.037 years per 10 miles. If Dist increases from 20 miles to 30 miles, ED is expected to decrease by 0.037 years. If Dist increases from 60 miles to 70 miles, ED is also expected to decrease by 0.037 years.

(f) Run a regression of ED on Dist, Dist^2, Female, Black, Hispanic, Bytest, DadColl, Momcoll, Ownhome, Cue80, Stwmfg80, Tuition and IncomeHi. If Dist increases from 20 miles to 30 miles, how are years of education expected to change? If Dist increases from 60 to 70 miles, how are years of education expected to change?

Solution:

    . gen dist2 = dist^2
    . reg ed dist dist2 female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust

    F(13, 3782) = 155.93
    R-squared   = 0.2844
    Root MSE    = 1.5372

    -------------+--------------------------------
              ed |      Coef.   Std. Err.        t
    -------------+--------------------------------
            dist |  -.0811732    .0251112    -3.23
           dist2 |   .0046413    .0020542     2.26
          female |   .1433144    .0502511     2.85
           black |   .3339309    .0683045     4.89
        hispanic |   .3333104    .0778789     4.28
          bytest |   .0926367    .0030243    30.63
         dadcoll |   .5611581    .0765802     7.33
         momcoll |   .3777022    .0835025     4.52
         ownhome |     .14327    .0648817     2.21
           cue80 |   .0259537     .009587     2.71
        stwmfg80 |  -.0425539    .0199267    -2.14
         tuition |  -.1928193    .0985524    -1.96
        incomehi |   .3694975    .0623003     5.93
           _cons |   9.012167    .2498793    36.07
    -------------+--------------------------------

    . dis -.081*3+0.0046*3^2-(-.081*2+0.0046*2^2)
    -.058
    . dis -.081*7+0.0046*7^2-(-.081*6+0.0046*6^2)
    -.0212

(g) Do you prefer the regression that is linear in Dist or the one that is quadratic in Dist?

(h) Consider a Hispanic female with Tuition = $950, Bytest = 58, Incomehi = 0, Ownhome = 0, DadColl = 1, MomColl = 1, Cue80 = 7.1, and Stwmfg80 = $10.06. Plot the regression relation between Dist and ED for Dist in the range of 0 to 100 miles. Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for a white male with the same characteristics?

Solution: Generate one more observation:

    . edit
    - preserve
    - set obs 3797
    - replace female = 1 in 3797
    - replace black = 0 in 3797
    - replace hispanic = 1 in 3797
    - replace bytest = 58 in 3797
    - replace dadcoll = 1 in 3797
    - replace momcoll = 1 in 3797
    - replace ownhome = 0 in 3797
    - replace cue80 = 7.1 in 3797
    - replace stwmfg80 = 10.06 in 3797
    - replace dist = 0 in 3797
    - replace dist2 = 0 in 3797
    - replace tuition = .950 in 3797
    - replace incomehi = 0 in 3797

Then, predict the value for the new observation when Dist = 0.

    . reg ed dist female black hispanic bytest dadcoll momcoll
        ownhome cue80 stwmfg80 tuition incomehi, robust

    F(12, 3783) = 168.48
    R-squared   = 0.2836
    Root MSE    = 1.5378

    -------------+--------------------------------
              ed |      Coef.   Std. Err.        t
    -------------+--------------------------------
            dist |  -.0366613    .0120749    -3.04
          female |   .1429742    .0502718     2.84
           black |   .3506095    .0674301     5.20
        hispanic |   .3617649    .0764184     4.73
          bytest |   .0930377     .003014    30.87
         dadcoll |   .5709712    .0763028     7.48
         momcoll |   .3778102    .0834999     4.52
         ownhome |   .1385475    .0649795     2.13
           cue80 |   .0286753    .0095229     3.01
        stwmfg80 |  -.0425003    .0199355    -2.13
         tuition |  -.1910519    .0985259    -1.94
        incomehi |   .3718305    .0622177     5.98
           _cons |   8.920823    .2434585    36.64
    -------------+--------------------------------

    . predict ed_hat_linear
    (option xb assumed; fitted values)

    . reg ed dist dist2 female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust

    F(13, 3782) = 155.93
    R-squared   = 0.2844
    Root MSE    = 1.5372

    -------------+--------------------------------
              ed |      Coef.   Std. Err.        t
    -------------+--------------------------------
            dist |  -.0811732    .0251112    -3.23
           dist2 |   .0046413    .0020542     2.26
          female |   .1433144    .0502511     2.85
           black |   .3339309    .0683045     4.89
        hispanic |   .3333104    .0778789     4.28
          bytest |   .0926367    .0030243    30.63
         dadcoll |   .5611581    .0765802     7.33
         momcoll |   .3777022    .0835025     4.52
         ownhome |     .14327    .0648817     2.21
           cue80 |   .0259537     .009587     2.71
        stwmfg80 |  -.0425539    .0199267    -2.14
         tuition |  -.1928193    .0985524    -1.96
        incomehi |   .3694975    .0623003     5.93
           _cons |   9.012167    .2498793    36.07
    -------------+--------------------------------

    . predict ed_hat_quad
    (option xb assumed; fitted values)

Have a look at the predicted value for the new observation:

    . count
      3797
    . list if _n==3797

    ed_hat_linear = 15.36507
    ed_hat_quad   = 15.37358

Plot the regression relation between Dist and ED:

    . twoway (function y_quad=15.3736-0.081*x+0.0046*x^2, range(0 10)) (function y_linear=15.365-0.0366*x, range(0 10))

For a white male with the same characteristics, only the intercept changes; the slopes remain the same.

(i) Add the interaction term DadColl × Momcoll to the regression. What does the coefficient on the interaction term measure?

Solution:

    . gen dadmom = dadcoll*momcoll
    . reg ed dist dist2 female black hispanic bytest dadcoll dadmom momcoll ownhome cue80 stwmfg80 tuition incomehi, robust

    F(14, 3781) = 145.73
    R-squared   = 0.2854
    Root MSE    = 1.5363

    -------------+--------------------------------
              ed |      Coef.   Std. Err.        t
    -------------+--------------------------------
            dist |  -.0810001     .025094    -3.23
           dist2 |   .0046773    .0020564     2.27
          female |   .1406184    .0502133     2.80
           black |   .3305619    .0683148     4.84
        hispanic |   .3297465    .0779131     4.23
          bytest |   .0925664    .0030234    30.62
         dadcoll |   .6538031     .087084     7.51
          dadmom |  -.3664802    .1639813    -2.23
         momcoll |   .5693549    .1218052     4.67
         ownhome |   .1412131    .0649487     2.17
           cue80 |   .0257697      .00959     2.69
        stwmfg80 |  -.0415432    .0199035    -2.09
         tuition |  -.1939714    .0985584    -1.97
        incomehi |   .3623156    .0622537     5.82
           _cons |    9.00197    .2500197    36.01
    -------------+--------------------------------

(j) Is there any evidence that the effect of Dist on ED depends on the family's income?

Solution:

    . gen incdist = incomehi*dist
    . gen incdist2 = incomehi*dist2
    . reg ed dist dist2 female black hispanic bytest dadcoll dadmom momcoll ownhome cue80 stwmfg80 tuition incomehi incdist incdist2, robust

    F(16, 3779) = 128.72
    R-squared   = 0.2863
    Root MSE    = 1.5357

    -------------+--------------------------------
              ed |      Coef.   Std. Err.        t
    -------------+--------------------------------
            dist |  -.1095309    .0281269    -3.89
           dist2 |   .0064744    .0022177     2.92
          female |    .141463    .0501943     2.82
           black |    .333128    .0684285     4.87
        hispanic |   .3230637    .0777508     4.16
          bytest |   .0927566    .0030201    30.71
         dadcoll |   .6627368    .0870109     7.62
          dadmom |  -.3556964    .1642177    -2.17
         momcoll |   .5674681    .1219911     4.65
         ownhome |   .1437389    .0649888     2.21
           cue80 |   .0260482    .0095869     2.72
        stwmfg80 |  -.0419249    .0198822    -2.11
         tuition |  -.2099784    .0991537    -2.12
        incomehi |   .2172968    .0897228     2.42
         incdist |   .1244186    .0620106     2.01
        incdist2 |   -.008659     .006246    -1.39
           _cons |   9.042179    .2508048    36.05
    -------------+--------------------------------

    . test incdist incdist2

     ( 1)  incdist = 0
     ( 2)  incdist2 = 0

    F(2, 3779) = 2.34
    Prob > F   = 0.0966

2. On the course website you will find a dataset (pntsprd.csv) containing data on the Las Vegas point spreads for 553 men's college basketball games from the 1994-1995 season. The variable favwin is a binary variable that equals 1 if the team favored by the Las Vegas spread wins. The variable spread measures the amount by which the favored team is expected to win.

(a) A linear probability model to estimate the probability that the favored team wins is

    $P(favwin = 1 \mid spread) = \beta_0 + \beta_1 spread$

Explain why, if the spread incorporates all relevant information, we expect $\beta_0 = .5$. (Hint: if we think that the predicted point spread in the game is zero, spread = 0, then what should that say about the chances that our team is going to win?)

Solution: If spread is zero, there is no favorite, so the team we (arbitrarily) label the favorite should have a 50% chance of winning.

(b) Estimate the model from part a) by OLS. Test $H_0: \beta_0 = .5$ against a two-sided alternative.

Solution: The linear probability model estimated by OLS yields:

    . reg favwin spread, robust

    Linear regression
    Number of obs = 553
    F(1, 551)     = 101.54
    R-squared     = 0.1107
    Root MSE      = .40168

    -------------+-----------------------------------------
          favwin |      Coef.   Std. Err.        t    P>|t|
    -------------+-----------------------------------------
          spread |   .0193655    .0019218    10.08    0.000
           _cons |   .5769492    .0316568    18.23    0.000
    -------------+-----------------------------------------

Using the robust standard error leads to strong rejection of $H_0$ at the 2% level against a two-sided alternative: $t = (.577 - .5)/.032 = 2.41$.

(c) Is the spread statistically significant? What is the estimated probability that the favored team wins when spread = 10?

Solution: As we expect, spread is very statistically significant, with t = 10.08. If spread = 10, the estimated probability that the favored team wins is .577 + .0194(10) = .771.

(d) Now estimate a probit model for $P(favwin = 1 \mid spread)$. Interpret and test the null hypothesis that the intercept is zero.

Solution: The probit results are:

    . probit favwin spread

    Iteration 0: log likelihood = -302.74988
    Iteration 1: log likelihood = -266.49244
    Iteration 2: log likelihood = -263.62542
    Iteration 3: log likelihood = -263.56223
    Iteration 4: log likelihood = -263.56219

    Probit estimates
    Number of obs  = 553
    LR chi2(1)     = 78.38
    Prob > chi2    = 0.0000
    Log likelihood = -263.56219
    Pseudo R2      = 0.1294

    -------------+-----------------------------------------
          favwin |      Coef.   Std. Err.        z    P>|z|
    -------------+-----------------------------------------
          spread |    .092463    .0121811     7.59    0.000
           _cons |  -.0105926    .1037469    -0.10    0.919
    -------------+-----------------------------------------

In the probit model

    $P(favwin = 1 \mid spread) = \Phi(\beta_0 + \beta_1 spread)$

where $\Phi(\cdot)$ denotes the standard normal cdf, if $\beta_0 = 0$ then $P(favwin = 1 \mid spread) = \Phi(\beta_1 spread)$ and, in particular, $P(favwin = 1 \mid spread = 0) = \Phi(0) = .5$. This is the analog of testing whether the intercept is .5 in the LPM. The t-statistic for testing $H_0: \beta_0 = 0$ is only about $-.102$, so we do not reject $H_0$.

(e) Use the probit model to estimate the probability that the favored team wins when spread = 10. Compare this with the LPM estimate from part c).

Solution: When spread = 10 the predicted response probability from the estimated probit model is

    $\Phi(-.0106 + .0925(10)) = \Phi(.9144) = .820$

This is somewhat above the estimate for the LPM.

(f) Repeat only part e) using a logit model.

Solution: The logit results are:

    . logit favwin spread

    Iteration 0: log likelihood = -302.74988
    Iteration 1: log likelihood = -268.51377
    Iteration 2: log likelihood = -264.1308
    Iteration 3: log likelihood = -263.90218
    Iteration 4: log likelihood = -263.90131
    Iteration 5: log likelihood = -263.90131

    Logit estimates
    Number of obs  = 553
    LR chi2(1)     = 77.70
    Prob > chi2    = 0.0000
    Log likelihood = -263.90131
    Pseudo R2      = 0.1283

    -------------+-----------------------------------------
          favwin |      Coef.   Std. Err.        z    P>|z|
    -------------+-----------------------------------------
          spread |   .1632261    .0225567     7.24    0.000
           _cons |   -.071157    .1732172    -0.41    0.681
    -------------+-----------------------------------------

When spread = 10 the predicted response probability from the estimated logit model is

    $F(-.0712 + .1632(10)) = \frac{e^{1.56}}{1 + e^{1.56}} = .8265$

This is somewhat above both the LPM estimate and the probit estimate.
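
The three predicted probabilities at spread = 10 can be reproduced from the reported coefficients. The following is a minimal Python sketch, not part of the original Stata session; the variable names are illustrative, and the coefficients are copied from the LPM, probit, and logit outputs above.

```python
import math
from statistics import NormalDist

spread = 10

# (intercept, slope) pairs copied from the three sets of estimates.
lpm = (0.5769492, 0.0193655)
probit = (-0.0105926, 0.0924630)
logit = (-0.0711570, 0.1632261)

# LPM: the fitted value is itself the predicted probability.
p_lpm = lpm[0] + lpm[1] * spread

# Probit: standard normal cdf evaluated at the linear index.
p_probit = NormalDist().cdf(probit[0] + probit[1] * spread)

# Logit: logistic cdf evaluated at the linear index.
p_logit = 1 / (1 + math.exp(-(logit[0] + logit[1] * spread)))

print(f"LPM    : {p_lpm:.3f}")
print(f"Probit : {p_probit:.3f}")
print(f"Logit  : {p_logit:.4f}")
```

The computed values should agree with the .771, .820, and .8265 reported in the solutions up to rounding, with the LPM giving the lowest estimate and the logit the highest.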