Discussion Section 4 ECON 139/239 2010 Summer Term II




1. Let's use the CollegeDistance.csv data again.

(a) An education advocacy group argues that, on average, a person's educational attainment would increase by approximately 0.15 years if distance to the nearest college decreased by 20 miles. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). Is the advocacy group's claim consistent with the estimated regression? Explain.

Solution: The regression model is
$ED = \beta_0 + \beta_1 Dist + u$.
The predicted change in ED when Dist changes by $\Delta Dist$ is
$\Delta \widehat{ED} = \beta_1 \Delta Dist$.
Because Dist is measured in units of 10 miles, a 20-mile decrease is $\Delta Dist = -2$, so the claim to test is $0.15 = \beta_1 \times (-2)$, and the null hypothesis is
$H_0: \beta_1 = -0.075$.

. use "D:\econ139\collegedistance.dta", clear
. reg ed dist, robust

F(1, 3794) = 29.83    R-squared = 0.0074    Root MSE = 1.8074
---------------------------------------------
             |             Robust
          ed |     Coef.   Std. Err.       t
-------------+-------------------------------
        dist | -.0733727    .0134334    -5.46
       _cons |  13.95586    .0378112   369.09
---------------------------------------------

. test dist=-0.075
 ( 1)  dist = -.075
       F(1, 3794) = 0.01
       Prob > F = 0.9036

We cannot reject $H_0$. The estimate implies that a 20-mile decrease raises ED by $-2 \times (-0.0734) \approx 0.147$ years, very close to the claimed 0.15, so the advocacy group's claim is consistent with the estimated regression.
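As a cross-check of the Stata `test` result, the statistic for $H_0: \beta_1 = -0.075$ can be recomputed from the reported coefficient and robust standard error. This is a minimal sketch that assumes a standard normal approximation to the t distribution (reasonable with 3,794 degrees of freedom); the helper names `normal_cdf` and `wald_test` are illustrative, not part of any library.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wald_test(coef: float, se: float, null_value: float) -> tuple[float, float, float]:
    """Return (t, F = t**2, two-sided p-value) for H0: coef = null_value.

    The p-value uses the normal approximation, which is very close to
    the t(3794) distribution for this sample size.
    """
    t = (coef - null_value) / se
    p = 2.0 * (1.0 - normal_cdf(abs(t)))
    return t, t * t, p


# Estimates from `reg ed dist, robust`
b_dist, se_dist = -0.0733727, 0.0134334

t, f, p = wald_test(b_dist, se_dist, null_value=-0.075)

# Dist is in units of 10 miles, so a 20-mile decrease is a change of -2
implied_gain = b_dist * (-2)

print(f"t = {t:.3f}, F = {f:.2f}, p = {p:.4f}")
print(f"implied change in ED for a 20-mile decrease: {implied_gain:.3f} years")
```

With these inputs the sketch should reproduce Stata's F = 0.01 and Prob > F of about 0.904; a single restriction's F statistic is just the square of its t statistic.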

(b) Other factors also affect how much college a person completes. Does controlling for these other factors change the estimated effect of distance on college years completed? For example, run a regression of ED on Dist, Female, Black, Hispanic, Bytest, DadColl, MomColl, Ownhome, Cue80, Stwmfg80, Tuition and IncomeHi.

Solution:
. reg ed dist female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust

F(12, 3783) = 168.48    R-squared = 0.2836    Root MSE = 1.5378
---------------------------------------------
             |             Robust
          ed |     Coef.   Std. Err.       t
-------------+-------------------------------
        dist | -.0366613    .0120749    -3.04
      female |  .1429742    .0502718     2.84
       black |  .3506095    .0674301     5.20
    hispanic |  .3617649    .0764184     4.73
      bytest |  .0930377     .003014    30.87
     dadcoll |  .5709712    .0763028     7.48
     momcoll |  .3778102    .0834999     4.52
     ownhome |  .1385475    .0649795     2.13
       cue80 |  .0286753    .0095229     3.01
    stwmfg80 | -.0425003    .0199355    -2.13
     tuition | -.1910519    .0985259    -1.94
    incomehi |  .3718305    .0622177     5.98
       _cons |  8.920823    .2434585    36.64
---------------------------------------------

Yes: controlling for these factors roughly halves the estimated effect of distance. The coefficient on Dist falls in magnitude from -0.073 in part (a) to -0.037, although it remains statistically significant (t = -3.04).

(c) It has been argued that, controlling for other factors, Blacks and Hispanics complete more college than whites. Is this consistent with the regressions that you constructed in part (b)?
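As a quick check before the joint tests in the solution, the individual t statistics for Black and Hispanic can be recomputed from the part (b) coefficients and robust standard errors. This sketch again assumes the large-sample normal approximation (critical value 1.96 at the 5% level); the helper name is illustrative.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# (coefficient, robust standard error) from the part (b) regression
estimates = {
    "black": (0.3506095, 0.0674301),
    "hispanic": (0.3617649, 0.0764184),
}

results = {}
for name, (coef, se) in estimates.items():
    t = coef / se                          # t statistic for H0: coef = 0
    p = 2.0 * (1.0 - normal_cdf(abs(t)))   # two-sided p-value (normal approx.)
    results[name] = (t, p)
    print(f"{name:9s} coef = {coef:.3f}  t = {t:.2f}  p = {p:.2g}")

# Each coefficient is individually significant at 5% if |t| > 1.96
all_significant = all(abs(t) > 1.96 for t, _ in results.values())
print("individually significant at 5%:", all_significant)
```

Both coefficients are positive with t statistics near 5, which is what the solution's statement about individual significance relies on.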

Solution:
. test black hispanic
 ( 1)  black = 0
 ( 2)  hispanic = 0
       F(2, 3783) = 19.23

. test black = hispanic
 ( 1)  black - hispanic = 0
       F(1, 3783) = 0.02
       Prob > F = 0.8969

The coefficients on Black and Hispanic are individually and jointly significant. They are also positive, so Blacks and Hispanics complete more college than whites, holding other factors constant. We can also test whether these two effects are equal: we cannot reject the null hypothesis that the two coefficients are equal.

(d) Test whether $\beta_{tuition} = \beta_{ownhome} = 0$.

Solution:
. test tuition ownhome
 ( 1)  tuition = 0
 ( 2)  ownhome = 0
       F(2, 3783) = 4.42
       Prob > F = 0.0120

We can reject the null at the 5% significance level, but not at the 1% significance level.

(e) If Dist increases from 20 miles to 30 miles, how are years of education expected to change? If Dist increases from 60 to 70 miles, how are years of education expected to change?

Solution: Since the model is linear in Dist, the marginal effect of Dist on ED is constant: about -0.037 years per unit of Dist. Because Dist is measured in units of 10 miles, a 10-mile increase is $\Delta Dist = 1$. Thus if Dist increases from 20 miles to 30 miles, ED is expected to decrease by about 0.037 years, and if Dist increases from 60 miles to 70 miles, ED is expected to decrease by the same 0.037 years.

(f) Run a regression of ED on Dist, $Dist^2$, Female, Black, Hispanic, Bytest, DadColl, MomColl, Ownhome, Cue80, Stwmfg80, Tuition and IncomeHi. If Dist increases from 20 miles to 30 miles, how are years of education expected to change? If Dist increases from 60 to 70 miles, how are years of education expected to change?

Solution:
. gen dist2=dist^2
. reg ed dist dist2 female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust

F(13, 3782) = 155.93    R-squared = 0.2844    Root MSE = 1.5372
---------------------------------------------
             |             Robust
          ed |     Coef.   Std. Err.       t
-------------+-------------------------------
        dist | -.0811732    .0251112    -3.23
       dist2 |  .0046413    .0020542     2.26
      female |  .1433144    .0502511     2.85
       black |  .3339309    .0683045     4.89
    hispanic |  .3333104    .0778789     4.28
      bytest |  .0926367    .0030243    30.63
     dadcoll |  .5611581    .0765802     7.33
     momcoll |  .3777022    .0835025     4.52
     ownhome |    .14327    .0648817     2.21
       cue80 |  .0259537     .009587     2.71
    stwmfg80 | -.0425539    .0199267    -2.14
     tuition | -.1928193    .0985524    -1.96
    incomehi |  .3694975    .0623003     5.93
       _cons |  9.012167    .2498793    36.07
---------------------------------------------

. dis -.081*3+0.0046*3^2-(-.081*2+0.0046*2^2)
-.058
. dis -.081*7+0.0046*7^2-(-.081*6+0.0046*6^2)
-.0212

With the quadratic specification the effect of distance depends on its level: an increase from 20 to 30 miles (Dist from 2 to 3) lowers expected ED by about 0.058 years, while an increase from 60 to 70 miles (Dist from 6 to 7) lowers it by only about 0.021 years.

(g) Do you prefer the regression that is linear in Dist or the one that is quadratic in Dist?

(h) Consider a Hispanic female with Tuition = $950, Bytest = 58, Incomehi = 0, Ownhome = 0, DadColl = 1, MomColl = 1, Cue80 = 7.1, and Stwmfg80 = $10.06. Plot the regression relation between Dist and ED for Dist in the range of 0 to 100 miles. Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for a white male with the same characteristics?

Solution: Generate one more observation (Tuition is entered as .950, i.e. in thousands of dollars):

. edit
- preserve
- set obs 3797
- replace female = 1 in 3797
- replace black = 0 in 3797
- replace hispanic = 1 in 3797
- replace bytest = 58 in 3797
- replace dadcoll = 1 in 3797
- replace momcoll = 1 in 3797
- replace ownhome = 0 in 3797
- replace cue80 = 7.1 in 3797
- replace stwmfg80 = 10.06 in 3797
- replace dist = 0 in 3797
- replace dist2 = 0 in 3797
- replace tuition = .950 in 3797
- replace incomehi = 0 in 3797

Then predict the value for the new observation when Dist = 0:

. reg ed dist female black hispanic bytest dadcoll momcoll
> ownhome cue80 stwmfg80 tuition incomehi, robust
(output identical to the part (b) regression above)

. predict ed_hat_linear
(option xb assumed; fitted values)

. reg ed dist dist2 female black hispanic bytest dadcoll momcoll ownhome cue80 stwmfg80 tuition incomehi, robust
(output identical to the part (f) regression above)

. predict ed_hat_quad
(option xb assumed; fitted values)

Have a look at the predicted values for the new observation:

. count
3797
. list if _n==3797

ed_hat_linear = 15.36507    ed_hat_quad = 15.37358

Plot the regression relation between Dist and ED (Dist is in tens of miles, so range(0 10) covers 0 to 100 miles):

. twoway (function y_quad=15.3736-0.081*x+0.0046*x^2, range(0 10)) (function y_linear=15.365-0.0366*x, range(0 10))

For a white male with the same characteristics, only the intercept changes: Female and Hispanic enter the model additively and are not interacted with Dist, so setting both to 0 shifts each curve down by the sum of their coefficients, while the slopes in Dist remain the same.

(i) Add the interaction term DadColl × MomColl to the regression. What does the coefficient on the interaction term measure?

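Solution:
. gen dadmom= dadcoll*momcoll
. reg ed dist dist2 female black hispanic bytest dadcoll dadmom momcoll ownhome cue80 stwmfg80 tuition incomehi, robust

F(14, 3781) = 145.73    R-squared = 0.2854    Root MSE = 1.5363
---------------------------------------------
             |             Robust
          ed |     Coef.   Std. Err.       t
-------------+-------------------------------
        dist | -.0810001     .025094    -3.23
       dist2 |  .0046773    .0020564     2.27
      female |  .1406184    .0502133     2.80
       black |  .3305619    .0683148     4.84
    hispanic |  .3297465    .0779131     4.23
      bytest |  .0925664    .0030234    30.62
     dadcoll |  .6538031     .087084     7.51
      dadmom | -.3664802    .1639813    -2.23
     momcoll |  .5693549    .1218052     4.67
     ownhome |  .1412131    .0649487     2.17
       cue80 |  .0257697      .00959     2.69
    stwmfg80 | -.0415432    .0199035    -2.09
     tuition | -.1939714    .0985584    -1.97
    incomehi |  .3623156    .0622537     5.82
       _cons |   9.00197    .2500197    36.01
---------------------------------------------

The coefficient on dadmom measures the additional change in expected ED when both parents attended college, beyond the sum of the separate DadColl and MomColl effects. Here it is -0.366 (t = -2.23), so having both parents college-educated raises expected ED by about 0.654 + 0.569 - 0.366 = 0.857 years relative to neither parent, less than the sum of the two separate effects.

(j) Is there any evidence that the effect of Dist on ED depends on the family's income?

Solution:
. gen incdist= incomehi*dist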
. gen incdist2= incomehi*dist2
. reg ed dist dist2 female black hispanic bytest dadcoll dadmom momcoll ownhome cue80 stwmfg80 tuition incomehi incdist incdist2, robust

F(16, 3779) = 128.72    R-squared = 0.2863    Root MSE = 1.5357
---------------------------------------------
             |             Robust
          ed |     Coef.   Std. Err.       t
-------------+-------------------------------
        dist | -.1095309    .0281269    -3.89
       dist2 |  .0064744    .0022177     2.92
      female |   .141463    .0501943     2.82
       black |   .333128    .0684285     4.87
    hispanic |  .3230637    .0777508     4.16
      bytest |  .0927566    .0030201    30.71
     dadcoll |  .6627368    .0870109     7.62
      dadmom | -.3556964    .1642177    -2.17
     momcoll |  .5674681    .1219911     4.65
     ownhome |  .1437389    .0649888     2.21
       cue80 |  .0260482    .0095869     2.72
    stwmfg80 | -.0419249    .0198822    -2.11
     tuition | -.2099784    .0991537    -2.12
    incomehi |  .2172968    .0897228     2.42
     incdist |  .1244186    .0620106     2.01
    incdist2 |  -.008659     .006246    -1.39
       _cons |  9.042179    .2508048    36.05
---------------------------------------------

. test incdist incdist2
 ( 1)  incdist = 0
 ( 2)  incdist2 = 0
       F(2, 3779) = 2.34
       Prob > F = 0.0966

The two interaction terms are jointly significant at the 10% level but not at the 5% level, so there is only weak evidence that the effect of Dist on ED depends on family income.

2. On the course website you will find a dataset (pntsprd.csv) containing data on the Las Vegas point spreads for 553 men's college basketball games from the 1994-1995 season. The variable favwin is a binary variable that equals 1 if the team favored by the Las Vegas spread wins. The variable spread measures the amount by which the favored team is expected to win.

(a) A linear probability model to estimate the probability that the favored team wins is
$P(favwin = 1 \mid spread) = \beta_0 + \beta_1 spread$.
Explain why, if the spread incorporates all relevant information, we expect $\beta_0 = .5$. (Hint: if the predicted point spread in the game is zero, spread = 0, what does that say about the chances that our team is going to win?)

Solution: If spread is zero, there is no favorite, so the team we (arbitrarily) label the favorite should have a 50% chance of winning.

(b) Estimate the model from part (a) by OLS. Test $H_0: \beta_0 = .5$ against a two-sided alternative.

Solution: The linear probability model estimated by OLS yields:

. reg favwin spread, robust

Linear regression        Number of obs = 553
                         F(1, 551)     = 101.54
                         R-squared     = 0.1107
                         Root MSE      = .40168
-----------------------------------------------------
             |             Robust
      favwin |     Coef.   Std. Err.       t    P>|t|
-------------+---------------------------------------
      spread |  .0193655    .0019218    10.08   0.000
       _cons |  .5769492    .0316568    18.23   0.000
-----------------------------------------------------

Using the robust standard error, $t = (.577 - .5)/.032 \approx 2.41$ (about 2.43 with the unrounded estimates), so we reject $H_0$ against a two-sided alternative at the 2% level, though not at the 1% level.

(c) Is the spread statistically significant? What is the estimated probability that the favored team wins when spread = 10?

Solution: As expected, spread is highly statistically significant, with t = 10.08. If spread = 10, the estimated probability that the favored team wins is .577 + .0194(10) = .771.

(d) Now estimate a probit model for $P(favwin = 1 \mid spread)$. Interpret and test the null hypothesis that the intercept is zero.

Solution: The probit results are:

. probit favwin spread

Iteration 0:   log likelihood = -302.74988
Iteration 1:   log likelihood = -266.49244
Iteration 2:   log likelihood = -263.62542
Iteration 3:   log likelihood = -263.56223
Iteration 4:   log likelihood = -263.56219

Probit estimates                  Number of obs = 553
                                  LR chi2(1)    = 78.38
                                  Prob > chi2   = 0.0000
Log likelihood = -263.56219       Pseudo R2     = 0.1294
-----------------------------------------------------
      favwin |     Coef.   Std. Err.       z    P>|z|
-------------+---------------------------------------
      spread |   .092463    .0121811     7.59   0.000
       _cons | -.0105926    .1037469    -0.10   0.919
-----------------------------------------------------

In the probit model
$P(favwin = 1 \mid spread) = \Phi(\beta_0 + \beta_1 spread)$,
where $\Phi(\cdot)$ denotes the standard normal cdf. If $\beta_0 = 0$, then $P(favwin = 1 \mid spread) = \Phi(\beta_1 spread)$ and, in particular, $P(favwin = 1 \mid spread = 0) = \Phi(0) = .5$. This is the analog of testing whether the intercept is .5 in the LPM. The z statistic for testing $H_0: \beta_0 = 0$ is only about -0.10 (p = 0.919), so we do not reject $H_0$.

(e) Use the probit model to estimate the probability that the favored team wins when spread = 10. Compare this with the LPM estimate from part (c).

Solution: When spread = 10, the predicted response probability from the estimated probit model is
$\Phi(-.0106 + .0925(10)) = \Phi(.9144) \approx .820$.
This is somewhat above the LPM estimate of .771.

(f) Repeat only part (e) using a logit model.

Solution: The logit results are:

. logit favwin spread

Iteration 0:   log likelihood = -302.74988
Iteration 1:   log likelihood = -268.51377
Iteration 2:   log likelihood = -264.1308
Iteration 3:   log likelihood = -263.90218
Iteration 4:   log likelihood = -263.90131
Iteration 5:   log likelihood = -263.90131

Logit estimates                   Number of obs = 553
                                  LR chi2(1)    = 77.70
                                  Prob > chi2   = 0.0000
Log likelihood = -263.90131       Pseudo R2     = 0.1283
-----------------------------------------------------
      favwin |     Coef.   Std. Err.       z    P>|z|
-------------+---------------------------------------
      spread |  .1632261    .0225567     7.24   0.000
       _cons |  -.071157    .1732172    -0.41   0.681
-----------------------------------------------------

When spread = 10, the predicted response probability from the estimated logit model is
$F(-.0712 + .1632(10)) = \frac{e^{1.56}}{1 + e^{1.56}} \approx .8265$,
where $F(z) = e^z/(1 + e^z)$ is the logistic cdf. This is somewhat above both the LPM and the probit estimates.
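The part (b) test and the spread = 10 predictions from parts (c), (e), and (f) can all be reproduced from the reported coefficients. The sketch below is a minimal illustration: it evaluates the standard normal cdf with `math.erf`, uses the logistic cdf $F(z) = 1/(1 + e^{-z})$, and approximates the p-value with the normal distribution. Only the coefficients and the robust standard error come from the Stata output above; the helper names are illustrative.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cdf Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic_cdf(x: float) -> float:
    """Logistic cdf F(x) = e^x / (1 + e^x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


# (intercept, slope on spread) from the Stata output above
LPM = (0.5769492, 0.0193655)
PROBIT = (-0.0105926, 0.092463)
LOGIT = (-0.071157, 0.1632261)

# Part (b): test of H0: beta0 = 0.5 in the LPM using the robust SE .0316568
t_b = (LPM[0] - 0.5) / 0.0316568
p_b = 2.0 * (1.0 - normal_cdf(abs(t_b)))  # two-sided, normal approximation


def win_probabilities(spread: float) -> dict[str, float]:
    """Estimated P(favwin = 1 | spread) under each of the three models."""
    return {
        "lpm": LPM[0] + LPM[1] * spread,  # linear; not bounded to [0, 1]
        "probit": normal_cdf(PROBIT[0] + PROBIT[1] * spread),
        "logit": logistic_cdf(LOGIT[0] + LOGIT[1] * spread),
    }


probs = win_probabilities(10)
print(f"part (b): t = {t_b:.2f}, two-sided p = {p_b:.3f}")
for model, prob in probs.items():
    print(f"spread = 10, {model:6s}: {prob:.4f}")
```

One design point the sketch makes visible: the LPM prediction is linear in spread and exceeds 1 for spreads above roughly 22 points, whereas the probit and logit predictions always stay strictly between 0 and 1.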