IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

1 IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results

2 How important is R-squared?
Columns: R-squared; Published in Agricultural Economics
0.45; Best article of the year, 2008
???; Best article of the year, Best article of the year, 200

3 Session 3 Topics
- Multiple regression analysis
  - What does it mean?
  - Why is it important?
  - How is it done and how are results interpreted?
  - What are the hazards?

4 Multiple Regression Analysis
- What does it mean?
  - Multivariate analysis/statistics
  - Ceteris paribus
  - All else equal
  - Controlling for

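5 Multiple Regression Analysis
- Why does it matter?
  $y = \alpha + \beta_1 x_1 + u$
  - Simple regression relies on $E(u \mid x_1) = E(u) = 0$, implying $\mathrm{Corr}(u, x_1) = 0$
  - What if $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$? Then $u = \beta_2 x_2 + \varepsilon$
  - If $\beta_2 \neq 0$ and $\mathrm{Corr}(x_1, x_2) \neq 0$, then $\mathrm{Corr}(u, x_1) \neq 0$: results are biased
  - If $E(\varepsilon \mid x_1, x_2) = 0$ (and other conditions), we can estimate w/ multiple regressors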

6 Multiple Regression Analysis
- Consider maize yield (mzyield) and basal fertilizer (basaprate), both kg/ha
  $\text{mzyield} = \alpha + \beta_1\,\text{basaprate} + u$

. reg mzyield basaprate

      Source |         SS      df           MS      Number of obs =   8648
       Model | 2.1590e+09       1   2.1590e+09      F(1, 8646)    =
    Residual | 1.2229e+10    8646    1414446.5      Prob > F      = 0.0000
       Total | 1.4388e+10    8647   1663962.69      R-squared     = 0.1501
                                                    Adj R-squared = 0.1500
                                                    Root MSE      = 1189.3

     mzyield | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
   basaprate |
       _cons |

7 Multiple Regression Analysis
- Top dressing (topaprate) determines yield and is correlated with basaprate, both kg/ha
  $\text{mzyield} = \alpha + \beta_1\,\text{basaprate} + \beta_2\,\text{topaprate} + \varepsilon$

. reg mzyield basaprate topaprate

      Source |         SS      df           MS      Number of obs =   8647
       Model | 2.3418e+09       2   1.1709e+09      F(2, 8644)    =
    Residual | 1.2046e+10    8644   1393535.34      Prob > F      = 0.0000
       Total | 1.4387e+10    8646   1664016.58      R-squared     = 0.1628
                                                    Adj R-squared = 0.1626
                                                    Root MSE      = 1180.5

     mzyield | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
   basaprate |
   topaprate |
       _cons |

8 Multiple Regression Analysis
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
- $\alpha$ is the intercept
- $\beta_1, \dots, \beta_k$ are slope parameters (usually)

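9 [Figure: a fitted line plotted with x on the horizontal axis and y on the vertical axis; the intercept is labeled α and the slope is labeled β]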

10 Multiple Regression Analysis
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
- $\alpha$ is the intercept
- $\beta_1, \dots, \beta_k$ are slope parameters (usually)
- $u$ is the unobserved error or disturbance term
- $y$ is the dependent, explained, response or predicted variable
- $x_1, \dots, x_k$ are the independent, explanatory, control or predictor variables, or regressors

11 How is it done?
- OLS finds the β parameters that minimize:
  $\sum_{i=1}^{n} \left( y_i - \alpha - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik} \right)^2$
- Minimize the noise
- Squared, so residuals don't offset
- Gives us $\hat{\beta}$ and predicted values $\hat{y}$
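The minimization on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are hypothetical (not the maize survey), and the helper names `ols`, `predict` and `ssr` are made up for this example. It solves the normal equations $(X'X)\hat{\beta} = X'y$, which is one standard way to obtain the minimizer of the sum of squared residuals; it is not necessarily how Stata's `reg` computes it internally.

```python
# Hypothetical data: two regressors (x1, x2) per row and an outcome y.
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [3.5, 6.5, 6.2, 10.8, 9.1]


def ols(X, y):
    """Return [alpha, beta_1, ..., beta_k] by solving the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the intercept
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(M[i][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect collinearity)")
        M[col], M[piv] = M[piv], M[col]
        for i in range(p):
            if i != col:
                f = M[i][col] / M[col][col]
                M[i] = [v - f * w for v, w in zip(M[i], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, row):
    """Predicted value: alpha + beta_1*x1 + ... + beta_k*xk."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))


def ssr(coef, X, y):
    """Sum of squared residuals, the quantity OLS minimizes."""
    return sum((yi - predict(coef, r)) ** 2 for r, yi in zip(X, y))


beta_hat = ols(X, y)
y_hat = [predict(beta_hat, r) for r in X]
print("estimates (alpha, beta_1, beta_2):", beta_hat)
print("SSR at the OLS estimates:", ssr(beta_hat, X, y))
```

Nudging any single coefficient away from `beta_hat` and recomputing `ssr` gives a larger value, which is what "minimize" means here.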

12 Ceteris Paribus Interpretation
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
- Each β is the partial, or ceteris paribus, effect
- Change $x_1$ only: $\Delta\hat{y} = \hat{\beta}_1 \Delta x_1$
- Change $x_2$ only: $\Delta\hat{y} = \hat{\beta}_2 \Delta x_2$
- Share of total change attributable to $x_1$: $\hat{\beta}_1 \Delta x_1 / \Delta\hat{y}$, where $\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2$
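A small worked example may make the arithmetic on this slide concrete. The coefficient values and the changes in the regressors below are assumptions chosen for illustration; they are not the estimates from the maize regressions in these slides.

```python
# Hypothetical estimates (NOT taken from the regression output in this deck).
b1_hat = 4.0   # assumed change in y per one-unit change in x1, holding x2 fixed
b2_hat = 6.0   # assumed change in y per one-unit change in x2, holding x1 fixed

dx1 = 10.0     # assumed change in x1
dx2 = 5.0      # assumed change in x2

dy_x1_only = b1_hat * dx1             # change x1 only
dy_x2_only = b2_hat * dx2             # change x2 only
dy_total = dy_x1_only + dy_x2_only    # change both
share_x1 = dy_x1_only / dy_total      # share of the total change due to x1

print("change in predicted y from x1 only:", dy_x1_only)
print("change in predicted y from x2 only:", dy_x2_only)
print("total change in predicted y:", dy_total)
print("share attributable to x1:", share_x1)
```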

13 Ceteris Paribus Interpretation
- Now, how do we interpret the coefficient estimate for basaprate?
  $\text{mzyield} = \alpha + \beta_1\,\text{basaprate} + \beta_2\,\text{topaprate} + u$

. reg mzyield basaprate topaprate

      Source |         SS      df           MS      Number of obs =   8647
       Model | 2.3418e+09       2   1.1709e+09      F(2, 8644)    =
    Residual | 1.2046e+10    8644   1393535.34      Prob > F      = 0.0000
       Total | 1.4387e+10    8646   1664016.58      R-squared     = 0.1628
                                                    Adj R-squared = 0.1626
                                                    Root MSE      = 1180.5

     mzyield | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
   basaprate |
   topaprate |
       _cons |

14 Ceteris Paribus Interpretation
- According to these results, a one-unit change in $x_1$ will result in a $\hat{\beta}_1$-unit change in $y$, all else equal.
- The ceteris paribus effect of a one-unit change in $x_1$ is a $\hat{\beta}_1$-unit change in $y$.
- Holding $x_2$ constant, a one-unit change in $x_1$ results in a $\hat{\beta}_1$-unit change in $y$.

15 Key Assumptions
- Linear in parameters
- Random sample
- Zero conditional mean
- No perfect collinearity (variation in data)
- Homoskedastic errors

17 Perfect Collinearity
- Variable is a linear function of one or more others.
- No variation in one variable (collinear w/ intercept)

18 Can't estimate slope parameter if no variation in x
Source: Wooldridge (2002)

19 Perfect Collinearity
- Variable is a linear function of one or more others.
- No variation in one variable (collinear w/ intercept)
- Perfect correlation between 2 binary variables
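One way to see why these cases break OLS: each one makes the matrix $X'X$ singular (determinant zero), so the normal equations have no unique solution. The sketch below checks this on hypothetical integer data; the helper names `gram` and `det` and all variable names are made up for the example, and the binary case uses two dummies that are exact complements of each other.

```python
def gram(columns):
    """X'X built from a list of columns (each column is a list of numbers)."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det(m):
    """Determinant by cofactor expansion (fine for these tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


ones = [1, 1, 1, 1, 1]                 # intercept column
x1 = [1, 2, 3, 4, 5]
x2_linear = [2 * v + 3 for v in x1]    # exact linear function of x1
x2_free = [2, 1, 5, 3, 8]              # not a linear function of x1
no_variation = [7, 7, 7, 7, 7]         # constant: collinear with the intercept
d1 = [0, 1, 0, 1, 1]                   # binary variable
d2 = [1 - v for v in d1]               # its complement: d1 + d2 equals the intercept

print("linear function of x1:", det(gram([ones, x1, x2_linear])))
print("no variation in x:", det(gram([ones, no_variation])))
print("complementary binaries:", det(gram([ones, d1, d2])))
print("well-behaved regressors:", det(gram([ones, x1, x2_free])))
```

Because the data are integers, the first three determinants are exactly zero, while the last one is positive.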

20 Other hazards
- Multi-collinearity
- Including irrelevant variables
- Omitting relevant variables

21 Multi-Collinearity
- Highly correlated variables
- Variable is a nonlinear function of others
- What's the problem?
  - Efficiency losses
- Schmidt rule of thumb

22 Including Irrelevant Variables
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$
- Suppose $x_3$ has no effect on $y$, but key assumptions are satisfied (overspecified)
- OLS is an unbiased estimator of $\beta_3$, even if $\beta_3$ is zero
- Estimates of $\beta_1$ and $\beta_2$ will be less efficient

23 Omitting Relevant Variables
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$
- Suppose we omit $x_2$ (underspecifying)
- OLS is generally biased

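24 Omitting Relevant Variables
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$
- Estimate $\tilde{y} = \tilde{\alpha} + \tilde{\beta}_1 x_1$
- And let $\tilde{x}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1$
- It can be shown that: $E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1$
- The term $\beta_2 \tilde{\delta}_1$ is the omitted variable bias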
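The expectation result above has an exact in-sample counterpart: the slope from the short regression (y on $x_1$ only) equals $\hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where the hatted values come from the long regression (y on $x_1$ and $x_2$). The sketch below checks that identity on hypothetical data, using the textbook closed-form solutions for one and two regressors. All numbers and names are assumptions for illustration.

```python
# Hypothetical data in which x2 matters for y and is positively related to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9.1, 7.8, 19.1, 18.2, 30.9, 27.9]


def mean(v):
    return sum(v) / len(v)


def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)

# Long regression: y on x1 and x2 (closed form for two regressors)
d = s11 * s22 - s12 ** 2
b1_hat = (s1y * s22 - s12 * s2y) / d
b2_hat = (s11 * s2y - s12 * s1y) / d

# Short regression: y on x1 only (x2 omitted)
b1_tilde = s1y / s11

# Auxiliary regression: x2 on x1
delta1_tilde = s12 / s11

print("long-regression slope on x1:", b1_hat)
print("short-regression slope on x1:", b1_tilde)
print("b1_hat + b2_hat * delta1_tilde:", b1_hat + b2_hat * delta1_tilde)
```

In this made-up sample both `b2_hat` and `delta1_tilde` are positive, so the short-regression slope overstates the long-regression slope.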

25 Multiple Regression Analysis
                    Corr(x1, x2) > 0     Corr(x1, x2) < 0
  β2 > 0            Positive bias        Negative bias
  β2 < 0            Negative bias        Positive bias
Source: Wooldridge, 2002, page 92

26 Omitting Relevant Variables
- More generally, all OLS estimates will be biased, even if just one explanatory variable is correlated with the omitted variables
- Direction of bias is less clear

27 Multiple Regression Analysis
- Goodness of fit
  - $R^2$ is the share of explained variance
  - $R^2$ never decreases when we add variables
  - Usually, it will increase regardless of relevance
- Adjusted $R^2$ accounts for this
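The two fit measures can be written as $R^2 = 1 - SSR/SST$ and adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, with $n$ observations and $k$ regressors. The sketch below applies them to hypothetical data, adding a second regressor (`x2`) that was made up with no intended link to `y`. The data and function names are assumptions for illustration only.

```python
# Hypothetical data: y depends on x1; x2 is an arbitrary extra regressor.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 1, 5, 9]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(y)


def mean(v):
    return sum(v) / len(v)


def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def r_squared(ssr, sst):
    """Share of the total variation in y explained by the model."""
    return 1 - ssr / sst


def adjusted_r_squared(r2, n, k):
    """R-squared penalized for the number of regressors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)
sst = cross(y, y)

# Model with x1 only
ssr_one = sst - s1y ** 2 / s11

# Model with x1 and x2 (closed form for two regressors)
d = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / d
b2 = (s11 * s2y - s12 * s1y) / d
ssr_two = sst - (b1 * s1y + b2 * s2y)

r2_one, r2_two = r_squared(ssr_one, sst), r_squared(ssr_two, sst)
adj_one = adjusted_r_squared(r2_one, n, 1)
adj_two = adjusted_r_squared(r2_two, n, 2)

print("R-squared, x1 only:", r2_one, " adjusted:", adj_one)
print("R-squared, x1 and x2:", r2_two, " adjusted:", adj_two)
```

Comparing the two printed lines shows that plain $R^2$ does not fall when `x2` is added, while the adjusted version carries a penalty for the extra regressor.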

28 Next time: Interpreting results
- Binary regressors
- Other categorical regressors
- Categorical regressors as a series of binary regressors
- Quadratic terms
- Other interactions
- Average Partial Effects

29 Sessions materials developed by Bill Burke with input from Nicole Mason. January 202.
