Instrumental Variables & 2SLS

Similar documents
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

Solución del Examen Tipo: 1

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

Econometric analysis of the Belgian car market

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

2. Linear regression with multiple regressors

Lecture 15. Endogeneity & Instrumental Variable Estimation

Financial Risk Management Exam Sample Questions/Answers

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

SYSTEMS OF REGRESSION EQUATIONS

Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:

IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

Module 5: Multiple Regression Analysis

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Regression Analysis (Spring, 2000)

Regression III: Advanced Methods

Using instrumental variables techniques in economics and finance

Multiple Regression: What Is It?

Panel Data: Linear Models

Correlational Research

Multiple Linear Regression in Data Mining

Correlated Random Effects Panel Data Models

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Mgmt 469. Model Specification: Choosing the Right Variables for the Right Hand Side

Nonlinear Regression Functions. SW Ch 8 1/54/

Introduction to Regression and Data Analysis

Empirical Methods in Applied Economics

Simple Regression Theory II 2010 Samuel L. Baker

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Example: Boats and Manatees

Multiple Linear Regression

Simple Linear Regression Inference

1. THE LINEAR MODEL WITH CLUSTER EFFECTS

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

False. Model 2 is not a special case of Model 1, because Model 2 includes X5, which is not part of Model 1. What she ought to do is estimate

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Employer-Provided Health Insurance and Labor Supply of Married Women

From the help desk: Bootstrapped standard errors

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS

Econometrics Simple Linear Regression

Introduction to Regression Models for Panel Data Analysis. Indiana University Workshop in Methods October 7, Professor Patricia A.

Linear Models in STATA and ANOVA

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Part 2: Analysis of Relationship Between Two Variables

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

EFFECT OF INVENTORY MANAGEMENT EFFICIENCY ON PROFITABILITY: CURRENT EVIDENCE FROM THE U.S. MANUFACTURING INDUSTRY

Chapter 6: Multivariate Cointegration Analysis

The Effect of Housing on Portfolio Choice. July 2009

August 2012 EXAMINATIONS Solution Part I

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

Financial Risk Management Exam Sample Questions

16 : Demand Forecasting

5. Linear Regression

Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week (0.052)

Statistical tests for SPSS

Mgmt 469. Regression Basics. You have all had some training in statistics and regression analysis. Still, it is useful to review

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Simple linear regression

Chapter 2. Dynamic panel data models

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

Rockefeller College University at Albany

FIXED EFFECTS AND RELATED ESTIMATORS FOR CORRELATED RANDOM COEFFICIENT AND TREATMENT EFFECT PANEL DATA MODELS

Getting Correct Results from PROC REG

Testing for Granger causality between stock prices and economic growth

Univariate Regression

Some useful concepts in univariate time series analysis

INTRODUCTION TO MULTIPLE CORRELATION

Performing Unit Root Tests in EViews. Unit Root Testing

Influences of Organizational Structure and Diversification on Medical Malpractice Insurer Performance

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

Specification: Choosing the Independent Variables

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

FORECASTING DEPOSIT GROWTH: Forecasting BIF and SAIF Assessable and Insured Deposits

Effects of CEO turnover on company performance

Clustering in the Linear Model

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Pearson s Correlation

Accounting for Time-Varying Unobserved Ability Heterogeneity within Education Production Functions

Introduction to Quantitative Methods

Premaster Statistics Tutorial 4 Full solutions

Lecture 3: Differences-in-Differences

Hypothesis testing - Steps

Panel Data Analysis in Stata

Chapter 7: Dummy variable regression

Using R for Linear Regression

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

MULTIPLE REGRESSION WITH CATEGORICAL DATA

Elements of statistics (MATH0487-1)

MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM. R, analysis of variance, Student test, multivariate analysis

ON THE ROBUSTNESS OF FIXED EFFECTS AND RELATED ESTIMATORS IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Transcription:

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1

Why Use Instrumental Variables? Instrumental Variables (IV) estimation is used when your model has endogenous x s i.e. whenever Cov(x,u) 0 Thus, IV can be used to address the problem of omitted variable bias Also, IV can be used to solve the classic errors-invariables problem Economics 20 - Prof. Schuetze 2

What Is an Instrumental Variable? In order for a variable, z, to serve as a valid instrument for x, the following must be true 1. The instrument must be exogenous - i.e. Cov(z,u) = 0 -more specifically z should have no partial effect on y and should be uncorrelated with u 2. The instrument must be correlated with the endogenous variable x - i.e Cov(z,x) 0 Economics 20 - Prof. Schuetze 3

Difference between IV and Proxy? With IV we will leave the unobserved variable in the error term but use an estimation method that recognizes the presence of the omitted variable With a proxy we were trying to remove the unobserved variable from the error term e.g. IQ IQ would make a poor instrument as it would be correlated with the error in our model (ability in u) Need something correlated with education but uncorrelated with ability (parents education?) Economics 20 - Prof. Schuetze 4

More on Valid Instruments We can t test if Cov(z,u) = 0 as this is a population assumption Instead, we have to rely on common sense and economic theory to decide if it makes sense However, we can test if Cov(z,x) 0 using a random sample Simply estimate x = π 0 + π 1 z + v, and test H 0 : π 1 = 0 Refer to this regression as the first-stage regression Economics 20 - Prof. Schuetze 5

IV Estimation in the Simple Regression Case We can show that if we have a valid instrument we can get consistent estimates of the parameters For y = β 0 + β 1 x + u Cov(z,y) = β 1 Cov(z,x) + Cov(z,u), so Thus, β 1 = Cov(z,y) / Cov(z,x), since Cov(z,u)=0 and Cov(z,x) 0 Then the IV estimator for β 1 is βˆ 1 = ( z z )( y y ) i i ( z z )( x x ) i i Same as OLS when z = x Economics 20 - Prof. Schuetze 6

Inference with IV Estimation The IV estimator also has an approximate normal distribution in large samples To get estimates of the standard errors we need a slightly different homoskedasticity assumption: E(u 2 z) = σ 2 = Var(u) (conditioning on z here) If this is true, we can show that the asymptotic variance of β1-hat is: Var ˆ 2 ( β ) 1 = 2 2 n σ σ x ρ x, z σ x2 is the pop variance of x σ 2 is the pop variance of u ρ 2 xz is the square of the pop correlation between x and z Economics 20 - Prof. Schuetze 7

Inference with IV Estimation Each of the elements in the population variance can be estimated from a random sample The estimated variance is then: Var ˆ ˆ σ 2 ( β ) 1 = 2 SST x R x, z Where σ 2 = SSR from IV divided by the df, SST x is the sample variance in x and the R 2 is from the first stage regression (x on z) The standard error is just the square root of this Economics 20 - Prof. Schuetze 8

IV versus OLS estimation Standard error in IV case differs from OLS only in the R 2 from regressing x on z Since R 2 < 1, IV standard errors are larger However, IV is consistent, while OLS is inconsistent, when Cov(x,u) 0 Notice that the stronger the correlation between z and x, the smaller the IV standard errors Economics 20 - Prof. Schuetze 9

The Effect of Poor Instruments What if our assumption that Cov(z,u) = 0 is false? The IV estimator will be inconsistent also We can compare the asymptotic bias in OLS to that in IV in this case: Corr( z, u) σ u IV : plimβˆ 1 = β 1 + Corr( z, x) σ OLS: plim ~ β 1 = β 1 + Corr( x, u) x σ σ Even if Corr(z,u) is small the inconsistency can be large if Corr(z,x) is also very small u x Economics 20 - Prof. Schuetze 10

Effect of Poor Instruments (cont) So, it is not necessarily better to us IV instead of OLS even if z and u are not highly correlated Instead, prefer IV only if Corr(z,u)/Corr(z,x) < Corr(x,u) Also notice that the inconsistency gets really large if z and x are only loosely correlated Best to test for correlation in the first stage regression Economics 20 - Prof. Schuetze 11

A Note on R 2 in IV R 2 after IV estimation can be negative Recall that R 2 = 1 - SSR/SST where SSR is the residual sum of IV residuals SSR in this case can be larger than SST making the R 2 negative Thus, R 2 isn t very useful here and can t be used for F-tests Not important as we would prefer consistent estimates of the coefficients Economics 20 - Prof. Schuetze 12

IV Estimation in the Multiple Regression Case IV estimation can be extended to the multiple regression case Estimating: y 1 = β 0 + β 1 y 2 + β 2 z 1 + u 1 Where y 2 is endogenous and z 1 is exogenous Call this the structural model If we estimate the structural model the coefficients will be biased and inconsistent Thus, we need an instrument for y 2 Can we use z 1 if it is correlated with y 2 (we know it isn t correlated with u 1 )? Economics 20 - Prof. Schuetze 13

Multiple Regression IV (cont) No, because it appears in the structural model Instead, we need an instrument, z 2, that: 1. Doesn t belong in the structural model 2. Is uncorrelated with u 1 3. Is correlated with y 2 in a particular way - Now because of z 1 we need a partial correlation - i.e. for the reduced form equation y 2 = π 0 + π 1 z 1 + π 2 z 2 + v 2, π 2 0 If we have such an instrument and u 1 is uncorrelated with z 1 the model is identified Economics 20 - Prof. Schuetze 14

Two Stage Least Squares (2SLS) It is possible to have multiple instruments Consider the structural model, with 1 endogenous, y 2, and 1 exogenous, z 1, RHS variable Suppose that we have two valid instruments, z 2 and z 3 Since z 1, z 2 and z 3 are uncorrelated with u 1, so is any linear combination of these Thus, any linear combination is also a valid instrument Economics 20 - Prof. Schuetze 15

Best Instrument The best instrument is the one that is most highly correlated with y 2 This turns out to be a linear combination of the exogenous variables The reduce form equation is: y 2 = π 0 + π 1 z 1 + π 2 z 2 + π 3 z 3 + v 2 or y 2 = y 2 * + v 2 Can think of y 2 * as the part of y 2 that is uncorrelated with u 1 and v 2 as the part that might be correlated with u 1 Thus the best IV for y 2 is y 2 * Economics 20 - Prof. Schuetze 16

More on 2SLS We can estimate y 2 * by regressing y 2 on z 1, z 2 and z 3 the first stage regression If then substitute 2 for y 2 in the structural model, get same coefficient as IV While the coefficients are the same, the standard errors from doing 2SLS by hand are incorrect Also recall that since the R2 can be negative F- tests will be invalid Stata will calculate the correct standard error and F-tests Economics 20 - Prof. Schuetze 17

More on 2SLS (cont) We can extend this method to include multiple endogenous variables However, we need to be sure that we have at least as many excluded exogenous variables (instruments) as there are endogenous variables If not, the model is not identified Economics 20 - Prof. Schuetze 18

Addressing Errors-in-Variables with IV Estimation Recall the classical errors-in-variables problem where we observe x 1 instead of x 1 * Where x 1 = x 1 * + e 1, we showed that when x 1 and e 1 are correlated the OLS estimates are biased We maintain the assumption that u is uncorrelated with x 1 *, x 1 and x 2 and that and e 1 is uncorrelated with x 1 * and x 2 If we can find an instrument, z, such that Corr(z,u) = 0 and Corr(z,x 1 ) 0, then we can use IV to remove the attenuation bias Economics 20 - Prof. Schuetze 19

Example of Instrument Suppose that we have a second measure of x 1 *(z 1 ) Examples: both husband and wife report earnings both employer and employee report earnings z 1 will also measure x 1 * with error However, as long as the measurement error in z 1 is uncorrelated with the measurement error in x 1, z 1 is a valid instrument Economics 20 - Prof. Schuetze 20

Testing for Endogeneity Since OLS is preferred to IV if we do not have an endogeneity problem, then we d like to be able to test for endogeneity Suppose we have the following structural model: y 1 = β 0 + β 1 y 2 + β 2 z 1 + β 3 z 2 + u We suspect that y 2 is endogenous and we have instruments for y 2 (z 3, z 4 ) How do we determine if y 2 is endogenous? Economics 20 - Prof. Schuetze 21

Testing for Endogeneity (cont) 1. Hausman Test If all variables are exogenous both OLS and 2SLS are consistent If there are statistically significant differences in the coefficients we conclude that y 2 is endogenous 2. Regression Test In the first stage equation: y 2 = π 0 + π 1 z 1 + π 2 z 2 + π 3 z 3 + π 3 z 3 + v 2 Each of the z s are uncorrelated with u 1 Economics 20 - Prof. Schuetze 22

Testing for Endogeneity (cont) Thus y 2 will only be correlated with u 1 if v 2 is correlated with u 1 The test is based on this observation So, save the residuals from the first stage Include the residual in the structural equation (which of course has y 2 in it) If the coefficient on the residual is statistically different from zero, reject the null of exogeneity If multiple endogenous variables, jointly test the residuals from each first stage Economics 20 - Prof. Schuetze 23

Testing Overidentifying Restrictions How can we determine if we have a good instrument -correlated with y 2 uncorrelated with u? Easy to test if z is correlated with y 2 If there is just one instrument for our endogenous variable, we can t test whether the instrument is uncorrelated with the error (u is unobserved) If we have multiple instruments, it is possible to test the overidentifying restrictions i.e. to see if some of the instruments are correlated with the error Economics 20 - Prof. Schuetze 24

The OverID Test Using our previous example, suppose we have two instruments for y 2 (z 3, z 4 ) We could estimate our structural model using only z 3 as an instrument, assuming it is uncorrelated with the error, and get the residuals: uˆ 1 = y 1 ˆ β 0 ˆ β 1 y Since z 4 hasn t been used we can check whether it is correlated with u 1 -hat If they are correlated z 4 isn t a good instrument 2 ˆ β 2 Economics 20 - Prof. Schuetze 25 z 1 ˆ β 3 z 2

The OverID Test We could do the same for z 3, as long as we can assume that z 4 is uncorrelated with u 1 A procedure that allows us to do this is: 1. Estimate the structural model using IV and obtain the residuals 2. Regress the residuals on all the exogenous variables and obtain the R 2 to form nr 2 3. Under the null that all instruments are uncorrelated with the error, LM ~ χ q2 where q is the number of extra instruments Economics 20 - Prof. Schuetze 26

Testing for Heteroskedasticity When using 2SLS, we need a slight adjustment to the Breusch-Pagan test Get the residuals from the IV estimation Regress these residuals squared on all of the exogenous variables in the model (including the instruments) Test for the joint significance Note: there are also robust standard errors in the IV setting Economics 20 - Prof. Schuetze 27

Testing for Serial Correlation Also need a slight adjustment to the test for serial correlation when using 2SLS Re-estimate the structural model by 2SLS, including the lagged residuals, and using the same instruments as originally Test if the coefficient on the lagged residual (ρ) is statistically different than zero Can also correct for serial correlation by doing 2SLS on a quasi-differenced model, using quasidifferenced instruments Economics 20 - Prof. Schuetze 28