1 The Problem: Endogeneity There are two kinds of variables in our models: exogenous variables and endogenous variables. Endogenous Variables: These a



Similar documents
Lecture 15. Endogeneity & Instrumental Variable Estimation

IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

Solución del Examen Tipo: 1

Econometrics Simple Linear Regression

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

Econometric analysis of the Belgian car market

2. Linear regression with multiple regressors

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

16 : Demand Forecasting

Weak instruments: An overview and new techniques

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

FORECASTING DEPOSIT GROWTH: Forecasting BIF and SAIF Assessable and Insured Deposits

Example: Boats and Manatees

From the help desk: Bootstrapped standard errors

hp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Using instrumental variables techniques in economics and finance

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

3.2. Solving quadratic equations. Introduction. Prerequisites. Learning Outcomes. Learning Style

As we explained in the textbook discussion of statistical estimation of demand

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

ESTIMATING AN ECONOMIC MODEL OF CRIME USING PANEL DATA FROM NORTH CAROLINA BADI H. BALTAGI*

Introduction to Regression and Data Analysis

Simple Regression Theory II 2010 Samuel L. Baker

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Financial Risk Management Exam Sample Questions/Answers

August 2012 EXAMINATIONS Solution Part I

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Rockefeller College University at Albany

Repeatability and Reproducibility

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

An Introduction to Path Analysis. nach 3

13. Poisson Regression Analysis

11. Analysis of Case-control Studies Logistic Regression

Mgmt 469. Model Specification: Choosing the Right Variables for the Right Hand Side

Regression Analysis (Spring, 2000)

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5

Multiple Imputation for Missing Data: A Cautionary Tale

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

1 Review of Least Squares Solutions to Overdetermined Systems

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

Regression Analysis: A Complete Example

Chapter Four. Data Analyses and Presentation of the Findings

Scatter Plot, Correlation, and Regression on the TI-83/84

Linear Programming Notes V Problem Transformations

Week 4: Standard Error and Confidence Intervals

Mgmt 469. Fixed Effects Models. Suppose you want to learn the effect of price on the demand for back massages. You

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Clustering in the Linear Model

SPSS Explore procedure

The correlation coefficient

Elements of statistics (MATH0487-1)

Row Echelon Form and Reduced Row Echelon Form

The Effect of Housing on Portfolio Choice. July 2009

Note 2 to Computer class: Standard mis-specification tests

Cost implications of no-fault automobile insurance. By: Joseph E. Johnson, George B. Flanigan, and Daniel T. Winkler

SPSS Guide: Regression Analysis

Time Series and Forecasting

Penalized regression: Introduction

Statistics 2014 Scoring Guidelines

Association Between Variables

Lecture Notes on MONEY, BANKING, AND FINANCIAL MARKETS. Peter N. Ireland Department of Economics Boston College.

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Basic Electronics Prof. Dr. Chitralekha Mahanta Department of Electronics and Communication Engineering Indian Institute of Technology, Guwahati

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Estimation of σ 2, the variance of ɛ

Specification of Rasch-based Measures in Structural Equation Modelling (SEM) Thomas Salzberger

Graphing Calculator Workshops

2.3. Finding polynomial functions. An Introduction:

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Simple linear regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Multiple Regression: What Is It?

Introduction to Fixed Effects Methods

Is the Forward Exchange Rate a Useful Indicator of the Future Exchange Rate?

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Cointegration and error correction

MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM. R, analysis of variance, Student test, multivariate analysis

5. Linear Regression

Chapter 4: Vector Autoregressive Models

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Testing The Quantity Theory of Money in Greece: A Note

Notes on Applied Linear Regression

Multicollinearity Richard Williams, University of Notre Dame, Last revised January 13, 2015

MULTIPLE REGRESSION WITH CATEGORICAL DATA

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but

Simple Linear Regression Inference

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7

5. Multiple regression

Solving Linear Equations in One Variable. Worked Examples

Transcription:

Notes on Simultaneous Equations and Two Stage Least Squares Estimates Copyright - Jonathan Nagler; April 19, 1999 1. Basic Description of 2SLS ffl The endogeneity problem, and the bias of OLS. ffl The mechanics of the solution. ffl The consistency property of 2sls 2. Presenting Results with 2SLS: the first stage. 3. Tests for endogeneity: knowing you need to use 2SLS. 4. Tests for exogeneity of instruments: making sure you have used 2SLS legitimately. 1

1 The Problem: Endogeneity There are two kinds of variables in our models: exogenous variables and endogenous variables. Endogenous Variables: These are variables determined within the system of equations which represent the true world. This means that they are functions of other variables present in the system. Up till now (in the single equation world) the only endogenous variable we have dealt with has always been the dependent variable. Exogenous Variables: These are variables determined outside the system. Up till now (in the single equation world) we have treated all of our independent variables as exogenous. 2

As a general rule, when a variable is endogenous, it will be correlated with the disturbance term, hence violating the GM assumptions and making our OLS estimates biased. This is easily seen in the following example of two equations where Y and X 1 are both endogenous. Y i = ff 10 + fi 11 X 1i + fi 12 X 2i + fi 13 X 3i + ::: + fi 1k X ki + u i (1) X 1i = ff 20 + fi 21 Y i + fi 22 Z 2i + fi 23 Z 3i + ::: + fi 2k Z ki + v i (2) Now substitute the first equation into the second: X 1i = ff 20 + fi 21 (ff 10 + fi 11 X 1i + fi 12 X 2i + fi 13 X 3i + ::: + fi 1k X ki + u i ) + fi 22 Z 2i + fi 23 Z 3i + ::: + fi 2k Z ki + v i We can see that X 1 is a linear function of u (among other things), and hence will be correlated with u. This violates the GM assumptions, and the OLS estimator ^fi 11 will be biased. 3

2 Just-Identified Equations If the set of equations is exactly identified, then we can solve for the reduced-form parameters, and then compute the structural parameters from the reduced form parameters. 4

3 Over-Identification: Use Two Stage Least Squares The goal is to find a proxy for X, that will not be correlated with u. This proxy is going to be called ^X. The first stage of 2SLS is to generate the proxy, the second stage is to simply substitute the proxy for X, and estimate the resulting equation using OLS. The trick to generating a proxy is to find a variable that belongs in the second equation (the one predicting X 1 ), but does not belong in the first equation (the one predicting Y ). In other words, we want to find a variable Z that determines X 1 in the world, but that does not influence Y. This is generally not an easy task. A slightly sloppy statement (actually all that is required is that in the limit these quantities approach 0 and `not 0', respectively) of the technical conditions on Z are as follows: corr(z; u) = 0 corr(z; x) 6= 0 5

So say equation (1) above is the true model [we are now ignoring equation (2) from above]: Y i = ff 10 + fi 11 X 1i + fi 12 X 2i + fi 13 X 3i + ::: + fi 1k X ki + u i And say we find some variable Z that influences X 1 but does not influence Y. Note that we only need to find one Z. Then we would estimate the following equation using OLS: X 1i = ff 30 + fi 31 Z i + fi 32 X 2i + fi 33 X 3i + ::: + fi 3k X ki + fl i (3) What we have done in this equation is to include all of the exogenous variables from the first equation on the RHS, and added Z. These estimates would allow us to generate a new set of values for the variable ^X 1 : And: ^X 1i = ^ff 30 + ^fi 31 Z i + ^fi 32 X 2i + ^fi 33 X 3i + ::: + ^fi 3k X ki (4) X 1 = ^X 1 + ^fl (5) 6

Now, ^X1 can be substituted for X 1 in equation (1): Y i = ff 10 + fi 11 ( ^X 1i + ^fl i ) + fi 12 X 2i + fi 13 X 3i + ::: + fi 1k X ki + u i Rewrite this: Y i = ff 10 + fi 11 ^X1i + fi 12 X 2i + fi 13 X 3i + ::: + fi 1k X ki + (u i + fi 11^fl i ) The new equation estimated using OLS. This will produce consistent estimates of all the parameters, including fi 11. Consistency is an assymptotic property, it requires large samples. The estimates will not be unbiased. The final thing to do however is to correct the standard errors that would be generated this way as they would be incorrect. This is simply a technical correction. 7

4 2SLS in Practice If the first stage produces poor predictors of X, then the 2SLS procedure will not perform very well in terms of the precision of the estimates generated. In other words, if ^X1 is a noisy predictor of X 1, then it is the same as estimating an equation using OLS when there is a large amount of measurement error. Since you want to observe the first-stage estimates, you probably want to run 2SLS manually as well as using your stat package's canned routine. If you use STATA, ivreg will compute two-state least squares estimates; and including the `, first' option will display the first-stage results. 8

5 Associated Tests 5.1 Testing for Endogeneity of X (Heckman) We can test to see if endogeneity is a problem as follows: 1. Find Z, compute our clean version of X: ^X. 2. Estimate the original model (equation 1) with both X and ^X on the right hand side. 3. If the coefficient of ^X is significant, then we have an endogeneity problem. Obviously it is not a great feature of this test that we can only apply the diagnostic test after we have a cure. 9

5.2 Testing for Exogeneity of Z (Hausman) 1. Estimate the model via 2SLS, and compute the residuals e. 2. Run a regression of these residuals on all exogenous variables (included and excluded). 3. Compute a test statistic of nr 2 ; using the R 2 from the regression of the residuals and n is the number of observations. 4. The nr 2 statistic will have a χ 2 distribution with degress of freedom equal to the number of excluded exogenous variables (the number of Zs) minus the number of endogenous variables explained by the instruments. 5. If nr 2 is too big you can reject the assumption that Z is exogenous. 10