# Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Save this PDF as:

Size: px
Start display at page:

Download "Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares"

## Transcription

1 Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit into the framework of y-on-x regression, in which we can assume that the y variable is determined by (but does not jointly determine) X. Indeed, the simplest analytical concepts we teach in principles of economics a demand curve in micro, and the Keynesian consumption function in macro are relations of this sort, where at least one of the explanatory variables is endogenous, or jointly determined with the dependent variable. From a mathematical standpoint, the difficulties that this endogeneity cause for econometric analysis are

2 identical to those which we have already considered, in two contexts: that of omitted variables, and that of errors-in-variables, or measurement error in the X variables. In each of these three cases, OLS is not capable of delivering consistent parameter estimates. We now turn to a general solution to the problem of endogenous regressors, which as we will see can also be profitably applied in other contexts, in which the omitted variable (or poorly measured variable) can be taken into account. The general concept is that of the instrumental variables estimator; a popular form of that estimator, often employed in the context of endogeneity, is known as two-stage least squares (2SLS). To motivate the problem, let us consider the omitted-variable problem: for instance, a wage equation, which would be correctly specified as: log(wage) = β 0 + β 1 educ + β 2 abil + e (1)

3 This equation cannot be estimated, because ability (abil) is not observed. If we had a proxy variable available, we could substitute it for abil; the quality of that equation would then depend on the degree to which it was a good proxy. If we merely ignore abil, it becomes part of the error term in the specification: log(wage) = β 0 + β 1 educ + u (2) If abil and educ are correlated, OLS will yield biased and inconsistent estimates. To consistently estimate this equation, we must find an instrumental variable: a new variable that satisfies certain properties. Imagine that variable z is uncorrelated with u, but is correlated with educ. A variable that meets those two conditions is an instrumental variable for educ. We cannot directly test the prior assumption, since we cannot observe u; but we can readily test the latter assumption, and should do so,

4 by merely regressing the included explanatory variable on the instrument: educ = π 0 + π 1 z + υ (3) In this regression, we should easily reject H 0 : π 1 = 0. It should be clear that there is no unique choice of an instrument in this situation; many potential variables could meet these two conditions, of being uncorrelated with the unobservable factors influencing the wage (including abil) and correlated with educ. Note that in this context we are not searching for a proxy variable for abil; if we had a good proxy for abil, it would not make a satisfactory instrumental variable, since correlation with abil implies correlation with the error process u. What might serve in this context? Perhaps something like the mother s level of education, or the number of siblings, would make a sensible instrument. If we determine that we have a reasonable instrument, how may it be used?

5 Return to the misspecified equation (2), and write it in general terms of y and x : y = β 0 + β 1 x + u (4) If we now take the covariance of each term in the equation with our instrument z, we find: Cov(y, z) = β 1 Cov(x, z) + Cov(u, z) (5) We have made use of the fact that the covariance with a constant is zero. Since by assumption the instrument is uncorrelated with the error process u, the last term has expectation zero, and we may solve (5) for our estimate of β 1 : b 1 = Cov(y, z) Cov(x, z) = (yi ȳ) (z i z) (xi x) (z i z) (6) Note that this estimator has an interesting special case where x = z : that is, where an explanatory variable may serve as its own instrument, which would be appropriate if Cov(x, u) =

6 0. In that case, this estimator may be seen to be the OLS estimator of β 1. Thus, we may consider OLS as a special case of IV, usable when the assumption of exogeneity of the x variable(s) may be made. We may also note that the IV estimator is consistent, as long as the two key assumptions about the instrument s properties are satisfied. The IV estimator is not an unbiased estimator, though, and in small samples its bias may be substantial. Inference with the IV estimator To carry out inference compute interval estimates and hypothesis tests we assume that the error process is homoskedastic: in this case, conditional on the instrumental variable z, not the included explanatory variable x. With this additional assumption, we may derive the asymptotic variance of the IV estimator as: V ar(b 1 ) = σ2 SST x ρ 2 xz (7)

7 where n is the sample size, SST x is the total sum of squares of the explanatory variable, and ρ 2 xz is the R 2 (or squared correlation) in a regression of x on z : that is, equation (3). This quantity can be consistently estimated; σ 2 from the regression residuals, just as with OLS. Notice that as the correlation between the explanatory variable x and the instrument z increases, ceteris paribus, the sampling variance of b 1 decreases. Thus, an instrumental variables estimate generated from a better instrument will be more precise (conditional, of course, on the instrument having zero correlation with u). Note as well that this estimated variance must exceed that of the OLS estimator of b 1, since 0 ρ 2 xz 1. In the case where an explanatory variable may serve as its own instrument, the squared correlation is unity. The IV estimator will always have a larger asymptotic variance than will the OLS estimator, but that merely reflects the introduction of an additional source of uncertainty (in the form of

8 the instrument, imperfectly correlated with the explanatory variable). What will happen if we use the instrumental variables with a poor or weak instrument? A weak correlation between x and z will bring a sizable bias in the estimator. If there is any correlation between z and u, a weak correlation between x and z will render IV estimates inconsistent. Although we cannot observe the correlation between z and u, we can empirically evaluate the correlation between the explanatory variable and its instrument, and should always do so. It should also be noted that an R 2 measure in the context of the IV estimator is not the percentage of variation explained measure that we are familiar with in OLS terms. In the presence of correlation between x and u, we can no longer decompose the variation in y into two

9 independent components, SSE and SSR, and R 2 has no natural interpretation. In the OLS context, a joint hypothesis test can be written in terms of R 2 measures; that cannot be done in the IV context. Just as the asymptotic variance of an IV estimator exceeds that of OLS, the R 2 measure from IV will never beat that which may be calculated from OLS. If we wanted to maximize R 2, we would just use OLS; but when OLS is biased and inconsistent, we seek an estimation technique that will focus on providing consistent estimates of the regression parameters, and not mechanically find the least squares solution in terms of inconsistent parameter estimates. IV estimates in the multiple regression context The instrumental variables technique illustrated above can readily be extended to the case of

10 multiple regression. To introduce some notation, consider a structural equation: y 1 = β 0 + β 1 y 2 + β 2 z 1 + u 1 (8) where we have suppressed the observation subscripts. The y variables are endogenous; the z variable is exogenous. The endogenous nature of y 2 implies that if this equation is estimated by OLS, the point estimates will be biased and inconsistent, since the error term will be correlated with y 2. We need an instrument for y 2 : a variable that is correlated with y 2, but not correlated with u. Let us write the endogenous explanatory variable in terms of the exogenous variables, including the instrument z 2 : y 2 = π 0 + π 1 z 1 + π 2 z 2 + v (9) The key identification condition is that π 2 0; that is, after partialling out z 1, y 2 and z 2 are

11 still meaningfully correlated. This can readily be tested by estimating the auxiliary regression (9). We cannot test the other crucial assumption: that in this context, cov(z 2, v) = 0. Given the satisfaction of these assumptions, we may then derive the instrumental variables estimator of (8) by writing down the normal equations for the least squares problem, and solving them for the point estimates. In this context, z 1 serves as an instrument for itself. We can extend this logic to include any number of additional exogenous variables in the equation; the condition that the analogue to (9) must have π 2 0 always applies. Likewise, we could imagine an equation with additional endogenous variables; for each additional endogenous variable on the right hand side, we would have to find another appropriate instrument, which would have to meet the two conditions specified above.

12 Two stage least squares (2SLS) What if we have a single endogenous explanatory variable, as in equation (8), but have more than one potential instrument? There might be several variables available, each of which would have a significant coefficient in an equation like (9), and could be considered uncorrelated with u. Depending on which of the potential instruments we employ, we will derive different IV estimates, with differing degrees of precision. This is not a very attractive possibility, since it suggests that depending on how we implement the IV estimator, we might reach different qualitatitive conclusions about the structural model. The technique of two-stage least squares (2SLS) has been developed to deal with this problem. How might we combine several instruments to produce the single instrument needed to implement IV for equation (8)? Naturally, by running a regression in this case, an auxiliary regression of the form of equation (9), with all of

13 the available instruments included as explanatory variables. The predicted values of that regression, ŷ 2, will serve as the instrument for y 2, and this auxiliary regression is the first stage of 2SLS. In the second stage, we use the IV estimator, making use of the generated instrument ŷ 2. The IV estimator we developed above can be shown, algebraically, to be a 2SLS estimator; but although the IV estimator becomes non-unique in the presence of multiple instruments, the 2SLS estimation technique will always yield a unique set of parameter values for a given instrument list. Although from a pedagogical standpoint we speak of the two stages, we should not actually perform 2SLS by hand. Why? Because the second stage will yield the wrong residuals (being computed from the instruments rather than the original variables), which implies that all statistics computed from those residuals will

14 be incorrect (the estimate s 2, the estimated standard errors of the parameters, etc.) We should make use of a computer program that has a command to perform 2SLS (or, as some programs term it, instrumental variables). In Stata, you use the ivregress command to perform either IV or 2SLS estimation. The syntax of ivregress is: ivregress 2sls depvar [varlist1] (varlist2=varlist iv) where depvar is the dependent variable; varlist1, which may not be present, is the list of included exogenous variables (such as z 1 in equation (8); varlist2 contains the included endogenous variables (such as y 2 in equation (8); and varlist iv contains the list of instruments that are not included in the equation, but will be used to form the instrumental variables estimator. If we wanted to estimate equation

15 (8) with Stata, we would give the command ivreg y1 z1 (y2 = z2). If we had additional exogenous variables in the equation, they would follow z1. If we had additional instruments (and were thus performing 2SLS), we would list them after z2. The 2SLS estimator may be applied to a much more complex model, in which there are multiple endogenous explanatory variables (which would be listed after y2 in the command), as well as any number of instruments and included exogenous variables. The constraint that must always be satisfied is related to the parenthesized lists: the order condition for identification. Intuitively, it states that for each included endogenous variable (e.g. y2), we must have at least one instrument that is, one exogenous variable that does not itself appear in the equation, or satisfies an exclusion restriction. If there are three included endogenous

16 variables, then we must have no fewer than three instruments after the equals sign, or the equation will not be identified. That is, it will not be possible to solve for a unique solution in terms of the instrumental variables estimator. In the case (such as the example above) where the number of included endogenous variables exactly equals the number of excluded exogenous variables, we satisfy the order condition with equality, and the standard IV estimator will yield a solution. Where we have more instruments than needed, we satisfy the order condition with inequality, and the 2SLS form of the estimator must be used to derive unique estimates, since we have more equations than unknowns: the equation is overidentified. If we have fewer instruments than needed, we fail the order condition, since there are more unknowns than equations. No econometric technique can solve this problem of underidentification. There are additional conditions for identification the order condition is

17 necessary, but not sufficient as it must also be the case that each instrument has a nonzero partial correlation with the dependent variable. This would fail, for instance, if one of our candidate instruments was actually a linear combination of the included exogenous variables. IV and errors-in-variables The instrumental variables estimator can also be used fruitfully to deal with the errors-invariables model discussed earlier not surprisingly, since the econometric difficulties caused by errors-in-variables are mathematically the same problem as that of an endogenous explanatory variable. To deal with errors-in-variables, we need an instrument for the mismeasured x variable that satisfies the usual assumptions: being well correlated with x, but not correlated with the error process. If we could find a

18 second measurement of x even one also subject to measurement error we could use it as an instrument, since it would presumably be well correlated with x itself, but if generated by an independent measurement process, uncorrelated with the original x s measurement error. Thus, we might conduct a household survey which inquires about disposable income, consumption, and saving. The respondents answers about their saving last year might well be mismeasured, since it is much harder to track saving than, say, earned income. The same could be said for their estimates of how much they spent on various categories of consumption. But using income and consumption data, we could derive a second (mismeasured) estimate of saving, and use it as an instrument to mitigate the problems of measurement error in the direct estimate. IV may also be used to solve proxy problems; imagine that we are regressing log(wage) on

19 education and experience, using a theoretical model that suggests that ability should appear as a regressor. Since we do not have a measure of ability, we use a test score as a proxy variable. That may introduce a problem, though, since the measurement error in the relation of test score to ability will cause the test score to be correlated with the error term. This might be dealt with if we had a second test score measure on a different aptitude test which could then be used as an instrument. The two test scores are likely to be correlated, and the measurement error in the first (the degree that it fails to measure ability) should not be correlated with the second score. Tests for endogeneity and overidentifying restrictions Since the use of IV will necessarily inflate the variances of the estimators, and weaken our

20 ability to make inferences from our estimates, we might be concerned about the need to apply IV (or 2SLS) in a particular equation. One form of a test for endogeneity can be readily performed in this context. Imagine that we have the equation: y 1 = β 0 + β 1 y 2 + β 2 z 1 + β 3 z 2 + u 1 (10) where y 2 is the single endogenous explanatory variable, and the z s are included exogenous variables. Imagine that the equation is overidentified for IV: that is, we have at least two instruments (in this case, z 3 and z 4 ) which could be used to estimate (10) via 2SLS. If we performed 2SLS, we would be estimating the following reduced form equation in the first stage : y 2 = π 0 + π 1 z 1 + π 2 z 2 + π 3 z 3 + π 4 z 4 + v (11) which would allow us to compute OLS residuals, ˆv. Those residuals will be that part of y 2

21 not correlated with the z s. If there is a problem of endogeneity of y 2 in equation (10), it will occur because cov(v, u 1 ) 0. We cannot observe v, but we can calculate a consistent estimate of v as ˆv. Including ˆv as an additional regressor in the OLS model y 1 = β 0 + β 1 y 2 + β 2 z 1 + β 3 z 2 + δˆv + ω (12) and testing for the significance of δ will give us the answer. If cov(v, u 1 ) = 0, our estimate of δ should not be significantly different from zero. If that is the case, then there is no evidence that y 2 is endogenous in the original equation, and OLS may be applied. If we reject the hypothesis that δ = 0, we should not rely on OLS, but should rather use IV (or 2SLS). This test may also be generalized for the presence of multiple included endogenous variables in (10); the relevant test is then an F test, jointly testing that a set of δ coefficients are all zero. This test is available within Stata as the estat endog command following ivregress.

22 Although we can never directly test the maintained hypothesis that the instruments are uncorrelated with the error process u, we can derive indirect evidence on the suitability of the instruments if we have an excess of instruments: that is, if the equation is overidentified, so that we are using 2SLS. The ivregress residuals may be regressed on all exogenous variables (included exogenous variables plus instruments). Under the null hypothesis that all IV s are uncorrelated with u, a Lagrange multiplier statistic of the nr 2 form will not exceed the critical point on a χ 2 (r) distribution, where r is the number of overidentifying restrictions (i.e. the number of excess instruments). If we reject this hypothesis, then we cast doubt on the suitability of the instruments; at least some of them do not appear to be satisfying the condition of orthogonality with the error process. This test is available within Stata as the estat overid command following ivregress.

23 Applying 2SLS in a time series context When there are concerns of included endogenous variables in a model fit to time series data, we have a natural source of instruments in terms of predetermined variables. For instance, if y 2t is an explanatory variable, its own lagged values, y 2t 1 or y 2t 2 might be used as instruments: they are likely to be correlated with y 2t, and they will not be correlated with the error term at time t, since they were generated at an earlier point in time. The one caveat that must be raised in this context relates to autocorrelated errors: if the errors are themselves autocorrelated, then the presumed exogeneity of predetermined variables will be in doubt. Tests for autocorrelated errors should be conducted; in the presence of autocorrelation, more distant lags might be used to mitigate this concern.

### Instrumental Variables Regression. Instrumental Variables (IV) estimation is used when the model has endogenous s.

Instrumental Variables Regression Instrumental Variables (IV) estimation is used when the model has endogenous s. IV can thus be used to address the following important threats to internal validity: Omitted

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Instrumental Variables (IV) Instrumental Variables (IV) is a method of estimation that is widely used

Instrumental Variables (IV) Instrumental Variables (IV) is a method of estimation that is widely used in many economic applications when correlation between the explanatory variables and the error term

### Simultaneous Equation Models As discussed last week, one important form of endogeneity is simultaneity. This arises when one or more of the

Simultaneous Equation Models As discussed last week, one important form of endogeneity is simultaneity. This arises when one or more of the explanatory variables is jointly determined with the dependent

### Econometrics. Week 9. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics Week 9 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 21 Recommended Reading For the today Simultaneous Equations Models Chapter 16 (pp.

### Chapter 11: Two Variable Regression Analysis

Department of Mathematics Izmir University of Economics Week 14-15 2014-2015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions

### problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random

### Wooldridge, Introductory Econometrics, 4th ed. Multiple regression analysis:

Wooldridge, Introductory Econometrics, 4th ed. Chapter 4: Inference Multiple regression analysis: We have discussed the conditions under which OLS estimators are unbiased, and derived the variances of

### 1 OLS under Measurement Error

ECON 370: Measurement Error 1 Violations of Gauss Markov Assumptions: Measurement Error Econometric Methods, ECON 370 1 OLS under Measurement Error We have found out that another source of endogeneity

### Instrumental variables and panel data methods in economics and finance

Instrumental variables and panel data methods in economics and finance Christopher F Baum Boston College and DIW Berlin February 2009 Christopher F Baum (Boston College) IVs and Panel Data Feb 2009 1 /

### 1 The Problem: Endogeneity There are two kinds of variables in our models: exogenous variables and endogenous variables. Endogenous Variables: These a

Notes on Simultaneous Equations and Two Stage Least Squares Estimates Copyright - Jonathan Nagler; April 19, 1999 1. Basic Description of 2SLS ffl The endogeneity problem, and the bias of OLS. ffl The

### Solución del Examen Tipo: 1

Solución del Examen Tipo: 1 Universidad Carlos III de Madrid ECONOMETRICS Academic year 2009/10 FINAL EXAM May 17, 2010 DURATION: 2 HOURS 1. Assume that model (III) verifies the assumptions of the classical

### 1 Logit & Probit Models for Binary Response

ECON 370: Limited Dependent Variable 1 Limited Dependent Variable Econometric Methods, ECON 370 We had previously discussed the possibility of running regressions even when the dependent variable is dichotomous

### Simultaneous Equations Models. Sanjaya DeSilva

Simultaneous Equations Models Sanjaya DeSilva 1 Reduced Form and Structural Models We will begin with definitions; 1. Exogenous variables are variables that are determined outside of the model. For the

### Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions What will happen if we violate the assumption that the errors are not serially

### Using instrumental variables techniques in economics and finance

Using instrumental variables techniques in economics and finance Christopher F Baum 1 Boston College and DIW Berlin German Stata Users Group Meeting, Berlin, June 2008 1 Thanks to Mark Schaffer for a number

### Econ 371 Exam #4 - Practice

Econ 37 Exam #4 - Practice Multiple Choice (5 points each): For each of the following, select the single most appropriate option to complete the statement. ) The following will not cause correlation between

### Simultaneous Equations

Simultaneous Equations 1 Motivation and Examples Now we relax the assumption that EX u 0 his wil require new techniques: 1 Instrumental variables 2 2- and 3-stage least squares 3 Limited LIML and full

### OLS in Matrix Form. Let y be an n 1 vector of observations on the dependent variable.

OLS in Matrix Form 1 The True Model Let X be an n k matrix where we have observations on k independent variables for n observations Since our model will usually contain a constant term, one of the columns

### Lecture 16. Endogeneity & Instrumental Variable Estimation (continued)

Lecture 16. Endogeneity & Instrumental Variable Estimation (continued) Seen how endogeneity, Cov(x,u) 0, can be caused by Omitting (relevant) variables from the model Measurement Error in a right hand

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### EC327: Advanced Econometrics, Spring 2007

EC327: Advanced Econometrics, Spring 2007 Wooldridge, Introductory Econometrics (3rd ed, 2006) Appendix D: Summary of matrix algebra Basic definitions A matrix is a rectangular array of numbers, with m

### Keynesian Theory of Consumption. Theoretical and Practical Aspects. Kirill Breido, Ilona V. Tregub

Keynesian Theory of Consumption. Theoretical and Practical Aspects Kirill Breido, Ilona V. Tregub The Finance University Under The Government Of The Russian Federation International Finance Faculty, Moscow,

### Econometrics Regression Analysis with Time Series Data

Econometrics Regression Analysis with Time Series Data João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, May 2011

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

### The Classical Linear Regression Model

The Classical Linear Regression Model 1 September 2004 A. A brief review of some basic concepts associated with vector random variables Let y denote an n x l vector of random variables, i.e., y = (y 1,

### Econometric Methods for Panel Data

Based on the books by Baltagi: Econometric Analysis of Panel Data and by Hsiao: Analysis of Panel Data Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies

### Two-Variable Regression: Interval Estimation and Hypothesis Testing

Two-Variable Regression: Interval Estimation and Hypothesis Testing Jamie Monogan University of Georgia Intermediate Political Methodology Jamie Monogan (UGA) Confidence Intervals & Hypothesis Testing

### Regression Analysis: Basic Concepts

The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

### A Symmetric Relationship Between Proxy and Instrumental Variables

A Symmetric Relationship Between Proxy and Instrumental Variables Damien Sheehan-Connor September 9, 2010 1 Introduction The basic problem encountered in much of applied econometrics is consistent estimation

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Regression Analysis. Pekka Tolonen

Regression Analysis Pekka Tolonen Outline of Topics Simple linear regression: the form and estimation Hypothesis testing and statistical significance Empirical application: the capital asset pricing model

### Inference in Regression Analysis. Dr. Frank Wood

Inference in Regression Analysis Dr. Frank Wood Inference in the Normal Error Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Types of Specification Errors 1. Omitted Variables. 2. Including an irrelevant variable. 3. Incorrect functional form. 2

Notes on Model Specification To go with Gujarati, Basic Econometrics, Chapter 13 Copyright - Jonathan Nagler; April 4, 1999 Attributes of a Good Econometric Model A.C. Harvey: (in Gujarati, p. 453-4) ffl

### Lecture Notes on Measurement Error

Steve Pischke Spring 2007 Lecture Notes on Measurement Error These notes summarize a variety of simple results on measurement error which I nd useful. They also provide some references where more complete

### Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables We often consider relationships between observed outcomes

### Testing for Serial Correlation in Fixed-Effects Panel Data Models

Testing for Serial Correlation in Fixed-Effects Panel Data Models Benjamin Born Jörg Breitung In this paper, we propose three new tests for serial correlation in the disturbances of fixed-effects panel

### In this chapter, we aim to answer the following questions: 1. What is the nature of heteroskedasticity?

Lecture 9 Heteroskedasticity In this chapter, we aim to answer the following questions: 1. What is the nature of heteroskedasticity? 2. What are its consequences? 3. how does one detect it? 4. What are

### Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.

### Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

### Memento on Eviews output

Memento on Eviews output Jonathan Benchimol 10 th June 2013 Abstract This small paper, presented as a memento format, reviews almost all frequent results Eviews produces following standard econometric

### AN INTRODUCTION TO ECONOMETRICS. Oxbridge Economics; Mo Tanweer

AN INTRODUCTION TO ECONOMETRICS Oxbridge Economics; Mo Tanweer Mohammed.Tanweer@cantab.net Econometrics What is econometrics? Econometrics means economic measurement Economics + Statistics = Econometrics

### Limitations of regression analysis

Limitations of regression analysis Ragnar Nymoen Department of Economics, UiO 8 February 2009 Overview What are the limitations to regression? Simultaneous equations bias Measurement errors in explanatory

### Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS15/16

1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS15/16 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:

### Wooldridge, Introductory Econometrics, 4th ed. Chapter 10: Basic regression analysis with time series data

Wooldridge, Introductory Econometrics, 4th ed. Chapter 10: Basic regression analysis with time series data We now turn to the analysis of time series data. One of the key assumptions underlying our analysis

### Regression analysis in practice with GRETL

Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that

### OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique - ie the estimator has the smallest variance

Lecture 5: Hypothesis Testing What we know now: OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique - ie the estimator has the smallest variance (if the Gauss-Markov

### Economics 140A Identification in Simultaneous Equation Models Simultaneous Equation Models

Economics 140A Identification in Simultaneous Equation Models Simultaneous Equation Models Our second extension of the classic regression model, to which we devote two lectures, is to a system (or model)

### Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. b 0, b 1 Q = n (Y i (b 0 + b 1 X i )) 2 i=1 Minimize this by maximizing

### Problem Set 4 ECON (Due: Friday, February 28, 2014)

Problem Set 4 ECON 633 (Due: Friday, February 8, 4) Bill Evans Spring 4. Using the psid.dta data set and the areg procedure, run a fixed-effect model of ln(wages) on tenure and union status absorbing the

### Lectures 8, 9 & 10. Multiple Regression Analysis

Lectures 8, 9 & 0. Multiple Regression Analysis In which you learn how to apply the principles and tests outlined in earlier lectures to more realistic models involving more than explanatory variable and

### Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization 1. Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 2. Minimize this by

### Three-stage least squares

Three-stage least squares Econometric Methods Lecture 7, PhD Warsaw School of Economics Outline 1 Introduction 2 3 Outline 1 Introduction 2 3 3-Stage Least Squares: basic idea 1-stage (OLS): inconsistent

### Regression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology

Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of

### MS&E 226: Small Data. Lecture 17: Additional topics in inference (v1) Ramesh Johari

MS&E 226: Small Data Lecture 17: Additional topics in inference (v1) Ramesh Johari ramesh.johari@stanford.edu 1 / 34 Warnings 2 / 34 Modeling assumptions: Regression Remember that most of the inference

### . This specification assumes individual specific means with Ey ( ) = µ. We know from Nickell (1981) that OLS estimates of (1.1) are biased for S S

UNIT ROOT TESTS WITH PANEL DATA. Consider the AR model y = αy ( ), it it + α µ + ε i it i =,... N, t =,..., T. (.) where the ε 2 IN(0, σ ) it. This specification assumes individual specific means with

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

REPUBLIC OF SOUTH AFRICA GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD SHAHID KHANDKER World Bank June 2006 ORGANIZED BY THE WORLD BANK AFRICA IMPACT

### ECON Introductory Econometrics. Lecture 17: Experiments

ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

### Statistics in Geophysics: Linear Regression II

Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/28 Model definition Suppose we have the following

### Testing for serial correlation in linear panel-data models

The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear panel-data models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear panel-data

### A Guide to Modern Econometrics / 2ed. Answers to selected exercises - Chapter 4. Exercise 4.1. Variable Coefficient Std. Error t-statistic Prob.

A Guide to Modern Econometrics / 2ed Answers to selected exercises - Chapter 4 Exercise 4.1 a. The OLS results (Eviews 5.0 output) are as follows: Dependent Variable: AIRQ Date: 12/02/05 Time: 14:04 C

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### 1 Demand Estimation. Empirical Problem Set # 2 Graduate Industrial Organization, Fall 2005 Glenn Ellison and Stephen Ryan

Empirical Problem Set # 2 Graduate Industrial Organization, Fall 2005 Glenn Ellison and Stephen Ryan 1 Demand Estimation The intent of this problem set is to get you familiar with Stata, if you are not

### CHAPTER 5. Exercise Solutions

CHAPTER 5 Exercise Solutions 91 Chapter 5, Exercise Solutions, Principles of Econometrics, e 9 EXERCISE 5.1 (a) y = 1, x =, x = x * * i x i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 y * i (b) (c) yx = 1, x = 16, yx

### Economics 345 Applied Econometrics

Economics 345 Applied Econometrics Lab 3: Some Useful Functional Forms for Regression Analysis Prof: Martin Farnham TAs: Joffré Leroux, Rebecca Wortzman, Yiying Yang Open EViews, and open the EViews workfile,

### CHAPTER 9: SERIAL CORRELATION

Serial correlation (or autocorrelation) is the violation of Assumption 4 (observations of the error term are uncorrelated with each other). Pure Serial Correlation This type of correlation tends to be

### Cointegration. Basic Ideas and Key results. Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board

Cointegration Basic Ideas and Key results Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board Summer School in Financial Mathematics Faculty of Mathematics & Physics University of Ljubljana

### SELF-TEST: SIMPLE REGRESSION

ECO 22000 McRAE SELF-TEST: SIMPLE REGRESSION Note: Those questions indicated with an (N) are unlikely to appear in this form on an in-class examination, but you should be able to describe the procedures

### Note 2 to Computer class: Standard mis-specification tests

Note 2 to Computer class: Standard mis-specification tests Ragnar Nymoen September 2, 2013 1 Why mis-specification testing of econometric models? As econometricians we must relate to the fact that the

### Ordinary Least Squares and Poisson Regression Models by Luc Anselin University of Illinois Champaign-Urbana, IL

Appendix C Ordinary Least Squares and Poisson Regression Models by Luc Anselin University of Illinois Champaign-Urbana, IL This note provides a brief description of the statistical background, estimators

### How to use Instrumental Variables

How to use Instrumental Variables Questions that can be answered with experiments: 1) If I give people aspirin, what happens to their body temperature? 2) If I use fertilizer on tomato plants, do I get

### Lecture 16. Autocorrelation

Lecture 16. Autocorrelation In which you learn to recognise whether the residuals from your model are correlated over time, the consequences of this for OLS estimation, how to test for autocorrelation

### 6. Regression with panel data

6. Regression with panel data Key feature of this section: Up to now, analysis of data on n distinct entities at a given point of time (cross sectional data) Example: Student-performance data set Observations

### Lecture 2: Instrumental Variables

Lecture 2: Instrumental Variables Fabian Waldinger Waldinger () 1 / 45 Topics Covered in Lecture 1 Instrumental variables basics. 2 Example: IV in the return to educations literature: Angrist & Krueger

### What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

### Heteroskedasticity and Weighted Least Squares

Econ 507. Econometric Analysis. Spring 2009 April 14, 2009 The Classical Linear Model: 1 Linearity: Y = Xβ + u. 2 Strict exogeneity: E(u) = 0 3 No Multicollinearity: ρ(x) = K. 4 No heteroskedasticity/

### SIMULTANEOUS EQUATIONS MODELS

SIMULTANEOUS EQUATIONS MODELS Three-stage estimation Basic syntax has the simple structure of reg3 followed by commands specifying step by step the structural equations (depvar1 varlist1), (depvar2 varlist2)

### SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

### The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information

Chapter 8 The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information An important new development that we encounter in this chapter is using the F- distribution to simultaneously

### Gauss-Markov Theorem. The Gauss-Markov Theorem is given in the following regression model and assumptions:

Gauss-Markov Theorem The Gauss-Markov Theorem is given in the following regression model and assumptions: The regression model y i = β 1 + β x i + u i, i = 1,, n (1) Assumptions (A) or Assumptions (B):

### Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

### Inference in Regression Analysis

Yang Feng Inference in the Normal Error Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of

### Mgmt 469. Model Specification: Choosing the Right Variables for the Right Hand Side

Mgmt 469 Model Specification: Choosing the Right Variables for the Right Hand Side Even if you have only a handful of predictor variables to choose from, there are infinitely many ways to specify the right

### 3.6: General Hypothesis Tests

3.6: General Hypothesis Tests The χ 2 goodness of fit tests which we introduced in the previous section were an example of a hypothesis test. In this section we now consider hypothesis tests more generally.

### Econometrics of Panel Data

Econometrics of Panel Data 1. Basics and Examples 2. The generalized least squares estimator 3. Fixed effects model 4. Random Effects model 1 1 Basics and examples We observes variables for N units, called

### Autocorrelation. Walter Sosa-Escudero. April 23, Econ 471. Econometric Analysis. Spring 2009

Econ 471. Econometric Analysis. Spring 2009 April 23, 2009 Time-Series Observations Consider the following model Y t = β 1 + β 2 X 2t + + β K X Kt + u t,, t = 1, 2,..., T Here t denotes periods, 1,...,

### Analysis of Financial Time Series with EViews

Analysis of Financial Time Series with EViews Enrico Foscolo Contents 1 Asset Returns 2 1.1 Empirical Properties of Returns................. 2 2 Heteroskedasticity and Autocorrelation 4 2.1 Testing for

### Econometric analysis of the Belgian car market

Econometric analysis of the Belgian car market By: Prof. dr. D. Czarnitzki/ Ms. Céline Arts Tim Verheyden Introduction In contrast to typical examples from microeconomics textbooks on homogeneous goods

### ML estimation: very general method, applicable to. wide range of problems. Allows, e.g., estimation of

Maximum Likelihood (ML) Estimation ML estimation: very general method, applicable to wide range of problems. Allows, e.g., estimation of models nonlinear in parameters. ML estimator requires a distributional

### The Simple Linear Regression Model: Specification and Estimation

Chapter 3 The Simple Linear Regression Model: Specification and Estimation 3.1 An Economic Model Suppose that we are interested in studying the relationship between household income and expenditure on

### DEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests

DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also

### Econometrics The Multiple Regression Model: Inference

Econometrics The Multiple Regression Model: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, March 2011 1 / 24 in

### Name (Please Print): Red ID:

Name (Please Print): Red ID: EC 351 : Econometrics I TEST 3 April 15, 011 100 points Instructions: Provide all answers on this exam paper. You must show all work to receive credit. You are allowed to use