Instrumental Variables & 2SLS
- Corey Pearson
- 7 years ago
Transcription
1 Instrumental Variables & 2SLS $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$; $x_1 = \pi_0 + \pi_1 z + \pi_2 x_2 + \dots + \pi_k x_k + v$
2 Why Use Instrumental Variables? Instrumental Variables (IV) estimation is used when your model has endogenous x's, that is, whenever Cov(x,u) ≠ 0. Thus, IV can be used to address the problem of omitted variable bias. Additionally, IV can be used to solve the classic errors-in-variables problem.
3 What Is an Instrumental Variable? In order for a variable, z, to serve as a valid instrument for x, the following must be true. The instrument must be exogenous, that is, Cov(z,u) = 0. The instrument must be correlated with the endogenous variable x, that is, Cov(z,x) ≠ 0.
4 More on Valid Instruments We have to use common sense and economic theory to decide if it makes sense to assume Cov(z,u) = 0. We can test whether Cov(z,x) ≠ 0 by testing $H_0: \pi_1 = 0$ in $x = \pi_0 + \pi_1 z + v$. This regression is sometimes referred to as the first-stage regression.
5 IV Estimation in the Simple Regression Case For $y = \beta_0 + \beta_1 x + u$, and given our assumptions, Cov(z,y) = $\beta_1$ Cov(z,x) + Cov(z,u), so $\beta_1$ = Cov(z,y) / Cov(z,x). Then the IV estimator for $\beta_1$ is $\hat\beta_1 = \frac{\sum_i (z_i - \bar z)(y_i - \bar y)}{\sum_i (z_i - \bar z)(x_i - \bar x)}$.
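The estimator on this slide can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not output from the slides: the function names `iv_slope` and `ols_slope`, the sample size, and every coefficient in the data-generating process are assumptions chosen for the example.

```python
import numpy as np

def iv_slope(z, x, y):
    """Simple IV estimator: sum((z - zbar)(y - ybar)) / sum((z - zbar)(x - xbar))."""
    z, x, y = (np.asarray(a, dtype=float) for a in (z, x, y))
    zc = z - z.mean()
    return np.sum(zc * (y - y.mean())) / np.sum(zc * (x - x.mean()))

def ols_slope(x, y):
    """OLS slope, which is the IV formula with x used as its own instrument."""
    return iv_slope(x, x, y)

# Simulated data; every number below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                 # instrument: exogenous and relevant
a = rng.normal(size=n)                 # unobserved factor (think "ability")
x = 0.8 * z + a + rng.normal(size=n)   # x depends on z and on a
u = a + rng.normal(size=n)             # error contains a, so Cov(x, u) != 0
y = 1.0 + 0.5 * x + u                  # true beta_1 = 0.5

b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
print(f"OLS slope: {b_ols:.3f}   IV slope: {b_iv:.3f}")
```

In this setup the OLS slope settles well above 0.5 because x and u share the unobserved factor, while the IV slope stays close to the true value.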
6 Inference with IV Estimation The homoskedasticity assumption in this case is $E(u^2 \mid z) = \sigma^2 = \mathrm{Var}(u)$. As in the OLS case, given the asymptotic variance, we can estimate the standard error: $\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{n \sigma_x^2 \rho_{x,z}^2}$, $\mathrm{se}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{SST_x \, R_{x,z}^2}}$.
7 IV versus OLS estimation The standard error in the IV case differs from OLS only in the $R^2$ from regressing x on z. Since $R^2 < 1$, IV standard errors are larger. However, IV is consistent, while OLS is inconsistent, when Cov(x,u) ≠ 0. The stronger the correlation between z and x, the smaller the IV standard errors.
8 IV versus OLS estimation Let's think about a wage model that tries to explain how wages differ across individuals based on observable characteristics. Economic theory tells us that the wage is a function of the marginal product of the worker. So what determines this marginal product? Two factors seem to play a key role: innate ability and investment in human capital. The problem is that innate ability is not directly observable. What happens if we ignore it?
9 IV versus OLS estimation [Regression output; numeric values lost in transcription. Dependent Variable: LWAGE. Method: Least Squares. Included observations: 428 after adjustments. Regressors: EDUC, C.] The problem with the current model is that educ is likely to be correlated with the error term u, since factors that are not controlled for will influence educ and these factors all end up in the error. This violates the assumption $E(x_i u) = 0$. Thus OLS estimates are biased. This is an endogeneity problem; here it arises from omitted factors such as ability, i.e. omitted variable bias.
10 Finding a Good IV For the log(wage) equation, an instrumental variable z for educ must be (1) uncorrelated with ability (and any other unobserved factors affecting wage) and (2) correlated with education. Something such as the last digit of an individual's Social Security Number almost certainly satisfies the first requirement: it is uncorrelated with ability because it is determined randomly. However, it is precisely because of the randomness of the last digit of the SSN that it is not correlated with education, either; therefore it makes a poor instrumental variable for educ.
11 Finding a Good IV What we have called a proxy variable for the omitted variable makes a poor IV for the opposite reason. For example, in the log(wage) example with omitted ability, a proxy variable for abil must be as highly correlated as possible with abil. An instrumental variable must be uncorrelated with abil. Therefore, while IQ is a good candidate as a proxy variable for abil, it is not a good instrumental variable for educ.
12 IV versus OLS estimation We need an instrument that can overcome this bias. The instrument needs to be correlated with educ but uncorrelated with u. One potential instrument is fatheduc. We can test the correlation between fatheduc and educ using a simple regression. [Regression output; numeric values lost in transcription. Dependent Variable: EDUC. Method: Least Squares. Included observations: 428. Regressors: FATHEDUC, C.]
13 IV versus OLS estimation [Regression output; most numeric values lost in transcription. Dependent Variable: LWAGE. Method: Two-Stage Least Squares. Included observations: 428 after adjustments. Instrument specification: FATHEDUC, constant added to instrument list. Regressors: EDUC, C. J-statistic: 1.02E-42. Instrument rank: 2.] Unfortunately we cannot test the exogeneity condition for IV estimation, i.e. $E(z_i u) = 0$. Why not? So we must use economic theory and basic intuitive arguments to justify this condition. Using fatheduc as an instrument results in the IV estimates shown.
14 The Effect of Poor Instruments What if our assumption that Cov(z,u) = 0 is false? The IV estimator will be inconsistent, too. We can compare the asymptotic bias in OLS and IV. IV: $\mathrm{plim}\, \hat\beta_1 = \beta_1 + \frac{\mathrm{Corr}(z,u)}{\mathrm{Corr}(z,x)} \cdot \frac{\sigma_u}{\sigma_x}$. OLS: $\mathrm{plim}\, \tilde\beta_1 = \beta_1 + \mathrm{Corr}(x,u) \cdot \frac{\sigma_u}{\sigma_x}$. Prefer IV if Corr(z,u)/Corr(z,x) < Corr(x,u).
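To see how quickly a weak instrument can make IV worse than OLS, the two asymptotic bias terms can be evaluated directly. The sketch below only restates the two plim formulas as functions; the function names and the correlation values are hypothetical and chosen purely for illustration.

```python
def ols_asymptotic_bias(corr_xu, sd_u, sd_x):
    """plim(OLS slope) - beta_1 = Corr(x,u) * sd_u / sd_x."""
    return corr_xu * sd_u / sd_x

def iv_asymptotic_bias(corr_zu, corr_zx, sd_u, sd_x):
    """plim(IV slope) - beta_1 = [Corr(z,u) / Corr(z,x)] * sd_u / sd_x."""
    return (corr_zu / corr_zx) * sd_u / sd_x

# Hypothetical values: x is moderately endogenous, z is only slightly invalid.
ols_bias = ols_asymptotic_bias(corr_xu=0.3, sd_u=1.0, sd_x=1.0)

# Same small Corr(z,u), but two different instrument strengths.
weak_iv_bias = iv_asymptotic_bias(corr_zu=0.05, corr_zx=0.1, sd_u=1.0, sd_x=1.0)
strong_iv_bias = iv_asymptotic_bias(corr_zu=0.05, corr_zx=0.5, sd_u=1.0, sd_x=1.0)

print("OLS bias:      ", ols_bias)
print("weak-IV bias:  ", weak_iv_bias)
print("strong-IV bias:", strong_iv_bias)
```

With these assumed numbers, the same small correlation between z and u gives a larger bias than OLS when Corr(z,x) is 0.1 and a smaller one when Corr(z,x) is 0.5.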
15 Weak Instruments [Regression output; numeric values lost in transcription. Dependent Variable: LBWGHT. Method: Least Squares. Included observations: 1388. Regressors: PACKS, C.] Here we estimate the effects of smoking on birth weight. The problem is that the number of packs smoked might be correlated with other health factors not included in the regression, and so it is probably correlated with the error term, u. What might be a suitable instrument? The price of cigarettes, cigprice, should be uncorrelated with the error and should be negatively correlated with consumption of packs.
16 Weak Instruments [Regression output; numeric values lost in transcription. Dependent Variable: LBWGHT. Method: Two-Stage Least Squares. Included observations: 1388. Instrument specification: CIGPRICE, constant added to instrument list. Regressors: PACKS, C. Instrument rank: 2.] The IV estimates do not look so good. The sign is wrong and the $R^2$ is negative (*). What went wrong? It may be that cigprice is a poor instrument. It may be that it is correlated with u, or it may not be correlated with packs. (*) Unlike in the case of OLS, the $R^2$ from IV estimation can be negative because SSR for IV can be larger than SST. Although it does not hurt to report the $R^2$ for IV estimation, it is not very useful, either.
17 Weak Instruments [Regression output; numeric values lost in transcription. Dependent Variable: PACKS. Method: Least Squares. Included observations: 1388. Regressors: CIGPRICE, C.] It turns out that cigprice is not significantly correlated with packs. Why? This highlights the problem of weak instruments, i.e. where the correlation between the endogenous variable (in this example, packs) and the instrument(s) is low.
18 IV Estimation in the Multiple Regression Case IV estimation can be extended to the multiple regression case. Call the model we are interested in estimating the structural model. Our problem is that one or more of the variables are endogenous. We need an instrument for each endogenous variable.
19 Multiple Regression IV (cont) Write the structural model as $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + u_1$, where $y_2$ is endogenous and $z_1$ is exogenous. Let $z_2$ be the instrument, so Cov($z_2$,$u_1$) = 0 and $y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + v_2$, where $\pi_2 \neq 0$. This reduced form equation regresses the endogenous variable on all exogenous ones.
20 Two Stage Least Squares (2SLS) It's possible to have multiple instruments. Consider our original structural model, and let $y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 z_3 + v_2$. Here we're assuming that both $z_2$ and $z_3$ are valid instruments: they do not appear in the structural model and are uncorrelated with the structural error term, $u_1$.
21 Best Instrument We could use either $z_2$ or $z_3$ as an instrument. The best instrument is a linear combination of all of the exogenous variables, $y_2^* = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 z_3$. We can estimate $y_2^*$ by regressing $y_2$ on $z_1$, $z_2$ and $z_3$; we can call this the first stage. If we then substitute $\hat y_2$ for $y_2$ in the structural model, we get the same coefficient as IV.
22 More on 2SLS While the coefficients are the same, the standard errors from doing 2SLS by hand are incorrect, because the second-stage residuals are computed with $\hat y_2$ in place of $y_2$. The method extends to multiple endogenous variables: we need to be sure that we have at least as many excluded exogenous variables (instruments) as there are endogenous variables in the structural equation.
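The two-stage procedure and the standard-error caveat can be sketched as follows. This is an illustrative NumPy implementation on simulated data, not the software used for the slides; the helper names `ols` and `two_sls`, the homoskedastic variance formula, and all coefficients in the simulation are assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on X (X must already contain a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_sls(y, X, Z):
    """2SLS of y on X with instrument matrix Z.

    X: structural regressors (constant, endogenous and included exogenous).
    Z: all exogenous variables (constant, included exogenous, instruments).
    Returns the coefficients, the correct (homoskedastic) standard errors,
    and the naive standard errors from running the second stage by hand.
    """
    n, k = X.shape
    X_hat = Z @ ols(Z, X)                    # first stage: fitted values
    beta = ols(X_hat, y)                     # second stage: y on fitted values
    XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
    resid = y - X @ beta                     # correct: uses the original X
    resid_naive = y - X_hat @ beta           # by hand: uses the fitted values
    se = np.sqrt(np.diag(XtX_inv) * (resid @ resid) / (n - k))
    se_naive = np.sqrt(np.diag(XtX_inv) * (resid_naive @ resid_naive) / (n - k))
    return beta, se, se_naive

# Simulated data; every coefficient here is an illustrative assumption.
rng = np.random.default_rng(1)
n = 5_000
z1 = rng.normal(size=n)                      # included exogenous variable
z2 = rng.normal(size=n)                      # excluded instrument
z3 = rng.normal(size=n)                      # excluded instrument
v2 = rng.normal(size=n)
u1 = 0.8 * v2 + rng.normal(size=n)           # correlated with v2: y2 endogenous
y2 = 1.0 + 0.5 * z1 + 0.7 * z2 + 0.4 * z3 + v2
y1 = 2.0 + 1.0 * y2 - 0.5 * z1 + u1          # true coefficient on y2 is 1.0

const = np.ones(n)
X = np.column_stack([const, y2, z1])
Z = np.column_stack([const, z1, z2, z3])
beta, se, se_naive = two_sls(y1, X, Z)
print("2SLS coefficients:", beta)
print("correct se:       ", se)
print("by-hand se:       ", se_naive)
```

The coefficients are identical either way; only the residuals used for the error variance differ, which is why the by-hand standard errors should not be reported.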
23 2SLS [Regression output; numeric values lost in transcription. Dependent Variable: LWAGE. Method: Least Squares. Included observations: 3010. Regressors: EDUC, EXPER, EXPERSQ, BLACK, SMSA, SOUTH, C.] Here is a wage regression using the data in CARD.RAW. Again education is endogenous (innate ability) and so requires an IV estimator. Card uses nearc4 as an instrument. Since the sample is geographically random, the error should not be correlated with location, but being near a 4-year college may be correlated with educ.
24 2SLS [Regression output; numeric values lost in transcription. Dependent Variable: EDUC. Method: Least Squares. Included observations: 3010. Regressors: NEARC4, EXPER, EXPERSQ, BLACK, SMSA, SOUTH, C.] We can check for the correlation between nearc4 and educ by running the auxiliary regression of educ on nearc4 and the other exogenous variables.
25 2SLS [Regression output; most numeric values lost in transcription. Dependent Variable: LWAGE. Method: Two-Stage Least Squares. Included observations: 3010. Instrument specification: NEARC4, EXPER, EXPERSQ, BLACK, SMSA, SOUTH, constant added to instrument list. Regressors: EDUC, EXPER, EXPERSQ, BLACK, SMSA, SOUTH, C. J-statistic: 5.68E-35. Instrument rank: 7.] The IV estimator yields a return to educ nearly twice as large as the OLS estimator. But the standard error is 15 times larger! That is the price to be paid to get a consistent estimate of the return to educ when educ is endogenous.
26 2SLS w/ Multiple Instruments [Regression output; numeric values lost in transcription. Dependent Variable: LWAGE. Method: Least Squares. Included observations: 428 after adjustments. Regressors: EDUC, EXPER, EXPERSQ, C.] Another wage equation. Again educ is endogenous. But now we have two instruments, motheduc and fatheduc.
27 2SLS w/ Multiple Instruments [Regression output; numeric values lost in transcription. Dependent Variable: EDUC. Method: Least Squares. Included observations: 753. Regressors: MOTHEDUC, FATHEDUC, EXPER, EXPERSQ, C.] We can check for the correlation between the instruments and educ, after controlling for the other exogenous factors exper and exper², by running the auxiliary regression.
28 2SLS w/ Multiple Instruments [Regression output; numeric values lost in transcription. Dependent Variable: LWAGE. Method: Two-Stage Least Squares. Included observations: 428 after adjustments. Instrument specification: MOTHEDUC, FATHEDUC, EXPER, EXPERSQ, constant added to instrument list. Regressors: EDUC, EXPER, EXPERSQ, C. Instrument rank: 5.] Another wage equation. Again educ is endogenous. But now we have two instruments, motheduc and fatheduc.
29 Addressing Errors-in-Variables with IV Estimation Remember the classical errors-in-variables problem, where we observe $x_1$ instead of $x_1^*$, with $x_1 = x_1^* + e_1$, and $e_1$ uncorrelated with $x_1^*$ and $x_2$. If there is a z such that Corr(z,u) = 0 and Corr(z,$x_1$) ≠ 0, then IV will remove the attenuation bias.
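A small simulation can make the attenuation bias and its IV fix concrete. In this sketch the instrument is a second, independently mismeasured reading of the same underlying variable, which is one common choice but an assumption of this example rather than something the slide specifies. The instrument must also be uncorrelated with the measurement error $e_1$, which the simulation builds in by construction. All numeric values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x_star = rng.normal(size=n)                  # true regressor, unobserved
x1 = x_star + rng.normal(size=n)             # observed: x1 = x1* + e1
z = x_star + rng.normal(size=n)              # second noisy measure, own error
y = 1.0 + 2.0 * x_star + rng.normal(size=n)  # true beta_1 = 2.0

def cov(a, b):
    """Sample covariance (divisor n), enough for ratio estimators."""
    return np.mean((a - a.mean()) * (b - b.mean()))

b_ols = cov(x1, y) / cov(x1, x1)             # attenuated toward zero
b_iv = cov(z, y) / cov(z, x1)                # z instruments for x1

print(f"OLS on mismeasured x1: {b_ols:.3f}")
print(f"IV using z:            {b_iv:.3f}")
```

Because the measurement error has the same variance as the true regressor here, OLS recovers only about half of the true slope, while the IV estimate stays near 2.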
30 Testing for Endogeneity Since OLS is preferred to IV if we do not have an endogeneity problem, we'd like to be able to test for endogeneity. If we do not have endogeneity, both OLS and IV are consistent. The idea of the Hausman test is to see if the estimates from OLS and IV are different.
31 Testing for Endogeneity (cont) While it's a good idea to see if IV and OLS have different implications, it's easier to use a regression test for endogeneity. If $y_2$ is endogenous, then $v_2$ (from the reduced form equation) and $u_1$ from the structural model will be correlated. The test is based on this observation.
32 Testing for Endogeneity (cont) Save the residuals from the first stage. Include the residual in the structural equation (which of course has $y_2$ in it). If the coefficient on the residual is statistically different from zero, reject the null of exogeneity. If there are multiple endogenous variables, jointly test the residuals from each first stage.
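The steps above can be sketched for the case of one endogenous regressor. The helper names `ols_fit`, `endogeneity_t_stat` and `simulate` are hypothetical, the t statistic uses conventional homoskedastic standard errors, and the simulated data-generating process is an assumption made only to show the test reacting to endogeneity.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients, conventional standard errors, and residuals."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv)), resid

def endogeneity_t_stat(y1, y2, exog, instruments):
    """t statistic on the first-stage residual added to the structural equation.

    exog holds the included exogenous variables, constant column included.
    """
    Z = np.column_stack([exog, instruments])
    _, _, v2_hat = ols_fit(Z, y2)                  # step 1: first-stage residuals
    X_aug = np.column_stack([exog, y2, v2_hat])    # step 2: add them to the model
    beta, se, _ = ols_fit(X_aug, y1)
    return beta[-1] / se[-1]                       # step 3: test that coefficient

def simulate(rho, n=5_000, seed=3):
    """rho controls Corr(u1, v2); rho = 0 means y2 is exogenous."""
    rng = np.random.default_rng(seed)
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    v2 = rng.normal(size=n)
    u1 = rho * v2 + rng.normal(size=n)
    y2 = 0.5 * z1 + 0.8 * z2 + v2
    y1 = 1.0 + 1.0 * y2 + 0.5 * z1 + u1
    exog = np.column_stack([np.ones(n), z1])
    return y1, y2, exog, z2

t_endog = endogeneity_t_stat(*simulate(rho=0.8))
t_exog = endogeneity_t_stat(*simulate(rho=0.0))
print(f"t statistic, endogenous y2: {t_endog:.2f}")
print(f"t statistic, exogenous y2:  {t_exog:.2f}")
```

With several endogenous variables, the same idea would add one residual per first stage and replace the t test with a joint F test, which this sketch does not implement.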
33 Testing for Endogeneity [Regression output; numeric values lost in transcription. Dependent Variable: EDUC. Method: Least Squares. Included observations: 753. Regressors: MOTHEDUC, FATHEDUC, EXPER, EXPERSQ, C.] Since OLS is efficient relative to 2SLS if there is no endogeneity, we should use OLS whenever possible. Test for endogeneity by saving the residuals from the reduced form regression (reproduced here) and including them in the structural equation.
34 Testing for Endogeneity [Regression output; numeric values lost in transcription. Dependent Variable: LWAGE. Method: Least Squares. Included observations: 428 after adjustments. Regressors: RESID, EDUC, EXPER, EXPERSQ, C.] The test for endogeneity is simply the test of the null hypothesis that $\gamma_1 = 0$, where $\gamma_1$ is the coefficient on the reduced form residual (RESID). If there are multiple endogenous variables, there will be a reduced form regression for each one. The residuals from each of these regressions would be included, and then the null is a joint exclusion hypothesis.
35 Testing Overidentifying Restrictions If there is just one instrument for our endogenous variable, we can't test whether the instrument is uncorrelated with the error. We say the model is just identified. If we have multiple instruments, it is possible to test the overidentifying restrictions to see if some of the instruments are correlated with the error.
36 The OverID Test Estimate the structural model using IV and obtain the residuals. Regress the residuals on all the exogenous variables and obtain the $R^2$ to form $nR^2$. Under the null that all instruments are uncorrelated with the error, LM ~ $\chi^2_q$, where q is the number of extra instruments.
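The $nR^2$ procedure can be sketched as below. The function names `overid_lm` and `simulate` and the simulated data are assumptions for illustration; the example contrasts a case where both instruments are valid with one where an instrument has a direct effect on the outcome. The statistic would be compared with a $\chi^2_q$ critical value (about 3.84 at the 5% level when q = 1).

```python
import numpy as np

def overid_lm(y1, X, Z):
    """n * R^2 from regressing 2SLS residuals on all exogenous variables.

    X: structural regressors; Z: all exogenous variables (both with constant).
    Returns the LM statistic and q, the number of overidentifying restrictions.
    """
    n = len(y1)
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # first stage
    beta = np.linalg.lstsq(X_hat, y1, rcond=None)[0]     # 2SLS coefficients
    u_hat = y1 - X @ beta                                # 2SLS residuals
    fitted = Z @ np.linalg.lstsq(Z, u_hat, rcond=None)[0]
    ssr = np.sum((u_hat - fitted) ** 2)
    sst = np.sum((u_hat - u_hat.mean()) ** 2)
    q = Z.shape[1] - X.shape[1]
    return n * (1.0 - ssr / sst), q

def simulate(direct_effect_z3, n=5_000, seed=4):
    """A nonzero direct_effect_z3 makes z3 an invalid instrument."""
    rng = np.random.default_rng(seed)
    z1, z2, z3, v2, e = (rng.normal(size=n) for _ in range(5))
    u1 = 0.8 * v2 + e
    y2 = 0.5 * z1 + 0.7 * z2 + 0.4 * z3 + v2
    y1 = 1.0 + 1.0 * y2 + 0.5 * z1 + direct_effect_z3 * z3 + u1
    const = np.ones(n)
    X = np.column_stack([const, y2, z1])
    Z = np.column_stack([const, z1, z2, z3])
    return y1, X, Z

lm_valid, q = overid_lm(*simulate(direct_effect_z3=0.0))
lm_invalid, _ = overid_lm(*simulate(direct_effect_z3=0.5))
print(f"q = {q}")
print(f"LM with valid instruments:   {lm_valid:.2f}")
print(f"LM with an invalid z3:       {lm_invalid:.2f}")
```

A rejection says that at least one instrument is suspect but not which one, and the test has nothing to work with when q = 0.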
37 Testing Overidentifying Restrictions [Regression output; numeric values lost in transcription. Dependent Variable: LWAGE. Method: Two-Stage Least Squares. Included observations: 428 after adjustments. Instrument specification: MOTHEDUC, FATHEDUC, EXPER, EXPERSQ, constant added to instrument list. Regressors: EDUC, EXPER, EXPERSQ, C. Instrument rank: 5.] Here are the 2SLS estimates of the structural wage equation using motheduc and fatheduc as instruments. Since we have only one endogenous variable but two instruments, the model is overidentified. First, save the residuals from the structural equation. Second, regress the residuals on the exogenous variables and save $R^2$. The test is LM(p) = $nR^2$, where p is the number of over-identifying restrictions.
38 Testing Overidentifying Restrictions [Regression output; most numeric values lost in transcription. Dependent Variable: RESID02. Method: Least Squares. Included observations: 428 after adjustments. Regressors: EXPER, EXPERSQ, MOTHEDUC, FATHEDUC, C. Mean dependent var: -4.07E-16.] The test statistic is LM = $nR^2$ with n = 428, and LM ~ $\chi^2(1)$; the values of $R^2$ and LM are not recoverable.
39 Simultaneous Equations $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$; $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$
40 Simultaneity Simultaneity is a specific type of endogeneity problem in which the explanatory variable is jointly determined with the dependent variable. As with other types of endogeneity, IV estimation can solve the problem. There are some special issues to consider with simultaneous equations models (SEM).
41 Supply and Demand Example Start with an equation you'd like to estimate, say a labor supply function $h_s = \alpha_1 w + \beta_1 z + u_1$, where w is the wage and z is a supply shifter. Call this a structural equation: it's derived from economic theory and has a causal interpretation, where w directly affects $h_s$.
42 Example (cont) The problem is that we can't just regress observed hours on wage, since observed hours are determined by the equilibrium of supply and demand. Consider a second structural equation, in this case the labor demand function $h_d = \alpha_2 w + u_2$. So hours are determined by a SEM.
43 Example (cont) Both h and w are endogenous because they are both determined by the equilibrium of supply and demand. z is exogenous, and it's the availability of this exogenous supply shifter that allows us to identify the structural demand equation. With no observed demand shifters, supply is not identified and cannot be estimated.
44 Identification of Demand Equation [Figure: a single demand curve D and three supply curves S, for z = z1, z = z2 and z = z3, drawn on axes labeled w and h.]
45 Using IV to Estimate Demand So, we can estimate the structural demand equation, using z as an instrument for w. The first stage equation is $w = \pi_0 + \pi_1 z + v_2$. The second stage equation is $h = \alpha_2 \hat w + u_2$. Thus, 2SLS provides a consistent estimator of $\alpha_2$, the slope of the demand curve. We cannot estimate $\alpha_1$, the slope of the supply curve.
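The identification argument can be checked with a simulated market. The sketch below solves the supply and demand equations for the equilibrium wage and hours, then compares OLS with IV using the supply shifter z. The parameter values are assumptions chosen for illustration, and with a single instrument the two-stage estimate reduces to the covariance ratio used here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
alpha1, beta1 = 1.0, 1.0        # supply: h = alpha1*w + beta1*z + u1
alpha2 = -1.0                   # demand: h = alpha2*w + u2
z, u1, u2 = (rng.normal(size=n) for _ in range(3))

# Equilibrium: alpha1*w + beta1*z + u1 = alpha2*w + u2, solved for w.
w = (u2 - u1 - beta1 * z) / (alpha1 - alpha2)
h = alpha2 * w + u2             # observed hours lie on the demand curve

def cov(a, b):
    """Sample covariance (divisor n), enough for ratio estimators."""
    return np.mean((a - a.mean()) * (b - b.mean()))

ols_slope = cov(w, h) / cov(w, w)   # mixes supply and demand variation
iv_slope = cov(z, h) / cov(z, w)    # uses only shifts in supply

print(f"OLS slope of h on w: {ols_slope:.3f}")
print(f"IV slope using z:    {iv_slope:.3f}  (demand slope alpha2 = {alpha2})")
```

Under these assumed parameters OLS converges to about -1/3, which is neither the supply nor the demand slope, while IV recovers the demand slope of -1.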
46 The General SEM Suppose you want to estimate the structural equation $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$, where $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$. Thus, $y_2 = \alpha_2(\alpha_1 y_2 + \beta_1 z_1 + u_1) + \beta_2 z_2 + u_2$. So, $(1 - \alpha_2\alpha_1) y_2 = \alpha_2\beta_1 z_1 + \beta_2 z_2 + \alpha_2 u_1 + u_2$, which can be rewritten as $y_2 = \pi_1 z_1 + \pi_2 z_2 + v_2$.
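Spelling out the last step: dividing both sides by $1 - \alpha_2\alpha_1$, which requires the assumption $\alpha_2\alpha_1 \neq 1$, gives the reduced form coefficients and error in terms of the structural parameters. This is a worked restatement of the algebra on the slide, not additional material from it.

```latex
y_2 = \pi_1 z_1 + \pi_2 z_2 + v_2, \qquad
\pi_1 = \frac{\alpha_2 \beta_1}{1 - \alpha_2 \alpha_1}, \quad
\pi_2 = \frac{\beta_2}{1 - \alpha_2 \alpha_1}, \quad
v_2 = \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2 \alpha_1}
```

Written this way, it is visible that $v_2$ contains $u_1$ whenever $\alpha_2 \neq 0$.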
47 The General SEM (continued) By substituting this reduced form in for $y_2$, we can see that since $v_2$ is a linear function of $u_1$, $y_2$ is correlated with the error term and the OLS estimator of $\alpha_1$ is biased; call it simultaneity bias. The sign of the bias is complicated, but we can use the simple regression as a rule of thumb. In the simple regression case, the bias has the same sign as $\alpha_2/(1 - \alpha_2\alpha_1)$.
48 Identification of General SEM Let $z_1$ be all the exogenous variables in the first equation, and $z_2$ be all the exogenous variables in the second equation. It's okay for there to be overlap in $z_1$ and $z_2$. To identify equation 1, there must be some variables in $z_2$ that are not in $z_1$. To identify equation 2, there must be some variables in $z_1$ that are not in $z_2$.
49 Rank and Order Conditions We refer to this as the rank condition. Note that the exogenous variable excluded from the first equation must have a non-zero coefficient in the second equation for the rank condition to hold. Note that the order condition clearly holds if the rank condition does: there will be an exogenous variable for the endogenous one.
50 Estimation of the General SEM Estimation of SEM is straightforward. The instruments for 2SLS are the exogenous variables from both equations. We can extend the idea to systems with more than 2 equations. For a given identified equation, the instruments are all of the exogenous variables in the whole system.
51 Estimation of the General SEM The first equation is married women's labor supply. The second is the wage offer as a function of productivity measures.
52 Estimation of the General SEM [Regression output; most numeric values lost in transcription. Dependent Variable: HOURS. Method: Least Squares. Included observations: 428 after adjustments. Regressors: LWAGE, AGE, EDUC, KIDSLT6, NWIFEINC, C. Sum squared resid: 2.48E+08.] Here is the OLS estimate of the labor supply equation. Of particular interest is the coefficient on log(wage). It is negative (labor supply?) and very imprecisely estimated (large s.e.).
53 Estimation of the General SEM [Regression output; most numeric values lost in transcription. Dependent Variable: HOURS. Method: Two-Stage Least Squares. Included observations: 428 after adjustments. Instrument specification: EXPER, EXPERSQ, AGE, EDUC, KIDSLT6, NWIFEINC, C. Regressors: LWAGE, AGE, EDUC, KIDSLT6, NWIFEINC, C. Sum squared resid: 7.74E+08. Second-Stage SSR: 2.26E+08. Instrument rank: 7.] Now the 2SLS estimates. Note that for this equation the instruments include the exogenous variables AGE, EDUC, KIDSLT6 and NWIFEINC as well as the two excluded variables EXPER and EXPERSQ. These last two are included in the labor demand (wage) equation but excluded from the labor supply specification. That is what makes them (over-)identifying. The test of the over-identifying restrictions is the J-statistic.
54 Estimation of the General SEM [Regression output; numeric values lost in transcription. Dependent Variable: LWAGE. Method: Two-Stage Least Squares. Included observations: 428 after adjustments. Instrument specification: EXPER, EXPERSQ, AGE, EDUC, KIDSLT6, NWIFEINC, C. Regressors: HOURS, EDUC, EXPER, EXPERSQ, C. Instrument rank: 7.] Here are the 2SLS estimates for the labor demand equation. This equation uses the same instruments as the supply equation. Since there are three variables excluded from this equation (AGE, KIDSLT6 and NWIFEINC), there are two over-id restrictions.
More informationChapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More information1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).
Examples of Questions on Regression Analysis: 1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability). Then,. When
More informationSample Size and Power in Clinical Trials
Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance
More informationMgmt 469. Model Specification: Choosing the Right Variables for the Right Hand Side
Mgmt 469 Model Specification: Choosing the Right Variables for the Right Hand Side Even if you have only a handful of predictor variables to choose from, there are infinitely many ways to specify the right
More informationChapter 10: Basic Linear Unobserved Effects Panel Data. Models:
Chapter 10: Basic Linear Unobserved Effects Panel Data Models: Microeconomic Econometrics I Spring 2010 10.1 Motivation: The Omitted Variables Problem We are interested in the partial effects of the observable
More informationEconometric analysis of the Belgian car market
Econometric analysis of the Belgian car market By: Prof. dr. D. Czarnitzki/ Ms. Céline Arts Tim Verheyden Introduction In contrast to typical examples from microeconomics textbooks on homogeneous goods
More informationAn Introduction to Path Analysis. nach 3
An Introduction to Path Analysis Developed by Sewall Wright, path analysis is a method employed to determine whether or not a multivariate set of nonexperimental data fits well with a particular (a priori)
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationChapter 9 Assessing Studies Based on Multiple Regression
Chapter 9 Assessing Studies Based on Multiple Regression Solutions to Empirical Exercises 1. Age 0.439** (0.030) Age 2 Data from 2004 (1) (2) (3) (4) (5) (6) (7) (8) Dependent Variable AHE ln(ahe) ln(ahe)
More informationPredicting The Outcome Of NASCAR Races: The Role Of Driver Experience Mary Allender, University of Portland
Predicting The Outcome Of NASCAR Races: The Role Of Driver Experience Mary Allender, University of Portland ABSTRACT As national interest in NASCAR grows, the field of sports economics is increasingly
More informationAssociation Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
More informationThe Impact of Privatization in Insurance Industry on Insurance Efficiency in Iran
The Impact of Privatization in Insurance Industry on Insurance Efficiency in Iran Shahram Gilaninia 1, Hosein Ganjinia, Azadeh Asadian 3 * 1. Department of Industrial Management, Islamic Azad University,
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationPlease follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
More informationIndependent samples t-test. Dr. Tom Pierce Radford University
Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of
More informationDeterminants of Stock Market Performance in Pakistan
Determinants of Stock Market Performance in Pakistan Mehwish Zafar Sr. Lecturer Bahria University, Karachi campus Abstract Stock market performance, economic and political condition of a country is interrelated
More informationThe VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationA Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
More informationCoefficient of Determination
Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationThe Effect of Housing on Portfolio Choice. July 2009
The Effect of Housing on Portfolio Choice Raj Chetty Harvard Univ. Adam Szeidl UC-Berkeley July 2009 Introduction How does homeownership affect financial portfolios? Linkages between housing and financial
More informationCompetition as an Effective Tool in Developing Social Marketing Programs: Driving Behavior Change through Online Activities
Competition as an Effective Tool in Developing Social Marketing Programs: Driving Behavior Change through Online Activities Corina ŞERBAN 1 ABSTRACT Nowadays, social marketing practices represent an important
More informationAnswer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade
Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements
More informationMULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM. R, analysis of variance, Student test, multivariate analysis
Journal of tourism [No. 8] MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM Assistant Ph.D. Erika KULCSÁR Babeş Bolyai University of Cluj Napoca, Romania Abstract This paper analysis
More informationEuropean Journal of Business and Management ISSN 2222-1905 (Paper) ISSN 2222-2839 (Online) Vol.5, No.30, 2013
The Impact of Stock Market Liquidity on Economic Growth in Jordan Shatha Abdul-Khaliq Assistant Professor,AlBlqa Applied University, Jordan * E-mail of the corresponding author: yshatha@gmail.com Abstract
More informationCAPM, Arbitrage, and Linear Factor Models
CAPM, Arbitrage, and Linear Factor Models CAPM, Arbitrage, Linear Factor Models 1/ 41 Introduction We now assume all investors actually choose mean-variance e cient portfolios. By equating these investors
More informationTesting for Granger causality between stock prices and economic growth
MPRA Munich Personal RePEc Archive Testing for Granger causality between stock prices and economic growth Pasquale Foresti 2006 Online at http://mpra.ub.uni-muenchen.de/2962/ MPRA Paper No. 2962, posted
More informationCase Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?
Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention
More informationChapter 4: Vector Autoregressive Models
Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...
More informationRockefeller College University at Albany
Rockefeller College University at Albany PAD 705 Handout: Hypothesis Testing on Multiple Parameters In many cases we may wish to know whether two or more variables are jointly significant in a regression.
More informationThe relationship between stock market parameters and interbank lending market: an empirical evidence
Magomet Yandiev Associate Professor, Department of Economics, Lomonosov Moscow State University mag2097@mail.ru Alexander Pakhalov, PG student, Department of Economics, Lomonosov Moscow State University
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationAir passenger departures forecast models A technical note
Ministry of Transport Air passenger departures forecast models A technical note By Haobo Wang Financial, Economic and Statistical Analysis Page 1 of 15 1. Introduction Sine 1999, the Ministry of Business,
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More information6/15/2005 7:54 PM. Affirmative Action s Affirmative Actions: A Reply to Sander
Reply Affirmative Action s Affirmative Actions: A Reply to Sander Daniel E. Ho I am grateful to Professor Sander for his interest in my work and his willingness to pursue a valid answer to the critical
More informationExample: Boats and Manatees
Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant
More information13. Poisson Regression Analysis
136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationCorrelation key concepts:
CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)
More informationEmpirical Methods in Applied Economics
Empirical Methods in Applied Economics Jörn-Ste en Pischke LSE October 2005 1 Observational Studies and Regression 1.1 Conditional Randomization Again When we discussed experiments, we discussed already
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationABSORBENCY OF PAPER TOWELS
ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationUsing instrumental variables techniques in economics and finance
Using instrumental variables techniques in economics and finance Christopher F Baum 1 Boston College and DIW Berlin German Stata Users Group Meeting, Berlin, June 2008 1 Thanks to Mark Schaffer for a number
More informationSTAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
More informationch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
ch12 practice test 1) The null hypothesis that x and y are is H0: = 0. 1) 2) When a two-sided significance test about a population slope has a P-value below 0.05, the 95% confidence interval for A) does
More informationNonlinear Regression Functions. SW Ch 8 1/54/
Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General
More informationAn Introduction to Regression Analysis
The Inaugural Coase Lecture An Introduction to Regression Analysis Alan O. Sykes * Regression analysis is a statistical tool for the investigation of relationships between variables. Usually, the investigator
More informationESTIMATING AN ECONOMIC MODEL OF CRIME USING PANEL DATA FROM NORTH CAROLINA BADI H. BALTAGI*
JOURNAL OF APPLIED ECONOMETRICS J. Appl. Econ. 21: 543 547 (2006) Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jae.861 ESTIMATING AN ECONOMIC MODEL OF CRIME USING PANEL
More informationEconometrics Problem Set #2
Econometrics Problem Set #2 Nathaniel Higgins nhiggins@jhu.edu Assignment The homework assignment was to read chapter 2 and hand in answers to the following problems at the end of the chapter: 2.1 2.5
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables
Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables We often consider relationships between observed outcomes
More informationChapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.
Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,
More informationComparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples
Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The
More informationTime Series Analysis
Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)
More information