Advanced Research Methods. Instrumental variables (IV) Regression discontinuity design (RDD)


Instrumental variables (IV) Regression discontinuity design (RDD) Lecture 2

INSTRUMENTAL VARIABLES (IV)

EXAMPLE Consider once more your favourite training program for unemployed workers. Assume this time that the offer of training is randomized, while take-up is endogenous. Furthermore assume that only unemployed workers who have been offered training can participate. Then receiving the offer of training is a valid instrument for take-up of training.

NOTATION
- $Z \in \{0, 1\}$: binary instrument (offer yes/no)
- $D \in \{0, 1\}$: treatment status (take-up yes/no)
- $Y_0$: potential outcome under no treatment
- $Y_1$: potential outcome under treatment
- $Y$: observed outcome

ASSUMPTIONS
A1 Stable unit treatment value assumption (SUTVA):
$$Y_i = D_i Y_{1,i} + (1 - D_i) Y_{0,i}$$
A2 Monotonicity or instrument relevance:
$$\Pr(D_i = 1 \mid Z = 1) \geq \Pr(D_i = 1 \mid Z = 0) \ \forall i \text{ and } > \text{ for some } i$$
or
$$\Pr(D_i = 1 \mid Z = 1) \leq \Pr(D_i = 1 \mid Z = 0) \ \forall i \text{ and } < \text{ for some } i$$
A3 Exclusion restriction or instrument exogeneity:
$$E[Y_1 \mid Z = 1] = E[Y_1 \mid Z = 0] = E[Y_1]$$
$$E[Y_0 \mid Z = 1] = E[Y_0 \mid Z = 0] = E[Y_0]$$
May be required conditional on $X$ only.

POTENTIAL EFFECTS
Never-takers ($\tau = n$): $\Pr(D_i = 1 \mid Z = 1) = \Pr(D_i = 1 \mid Z = 0) = 0$
$$E[Y_i \mid Z = 1] - E[Y_i \mid Z = 0] \overset{A1,A3}{=} E[Y_{0,i}] - E[Y_{0,i}] = 0$$
Always-takers ($\tau = a$): $\Pr(D_i = 1 \mid Z = 1) = \Pr(D_i = 1 \mid Z = 0) = 1$
$$E[Y_i \mid Z = 1] - E[Y_i \mid Z = 0] \overset{A1,A3}{=} E[Y_{1,i}] - E[Y_{1,i}] = 0$$
Compliers ($\tau = c$): $\Pr(D_i = 1 \mid Z = 1) = 1 > \Pr(D_i = 1 \mid Z = 0) = 0$
$$E[Y_i \mid Z = 1] - E[Y_i \mid Z = 0] \overset{A1,A3}{=} E[Y_{1,i}] - E[Y_{0,i}]$$
Defiers ($\tau = d$), ruled out by monotonicity: $\Pr(D_i = 1 \mid Z = 1) = 0 < \Pr(D_i = 1 \mid Z = 0) = 1$
$$E[Y_i \mid Z = 1] - E[Y_i \mid Z = 0] \overset{A1,A3}{=} E[Y_{0,i}] - E[Y_{1,i}]$$
Note that $\Pr(D = 1 \mid Z = 1) = \Pr(\tau = a) + \Pr(\tau = c)$ and $\Pr(D = 1 \mid Z = 0) = \Pr(\tau = a) + \Pr(\tau = d) = \Pr(\tau = a)$.

Assume you estimate the reduced-form difference in average outcomes for those who get the offer and those who do not get the offer, which resembles an intention-to-treat effect. What does this identify?
$$\begin{aligned}
E[Y \mid Z = 1] - E[Y \mid Z = 0]
&= \underbrace{(E[Y \mid Z = 1, \tau = n] - E[Y \mid Z = 0, \tau = n])}_{0} \Pr(\tau = n) \\
&\quad + \underbrace{(E[Y \mid Z = 1, \tau = a] - E[Y \mid Z = 0, \tau = a])}_{0} \Pr(\tau = a) \\
&\quad + (E[Y \mid Z = 1, \tau = c] - E[Y \mid Z = 0, \tau = c]) \Pr(\tau = c) \\
&\quad + (E[Y \mid Z = 1, \tau = d] - E[Y \mid Z = 0, \tau = d]) \underbrace{\Pr(\tau = d)}_{0} \\
&= (E[Y \mid Z = 1, \tau = c] - E[Y \mid Z = 0, \tau = c]) \Pr(\tau = c) \\
&= E[Y_1 - Y_0 \mid \tau = c] \Pr(\tau = c)
\end{aligned}$$

NONPARAMETRIC IDENTIFICATION
$$\begin{aligned}
E[Y_1 - Y_0 \mid \tau = c]
&= \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{\Pr(\tau = c)} \\
&= \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{\Pr(\tau = a) + \Pr(\tau = c) - \Pr(\tau = a)} \\
&= \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{\Pr(D = 1 \mid Z = 1) - \Pr(D = 1 \mid Z = 0)} \\
&= \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}
\end{aligned}$$
This identifies the so-called local average treatment effect (LATE), which is the effect for the compliers (those who respond to the instrument). If the effect is not constant, it is identified only for this population. Is that policy relevant?
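To make the ratio above concrete, here is a minimal Python sketch on simulated data. The type shares (30% never-takers, 20% always-takers, 50% compliers), the effect sizes and the function names are illustrative assumptions, not part of the lecture. It shows that the reduced-form difference recovers the complier effect scaled by the complier share, and that dividing by the first-stage difference recovers the complier effect itself.

```python
import numpy as np


def simulate_offer_data(n=200_000, seed=0):
    """Hypothetical data: randomized offer Z, endogenous take-up D, outcome Y."""
    rng = np.random.default_rng(seed)
    # Assumed type shares: never-takers 0.3, always-takers 0.2, compliers 0.5.
    types = rng.choice(np.array(["n", "a", "c"]), size=n, p=[0.3, 0.2, 0.5])
    z = rng.integers(0, 2, size=n)                      # randomized offer
    d = np.where(types == "a", 1, np.where(types == "c", z, 0))
    y0 = rng.normal(size=n) + 1.0 * (types == "a")      # always-takers differ at baseline
    # Assumed effects: 2 for compliers, 5 for always-takers (never identified by Z).
    effect = np.where(types == "c", 2.0, np.where(types == "a", 5.0, 0.0))
    y = y0 + d * effect                                 # SUTVA: Y = D*Y1 + (1-D)*Y0
    return y, d, z


def wald_late(y, d, z):
    """Return (LATE estimate, reduced-form difference, first-stage difference)."""
    reduced_form = y[z == 1].mean() - y[z == 0].mean()  # E[Y|Z=1] - E[Y|Z=0]
    first_stage = d[z == 1].mean() - d[z == 0].mean()   # E[D|Z=1] - E[D|Z=0]
    return reduced_form / first_stage, reduced_form, first_stage


if __name__ == "__main__":
    y, d, z = simulate_offer_data()
    late, itt, share = wald_late(y, d, z)
    print(f"reduced form = {itt:.2f}, complier share = {share:.2f}, LATE = {late:.2f}")
```

In this design the always-takers' effect of 5 never shows up in the estimate, which illustrates why the LATE is informative only about compliers.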

ESTIMATION
$$\frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}$$
- Each component can be estimated nonparametrically by the cell average.
- If A2 and/or A3 only hold conditional on $X$, estimate in cells defined by $Z$ and $X$, or use propensity score methods for high-dimensional $X$ (see Frölich 2007).
- Note that $E[D \mid Z = z, X = x] = \Pr(D = 1 \mid Z = z, X = x)$, which can be estimated using a probit model for $\Pr(D = 1 \mid X = x)$ within the sub-sample with $Z = z$.
- Estimate the other components as $E[Y \mid Z = z] = \int E[Y \mid p(x, z)] f_{X \mid Z = z}(x)\,dx$, where $p(x, z) \equiv \Pr(D = 1 \mid X = x, Z = z)$.
- Note that different instruments imply different complier populations, so estimated effects may differ. Thus, different effects are no evidence against the validity of the instrument: validity is an identifying assumption that is not testable unless you assume effect homogeneity.
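The cell-based idea can be sketched as follows for a single discrete covariate. This is one possible way to aggregate the cells (averaging the within-cell reduced form and first stage over the distribution of X before taking the ratio), written under assumed names and an assumed simulated design; it is not taken from the lecture. In the simulation the offer probability and the baseline outcome both depend on X, so the instrument is valid only conditional on X.

```python
import numpy as np


def wald(y, d, z):
    """Unconditional Wald ratio (ignores X)."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())


def late_by_cells(y, d, z, x):
    """Ratio of X-weighted reduced form to X-weighted first stage, discrete X."""
    numerator, denominator = 0.0, 0.0
    for cell in np.unique(x):
        in_cell = x == cell
        weight = in_cell.mean()                      # estimate of Pr(X = cell)
        offered = in_cell & (z == 1)
        not_offered = in_cell & (z == 0)
        numerator += weight * (y[offered].mean() - y[not_offered].mean())
        denominator += weight * (d[offered].mean() - d[not_offered].mean())
    return numerator / denominator


def simulate_conditional_iv(n=200_000, seed=1):
    """Hypothetical design: Z is as good as random only within cells of X."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)
    z = (rng.random(n) < 0.3 + 0.4 * x).astype(int)  # offer probability depends on X
    complier = rng.random(n) < 0.5                   # everyone else is a never-taker
    d = (complier & (z == 1)).astype(int)
    y = 3.0 * x + 2.0 * d + rng.normal(size=n)       # X shifts outcomes directly
    return y, d, z, x


if __name__ == "__main__":
    y, d, z, x = simulate_conditional_iv()
    print(f"ignoring X: {wald(y, d, z):.2f}, within cells of X: {late_by_cells(y, d, z, x):.2f}")
```

The unconditional ratio picks up the direct effect of X on the outcome, while the cell-based version stays close to the assumed complier effect of 2.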

RECOMMENDED READINGS (other than surveys):
- Imbens, G.W. and J.D. Angrist (1994). Identification and Estimation of Local Average Treatment Effects, Econometrica, 62(2), 467-475.
- Frölich, M. (2007). Nonparametric IV Estimation of Local Average Treatment Effects with Covariates, Journal of Econometrics, 139(1), 35-75.
GOOD APPLICATIONS:
- Frölich, M. and M. Lechner (2014). Combining Matching and Nonparametric IV Estimation: Theory and an Application to the Evaluation of Active Labour Market Policies, Journal of Applied Econometrics, DOI: 10.1002/jae.2417.

REGRESSION DISCONTINUITY DESIGN (RDD)

INTRODUCTION
You want to estimate the effect of the generosity of unemployment insurance (UI) on unemployment duration.
1. Unemployed workers who are 50 or older at the time of becoming unemployed ($D = 1$) are eligible for longer maximum UI benefit durations than younger unemployed workers ($D = 0$).
2. You have data for a large sample of unemployed workers and you observe the exact date of birth and the exact date when workers have become unemployed.
RDD is based on the idea that workers who had turned 50 just before becoming unemployed are essentially identical to workers who had turned 50 right after becoming unemployed. Hence, the latter can be used as a control group to estimate the effect of interest. Crossing the age threshold can be regarded as a locally valid instrument for maximum UI benefit duration.

GENERAL SETUP
- Interest in the effect of some intervention on some outcome $Y$.
- Institutional rules imply that the treatment probability jumps at the cut-off value $\bar{x}$ of some quasi-continuous covariate $X$.
- $X$ is called the assignment, running or forcing variable.
- Sharp RDD: The cut-off is strictly enforced; everyone on one side of the cut-off is subject to the intervention and everyone on the other side is not.
- Fuzzy RDD: There are persons subject to the intervention on both sides of the cut-off, but the probability of being subject to the intervention jumps at the cut-off.


SHARP RDD
$$D_i = D(X_i) = 1(X_i \geq \bar{x})$$
$$\Pr(D_i = 1 \mid X_i < \bar{x}) = 0, \qquad \Pr(D_i = 1 \mid X_i \geq \bar{x}) = 1$$
Note: no overlap in $X_i$ (no common support) between treated and nontreated.

ASSUMPTIONS
A1 Stable unit treatment value assumption (SUTVA):
$$Y_i = D_i Y_{1,i} + (1 - D_i) Y_{0,i}$$
A2 Local continuity (LC): $E[Y_{0,i} \mid X_i = x]$ and $E[Y_{1,i} \mid X_i = x]$ are continuous in $x$ at $\bar{x}$.

SUTVA and local continuity

IDENTIFICATION
$$E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}] = \lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x] \tag{1}$$
Identifies the effect at the threshold $\bar{x}$: if the effect is not constant, only for this population (but highly policy relevant).

What if the forcing variable is discrete?
- Need to choose a functional form for the relationship between the treatment variable and the outcomes of interest.
- Specification errors can lead to biased results.

PARAMETRIC ESTIMATION
Without covariates:
$$Y_i = \alpha + \theta D_i + U_i \tag{2}$$
Allowing for a direct effect of the assignment variable on $Y_i$:
$$Y_i = \alpha + \theta D_i + \beta_0 (X_i - \bar{x}) + \beta_1 D_i (X_i - \bar{x}) + U_i \tag{3}$$
Include higher order polynomials of $(X_i - \bar{x})$ to relax the functional form assumption:
$$Y_i = \alpha + \theta D_i + \sum_{p=1}^{P} \beta_{0,p} (X_i - \bar{x})^p + \sum_{p=1}^{P} \beta_{1,p} D_i (X_i - \bar{x})^p + U_i \tag{4}$$
Include other covariates $X$ to increase precision:
$$Y_i = \alpha + \theta D_i + \beta_0 (X_i - \bar{x}) + \beta_1 D_i (X_i - \bar{x}) + \sum_{k=1}^{K} \beta_{2,k} X_{k,i} + U_i \tag{5}$$
$\theta$ is the parameter of interest.

PARAMETRIC ESTIMATION
- If RDD is valid, estimating (2) using observations in a very small neighborhood around $\bar{x}$ is sufficient.
- If observations further away from $\bar{x}$ are used, controlling for the direct effects of the assignment variable is crucial to avoid bias.
- If all observations are used, global continuity is assumed and using the correct functional form for the direct effect of the assignment variable is crucial.
- Controlling for covariates that are correlated with the potential outcomes may improve precision because residuals become smaller.
- Choosing the caliper around the cutoff is a tradeoff between efficiency (using more observations to increase precision) and consistency (getting the functional form of the direct effect right).
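The contrast between specification (2) on the full sample and specification (3) can be illustrated with a short Python sketch. The data-generating process (an age-like running variable on [46, 54], a cutoff at 50, a true jump of 3, and slopes that differ on each side) and the function names are assumptions chosen for illustration only.

```python
import numpy as np


def difference_in_means(y, x, cutoff):
    """Specification (2) on all observations: mean(Y | D=1) - mean(Y | D=0)."""
    treated = x >= cutoff
    return y[treated].mean() - y[~treated].mean()


def sharp_rdd_ols(y, x, cutoff, bandwidth=None):
    """Specification (3): OLS of Y on D, (X - cutoff) and D*(X - cutoff).

    If bandwidth is given, only observations with |X - cutoff| <= bandwidth are used.
    Returns the estimated jump theta.
    """
    xc = np.asarray(x, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    keep = np.ones_like(xc, dtype=bool) if bandwidth is None else np.abs(xc) <= bandwidth
    xc, y = xc[keep], y[keep]
    d = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate_sharp_rdd(n=20_000, seed=2):
    """Hypothetical sharp design with true jump 3 at age 50."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(46.0, 54.0, size=n)
    xc = age - 50.0
    d = (xc >= 0).astype(float)
    y = 10.0 + 3.0 * d + 0.5 * xc + 0.3 * d * xc + rng.normal(size=n)
    return y, age


if __name__ == "__main__":
    y, age = simulate_sharp_rdd()
    print(f"(2) on all data: {difference_in_means(y, age, 50.0):.2f}")
    print(f"(3) on all data: {sharp_rdd_ols(y, age, 50.0):.2f}")
    print(f"(3) within one year of the cutoff: {sharp_rdd_ols(y, age, 50.0, bandwidth=1.0):.2f}")
```

In this simulated design the plain difference in means absorbs the direct effect of age, while the specification with the centered running variable and its interaction stays close to the assumed jump. The narrower sample gives a noisier estimate, which is the efficiency side of the tradeoff.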

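NONPARAMETRIC ESTIMATION
Standard kernel using all observations:
$$\frac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Y_i D_i}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) D_i} - \frac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Y_i (1 - D_i)}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) (1 - D_i)} \tag{6}$$
Using only observations in a neighborhood of $\bar{x}$:
$$\frac{\sum_{i \in \{i:\, \bar{x} - h \leq X_i \leq \bar{x} + h\}} Y_i D_i}{\sum_{i \in \{i:\, \bar{x} - h \leq X_i \leq \bar{x} + h\}} D_i} - \frac{\sum_{i \in \{i:\, \bar{x} - h \leq X_i \leq \bar{x} + h\}} Y_i (1 - D_i)}{\sum_{i \in \{i:\, \bar{x} - h \leq X_i \leq \bar{x} + h\}} (1 - D_i)} \tag{7}$$
But convergence rates can be bad at the boundary $\bar{x}$.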
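A small simulation can illustrate the boundary problem of estimator (7). The sketch below assumes a hypothetical design with a true jump of 1 and a slope of 2 in the running variable; names and parameters are illustrative. Because each local mean averages over one side of the cutoff only, the slope leaks into the estimate, and the bias shrinks only as fast as the window half-width h.

```python
import numpy as np


def local_mean_difference(y, x, cutoff, h):
    """Estimator (7): difference of mean outcomes within h on each side of the cutoff."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    inside = np.abs(x - cutoff) <= h
    treated = x >= cutoff
    return y[inside & treated].mean() - y[inside & ~treated].mean()


def simulate_sloped_rdd(n=200_000, seed=3):
    """Hypothetical sharp design: jump of 1 at zero, slope of 2 on both sides."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 * (x >= 0) + 2.0 * x + rng.normal(scale=0.5, size=n)
    return y, x


if __name__ == "__main__":
    y, x = simulate_sloped_rdd()
    for h in (0.5, 0.2, 0.05):
        print(f"h = {h:.2f}: estimate = {local_mean_difference(y, x, 0.0, h):.2f}")
```

With a uniform running variable and slope 2, the expected estimate is 1 + 2h, so a wide window overstates the jump considerably and a narrow one trades that bias for fewer observations.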

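NONPARAMETRIC ESTIMATION
Local linear regression using all observations:
$$\min_{\alpha, \theta, \beta_0, \beta_1} \sum_{i=1}^{N} K_h(X_i - \bar{x}) \left[ Y_i - \alpha - \theta D_i - \beta_0 (X_i - \bar{x}) - \beta_1 D_i (X_i - \bar{x}) \right]^2 \tag{8}$$
Using only observations in a neighborhood of $\bar{x}$:
$$\min_{\alpha, \theta, \beta_0, \beta_1} \sum_{i=1}^{N} 1(\bar{x} - h \leq X_i \leq \bar{x} + h) \left[ Y_i - \alpha - \theta D_i - \beta_0 (X_i - \bar{x}) - \beta_1 D_i (X_i - \bar{x}) \right]^2 \tag{9}$$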
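Problems (8) and (9) are weighted least squares problems, so they can be solved by rescaling rows with the square root of the kernel weight. The sketch below is one way to do this with NumPy; the kernel choices (Epanechnikov for (8), uniform for (9)), the bandwidth and the simulated design are assumptions for illustration.

```python
import numpy as np


def kernel_weights(u, kernel="epanechnikov"):
    """Kernel weights for scaled distances u = (X - cutoff) / h."""
    u = np.asarray(u, dtype=float)
    if kernel == "epanechnikov":
        return 0.75 * np.clip(1.0 - u**2, 0.0, None)
    if kernel == "uniform":                       # indicator weights as in (9)
        return (np.abs(u) <= 1.0).astype(float)
    raise ValueError(f"unknown kernel: {kernel}")


def local_linear_rdd(y, x, cutoff, h, kernel="epanechnikov"):
    """Kernel-weighted least squares of Y on D, (X - cutoff), D*(X - cutoff); returns theta."""
    y = np.asarray(y, dtype=float)
    xc = np.asarray(x, dtype=float) - cutoff
    w = kernel_weights(xc / h, kernel)
    keep = w > 0
    d = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])[keep]
    root_w = np.sqrt(w[keep])
    # Multiplying rows by sqrt(weight) turns weighted LS into ordinary LS.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y[keep] * root_w, rcond=None)
    return coef[1]


def simulate_sloped_rdd(n=200_000, seed=3):
    """Hypothetical sharp design: jump of 1 at zero, slope of 2 on both sides."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 * (x >= 0) + 2.0 * x + rng.normal(scale=0.5, size=n)
    return y, x


if __name__ == "__main__":
    y, x = simulate_sloped_rdd()
    print(f"Epanechnikov kernel: {local_linear_rdd(y, x, 0.0, 0.5):.2f}")
    print(f"uniform kernel:      {local_linear_rdd(y, x, 0.0, 0.5, kernel='uniform'):.2f}")
```

Because the local slope is modelled on each side, a window as wide as h = 0.5 no longer pushes the slope of the running variable into the estimated jump in this design.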

FUZZY RDD
$$\lim_{x \downarrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \neq \lim_{x \uparrow \bar{x}} \Pr(D_i = 1 \mid X_i = x)$$
$$0 < \Pr(D_i = 1 \mid X_i = x) < 1$$
Note: overlap in $X_i$ (common support) between treated and nontreated.

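ASSUMPTIONS
A1 Stable unit treatment value assumption (SUTVA):
$$Y_i = D_i Y_{1,i} + (1 - D_i) Y_{0,i}$$
A2 Local continuity (LC): $E[Y_{0,i} \mid X_i = x]$ and $E[Y_{1,i} \mid X_i = x]$ are continuous in $x$ at $\bar{x}$.
A3 Local monotonicity (LM):
$$\lim_{x \downarrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \geq \lim_{x \uparrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \ \forall i \text{ and } > \text{ for some } i$$
or
$$\lim_{x \downarrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \leq \lim_{x \uparrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \ \forall i \text{ and } < \text{ for some } i$$
This can be regarded as a local IV.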


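FUZZY RDD
Never-takers ($\tau = n$): $\Pr(D_i = 1 \mid X_i < \bar{x}) = \Pr(D_i = 1 \mid X_i \geq \bar{x}) = 0$
$$\lim_{x \downarrow \bar{x}} E(Y_i \mid X_i = x) - \lim_{x \uparrow \bar{x}} E(Y_i \mid X_i = x) = 0$$
Always-takers ($\tau = a$): $\Pr(D_i = 1 \mid X_i < \bar{x}) = \Pr(D_i = 1 \mid X_i \geq \bar{x}) = 1$
$$\lim_{x \downarrow \bar{x}} E(Y_i \mid X_i = x) - \lim_{x \uparrow \bar{x}} E(Y_i \mid X_i = x) = 0$$
Compliers ($\tau = c$): $\Pr(D_i = 1 \mid X_i < \bar{x}) = 0 < \Pr(D_i = 1 \mid X_i \geq \bar{x}) = 1$
$$\lim_{x \downarrow \bar{x}} E(Y_i \mid X_i = x) - \lim_{x \uparrow \bar{x}} E(Y_i \mid X_i = x) = E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}]$$
Defiers ($\tau = d$), ruled out by local monotonicity: $\Pr(D_i = 1 \mid X_i < \bar{x}) = 1 > \Pr(D_i = 1 \mid X_i \geq \bar{x}) = 0$
$$\lim_{x \downarrow \bar{x}} E(Y_i \mid X_i = x) - \lim_{x \uparrow \bar{x}} E(Y_i \mid X_i = x) = E[Y_{0,i} - Y_{1,i} \mid X_i = \bar{x}]$$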

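IDENTIFICATION
$$\begin{aligned}
\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]
&= 0 \cdot \Pr(\tau = n \mid X_i = \bar{x}) + 0 \cdot \Pr(\tau = a \mid X_i = \bar{x}) \\
&\quad + E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}, \tau = c] \Pr(\tau = c \mid X_i = \bar{x}) \\
&\quad + E[Y_{0,i} - Y_{1,i} \mid X_i = \bar{x}, \tau = d] \cdot 0 \\
&= E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}, \tau = c] \Pr(\tau = c \mid X_i = \bar{x})
\end{aligned} \tag{10}$$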

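IDENTIFICATION
$$\begin{aligned}
E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}, \tau = c]
&= \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]}{\Pr(\tau = c \mid X_i = \bar{x})} \\
&= \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]}{\lim_{x \downarrow \bar{x}} E[D_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[D_i \mid X_i = x]}
\end{aligned} \tag{11}$$
Identifies the effect for compliers at the threshold $\bar{x}$: if the effect is not constant, only for this population (but highly policy relevant).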

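IDENTIFICATION
Let $Z_i \equiv 1(X_i \geq \bar{x})$.
$$\begin{aligned}
E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}, \tau = c]
&= \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]}{\lim_{x \downarrow \bar{x}} E[D_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[D_i \mid X_i = x]} \\
&= \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x, Z_i = 1] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x, Z_i = 0]}{\lim_{x \downarrow \bar{x}} E[D_i \mid X_i = x, Z_i = 1] - \lim_{x \uparrow \bar{x}} E[D_i \mid X_i = x, Z_i = 0]}
\end{aligned} \tag{12}$$
This looks very much like LATE with the threshold indicator $Z_i \equiv 1(X_i \geq \bar{x})$ as instrument. But the instrument $Z_i$ is only valid locally at the threshold $\bar{x}$.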

PARAMETRIC ESTIMATION (2SLS)
First stage:
$$E(D_i \mid X_i = x) = \Pr(D_i = 1 \mid X_i = x) = f(X_i - \bar{x}) + \gamma 1(X_i \geq \bar{x}) \tag{13}$$
Second stage:
$$Y_i = \alpha + \theta \Pr(D_i = 1 \mid X_i = x) + g(X_i - \bar{x}) + U_i \tag{14}$$
which is estimated by replacing $\Pr(D_i = 1 \mid X_i = x)$ with $\widehat{\Pr}(D_i = 1 \mid X_i = x)$ obtained from the first stage. This is essentially IV with the threshold indicator $Z_i \equiv 1(X_i \geq \bar{x})$ as instrument.

PARAMETRIC ESTIMATION (2SLS)
- if the sample is restricted, a local effect is estimated
- if the sample is not restricted, all data points (even far from $\bar{x}$) are used, which assumes global continuity and global monotonicity
- add functions of $(X_i - \bar{x})$ (e.g. higher order polynomials) and covariates for the same reasons as before
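The two stages (13) and (14) can be written out by hand as two least squares fits. The sketch below assumes linear f and g whose slopes may differ on each side of the cutoff, a constant treatment effect of 2, and a hypothetical unobserved variable `v` that drives both take-up and outcomes; all of these are illustrative assumptions. Manual two-step fitting gives the 2SLS point estimate only; the second-stage standard errors from such a procedure would not be valid.

```python
import numpy as np


def ols(design, y):
    """Least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def fuzzy_rdd_2sls(y, d, x, cutoff):
    """Manual 2SLS with Z = 1(X >= cutoff) as the excluded instrument; returns theta."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    xc = np.asarray(x, dtype=float) - cutoff
    z = (xc >= 0).astype(float)
    # f(.) and g(.): linear in (X - cutoff), slope allowed to differ by side (assumption).
    controls = np.column_stack([np.ones_like(xc), xc, z * xc])
    first_stage = np.column_stack([controls, z])
    d_hat = first_stage @ ols(first_stage, d)          # fitted treatment probability
    second_stage = np.column_stack([controls, d_hat])
    return ols(second_stage, y)[-1]


def simulate_fuzzy_rdd(n=100_000, seed=4):
    """Hypothetical fuzzy design: take-up probability jumps by 0.5 at zero."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    z = x >= 0
    v = rng.uniform(size=n)                            # unobserved resistance to treatment
    d = (v < 0.2 + 0.5 * z + 0.1 * x).astype(float)
    # Low-v units are more likely treated AND have better outcomes: D is endogenous.
    y = 1.0 + 2.0 * d + 1.0 * x + 3.0 * (0.5 - v) + rng.normal(scale=0.5, size=n)
    return y, d, x


if __name__ == "__main__":
    y, d, x = simulate_fuzzy_rdd()
    naive = ols(np.column_stack([np.ones_like(x), x, d]), y)[-1]
    print(f"naive OLS on D: {naive:.2f}, 2SLS at the threshold: {fuzzy_rdd_2sls(y, d, x, 0.0):.2f}")
```

In this simulated design the naive regression of the outcome on take-up is biased upward by selection, while the threshold-based 2SLS estimate stays near the assumed effect.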

NONPARAMETRIC ESTIMATION
Standard kernel using all observations:
$$\frac{\dfrac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Y_i Z_i}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Z_i} - \dfrac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Y_i (1 - Z_i)}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) (1 - Z_i)}}{\dfrac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) D_i Z_i}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Z_i} - \dfrac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) D_i (1 - Z_i)}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) (1 - Z_i)}} \tag{15}$$
Using only observations in a neighborhood of $\bar{x}$, with all sums running over $\{i:\, \bar{x} - h \leq X_i \leq \bar{x} + h\}$:
$$\frac{\dfrac{\sum_i Y_i Z_i}{\sum_i Z_i} - \dfrac{\sum_i Y_i (1 - Z_i)}{\sum_i (1 - Z_i)}}{\dfrac{\sum_i D_i Z_i}{\sum_i Z_i} - \dfrac{\sum_i D_i (1 - Z_i)}{\sum_i (1 - Z_i)}} \tag{16}$$
But convergence rates can be bad at the boundary $\bar{x}$.

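NONPARAMETRIC ESTIMATION
Local linear regression using all observations:
$$\min_{\beta_0, \beta_1, \beta_2, \beta_3} \sum_{i=1}^{N} K_h(X_i - \bar{x}) \left[ Y_i - \beta_0 - \beta_1 Z_i - \beta_2 (X_i - \bar{x}) - \beta_3 Z_i (X_i - \bar{x}) \right]^2$$
$$\min_{\alpha_0, \alpha_1, \alpha_2, \alpha_3} \sum_{i=1}^{N} K_h(X_i - \bar{x}) \left[ D_i - \alpha_0 - \alpha_1 Z_i - \alpha_2 (X_i - \bar{x}) - \alpha_3 Z_i (X_i - \bar{x}) \right]^2$$
Using only observations in a neighborhood of $\bar{x}$:
$$\min_{\beta_0, \beta_1, \beta_2, \beta_3} \sum_{i=1}^{N} 1(\bar{x} - h \leq X_i \leq \bar{x} + h) \left[ Y_i - \beta_0 - \beta_1 Z_i - \beta_2 (X_i - \bar{x}) - \beta_3 Z_i (X_i - \bar{x}) \right]^2$$
$$\min_{\alpha_0, \alpha_1, \alpha_2, \alpha_3} \sum_{i=1}^{N} 1(\bar{x} - h \leq X_i \leq \bar{x} + h) \left[ D_i - \alpha_0 - \alpha_1 Z_i - \alpha_2 (X_i - \bar{x}) - \alpha_3 Z_i (X_i - \bar{x}) \right]^2$$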
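The two local linear fits deliver a jump in the outcome (the coefficient on Z in the Y regression) and a jump in the treatment probability (the coefficient on Z in the D regression); following the identification result, the effect estimate is their ratio. A minimal sketch, assuming an Epanechnikov kernel, a bandwidth of 0.5 and the same kind of hypothetical simulated design as a fuzzy cutoff with a take-up jump of 0.5 and a constant effect of 2:

```python
import numpy as np


def local_linear_jump(outcome, x, cutoff, h):
    """Kernel-weighted LS of outcome on Z, (X - cutoff), Z*(X - cutoff); returns the Z coefficient."""
    outcome = np.asarray(outcome, dtype=float)
    xc = np.asarray(x, dtype=float) - cutoff
    w = 0.75 * np.clip(1.0 - (xc / h) ** 2, 0.0, None)   # Epanechnikov weights (assumption)
    keep = w > 0
    z = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), z, xc, z * xc])[keep]
    root_w = np.sqrt(w[keep])
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], outcome[keep] * root_w, rcond=None)
    return coef[1]


def fuzzy_rdd_local_linear(y, d, x, cutoff, h):
    """Ratio of the outcome jump to the treatment-probability jump at the cutoff."""
    return local_linear_jump(y, x, cutoff, h) / local_linear_jump(d, x, cutoff, h)


def simulate_fuzzy_rdd(n=200_000, seed=6):
    """Hypothetical fuzzy design: take-up probability jumps by 0.5 at zero, effect is 2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    z = x >= 0
    v = rng.uniform(size=n)                              # unobserved resistance to treatment
    d = (v < 0.2 + 0.5 * z + 0.1 * x).astype(float)
    y = 1.0 + 2.0 * d + 1.0 * x + 3.0 * (0.5 - v) + rng.normal(scale=0.5, size=n)
    return y, d, x


if __name__ == "__main__":
    y, d, x = simulate_fuzzy_rdd()
    print(f"jump in D: {local_linear_jump(d, x, 0.0, 0.5):.2f}")
    print(f"effect for compliers at the cutoff: {fuzzy_rdd_local_linear(y, d, x, 0.0, 0.5):.2f}")
```

Using the same kernel and bandwidth in both regressions keeps the ratio interpretable as a local Wald estimate; other choices are possible.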

NONPARAMETRIC ESTIMATION
- add more flexible specifications like higher order polynomials
- include other covariates
- restrict the sample around $\bar{x}$
- vary the bandwidth or try alternative estimators which are more appropriate at boundary points

Lalive, R. (2008): How Do Extended Benefits Affect Unemployment Duration? A Regression Discontinuity Approach, Journal of Econometrics, 142(2), 785-806.
Austrian UI since August 1989:
- up to age 39: 30 weeks if employed 3 out of past 5 years
- age 40-49: 39 weeks if employed 6 out of past 10 years
- from age 50: 52 weeks (1 year) if employed 9 out of past 15 years
July 1988-August 1993: 209 weeks (4 years) from age 50 if
- employed 15 out of past 25 years
- resident of selected region for at least 6 months
- new unemployment spell after June 1988 or ongoing spell in June 1988

Interactions with other policies:
- statutory retirement age: 60/65 for women/men
- early retirement age: 55/60 if worked for at least 35 years
- special income support:
  - age 54/59 and employed 15 out of past 25 years
  - for one year
  - min(1.25*ub, pension benefit)
- from age 50 women are covered until early retirement

Sample definition
- entries into unemployment from non-steel sector
- 1/1986-12/1987 (pre REBP), 8/1989-7/1991 (REBP)
- ratio of actual to potential work experience since 1972 of at least 0.7 to ensure eligibility
- age 46-53 at beginning of unemployment
- no farther than 70 min car drive from border
- Vienna excluded
- social insurance records
- full population
- exact date of birth and beginning of unemployment observed

Discontinuity at age 50: men

Discontinuity at region: men

Validity of RDD: men

Validity of RDD: men

Validity of RDD: men

Discontinuity at age 50: women

Discontinuity at region: women

Validity of RDD: women

Sharp RDD
(1) Baseline:
$$Y_i = \alpha_0 + \alpha_1 D_i + \upsilon_i$$
(2) Linear regression:
$$Y_i = \alpha_0 + \alpha_1 D_i + \beta_0 (S_i - S_0) + \beta_1 D_i (S_i - S_0) + \varepsilon_i$$
(4) Local linear regression:
$$\min_{\alpha_0, \alpha_1, \beta_0, \beta_1} \sum_{i=1}^{N} \left[ Y_i - \alpha_0 - \alpha_1 D_i - \beta_0 (S_i - S_0) - \beta_1 D_i (S_i - S_0) \right]^2 K_h(S_i - S_0)$$
where $K_h(\cdot)$ is the Epanechnikov kernel.

Sharp RDD
Variants:
(3) quadratic and cubic terms in $(S_i - S_0)$
(5) control for pre-reform differences (BD-RDD): include pre-reform observations and estimate a fully interacted model with a period indicator
(6) include additional covariates

Results: men

Results: women

Results: women

A RECIPE
- existence of discontinuity: plot $D_i$ against $X_i$
- descriptive evidence of effect: plot $Y_i$ against $X_i$
- no sorting in $X_i$: plot density of $X_i$
- no sorting in $X_i$: test for discontinuities in density
- no sorting in covariates: plot covariates against $X_i$
- no sorting in covariates: covariates as outcome
- no sorting in covariates: test for discontinuities
- repeat estimation in period without treatment
- test for jumps at non-discontinuity points

A RECIPE
- present both parametric and non-parametric estimates
- with and without different polynomials of $X_i - \bar{x}$
- with and without other covariates
- vary the sample around $\bar{x}$
- comparisons to estimates based on unconfoundedness for fuzzy RDD
- make sure no other policies use the same threshold
- if there are other policies, a combined RDD and DiD might be an option
- if the assignment variable is not continuous: need a parametric function to get rid of the direct effect
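Two of the checks in the recipe (a discontinuity in the density of the running variable, and jumps at points where none should exist) can be sketched in a few lines. The density check below is a deliberately crude stand-in that compares counts in two narrow bins next to the cutoff; it is not the McCrary (2008) test. The placebo check fits a local linear regression with a uniform kernel at a false cutoff. The simulated sorting mechanism, the bin width and all names are assumptions.

```python
import numpy as np


def density_jump_z(x, cutoff, h):
    """Crude sorting check: compare counts in [cutoff - h, cutoff) and [cutoff, cutoff + h).

    With a smooth density and small h, an observation in the window falls on either
    side with probability of about 1/2, so the statistic is roughly standard normal.
    """
    x = np.asarray(x, dtype=float)
    left = np.sum((x >= cutoff - h) & (x < cutoff))
    right = np.sum((x >= cutoff) & (x < cutoff + h))
    return (right - left) / np.sqrt(left + right)


def placebo_jump(y, x, placebo_cutoff, h):
    """Local linear jump estimate (uniform kernel) at a point with no treatment change."""
    y = np.asarray(y, dtype=float)
    xc = np.asarray(x, dtype=float) - placebo_cutoff
    keep = np.abs(xc) <= h
    z = (xc[keep] >= 0).astype(float)
    design = np.column_stack([np.ones(keep.sum()), z, xc[keep], z * xc[keep]])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return coef[1]


def simulate_running_variable(n=100_000, sorting_share=0.0, seed=5):
    """Hypothetical design: a share of units just below zero move just above it."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    just_below = (x < 0) & (x > -0.05)
    movers = just_below & (rng.random(n) < sorting_share)
    x = np.where(movers, -x, x)                      # movers place themselves above the cutoff
    y = 1.0 * (x >= 0) + 2.0 * x + rng.normal(scale=0.5, size=n)
    return y, x


if __name__ == "__main__":
    for share in (0.0, 0.3):
        y, x = simulate_running_variable(sorting_share=share)
        print(f"sorting share {share:.1f}: density z = {density_jump_z(x, 0.0, 0.05):.1f}, "
              f"placebo jump at 0.5 = {placebo_jump(y, x, 0.5, 0.25):.2f}")
```

A large density statistic signals bunching on one side of the cutoff, and a placebo jump far from zero signals that the specification picks up curvature or other discontinuities rather than the treatment.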

SUMMARY: Exploit institutional features for credible identification. Usually much more credible than standard IV (and other strategies). Biggest threats to validity are sorting and other policies at same threshold. Large toolkit to assess internal validity empirically.

RECOMMENDED READINGS (other than surveys):
- Imbens, G. and T. Lemieux (2008). Regression Discontinuity Designs: A Guide to Practice, Journal of Econometrics, 142(2), 615-635.
- Lee, D.S. and D. Card (2008). Regression Discontinuity Inference with Specification Error, Journal of Econometrics, 142(2), 655-674.
- McCrary, J. (2008). Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test, Journal of Econometrics, 142(2), 698-714.

GOOD APPLICATIONS:
- all applications in the Journal of Econometrics special issue on RDD, 142(2), 2008
- Lee, D.S. and J. McCrary (2009). The Deterrence Effect of Prison: Dynamic Theory and Evidence, Working Paper, Princeton University, Department of Economics, Center for Economic Policy Studies.
- Gormley, W.T., T. Gayer, D. Phillips, and B. Dawson (2005). The Effects of Universal Pre-K on Cognitive Development, Developmental Psychology, 41(6), 872-884.