Advanced Research Methods. Lecture 2: Instrumental variables (IV) and regression discontinuity design (RDD)

INSTRUMENTAL VARIABLES (IV)

EXAMPLE
Consider once more your favourite training program for unemployed workers. Assume this time that the offer of training is randomized, while take-up is endogenous. Furthermore, assume that only unemployed workers who have been offered training can participate. Then receiving the offer of training is a valid instrument for take-up of training.

NOTATION
- $Z \in \{0, 1\}$: binary instrument (offer yes/no)
- $D \in \{0, 1\}$: treatment status (take-up yes/no)
- $Y_0$: potential outcome under no treatment
- $Y_1$: potential outcome under treatment
- $Y$: observed outcome

ASSUMPTIONS
A1 Stable unit treatment value assumption (SUTVA):
$$Y_i = D_i Y_{1,i} + (1 - D_i) Y_{0,i}$$
A2 Monotonicity or instrument relevance:
$$\Pr(D_i = 1 \mid Z = 1) \ge \Pr(D_i = 1 \mid Z = 0) \ \forall i \text{ and } > \text{ for some } i$$
or
$$\Pr(D_i = 1 \mid Z = 1) \le \Pr(D_i = 1 \mid Z = 0) \ \forall i \text{ and } < \text{ for some } i$$
A3 Exclusion restriction or instrument exogeneity:
$$E[Y_1 \mid Z = 1] = E[Y_1 \mid Z = 0] = E[Y_1]$$
$$E[Y_0 \mid Z = 1] = E[Y_0 \mid Z = 0] = E[Y_0]$$
May be required conditional on $X$ only.

POTENTIAL EFFECTS
Never-takers ($\tau = n$): $\Pr(D_i = 1 \mid Z = 1) = \Pr(D_i = 1 \mid Z = 0) = 0$
$$E[Y_i \mid Z = 1] - E[Y_i \mid Z = 0] \overset{A1,A3}{=} E[Y_{0,i}] - E[Y_{0,i}] = 0$$
Always-takers ($\tau = a$): $\Pr(D_i = 1 \mid Z = 1) = \Pr(D_i = 1 \mid Z = 0) = 1$
$$E[Y_i \mid Z = 1] - E[Y_i \mid Z = 0] \overset{A1,A3}{=} E[Y_{1,i}] - E[Y_{1,i}] = 0$$
Compliers ($\tau = c$): $\Pr(D_i = 1 \mid Z = 1) = 1 > \Pr(D_i = 1 \mid Z = 0) = 0$
$$E[Y_i \mid Z = 1] - E[Y_i \mid Z = 0] \overset{A1,A3}{=} E[Y_{1,i}] - E[Y_{0,i}]$$
Defiers ($\tau = d$), ruled out by monotonicity: $\Pr(D_i = 1 \mid Z = 1) = 0 < \Pr(D_i = 1 \mid Z = 0) = 1$
$$E[Y_i \mid Z = 1] - E[Y_i \mid Z = 0] \overset{A1,A3}{=} E[Y_{0,i}] - E[Y_{1,i}]$$
Note that
$$\Pr(D = 1 \mid Z = 1) = \Pr(\tau = a) + \Pr(\tau = c)$$
and
$$\Pr(D = 1 \mid Z = 0) = \Pr(\tau = a) + \Pr(\tau = d) = \Pr(\tau = a).$$

Assume you estimate the reduced-form difference in average outcomes between those who get the offer and those who do not, which resembles an intention-to-treat effect. What does this identify?
$$\begin{aligned}
E[Y \mid Z = 1] - E[Y \mid Z = 0]
&= \underbrace{(E[Y \mid Z = 1, \tau = n] - E[Y \mid Z = 0, \tau = n])}_{0} \Pr(\tau = n) \\
&\quad + \underbrace{(E[Y \mid Z = 1, \tau = a] - E[Y \mid Z = 0, \tau = a])}_{0} \Pr(\tau = a) \\
&\quad + (E[Y \mid Z = 1, \tau = c] - E[Y \mid Z = 0, \tau = c]) \Pr(\tau = c) \\
&\quad + (E[Y \mid Z = 1, \tau = d] - E[Y \mid Z = 0, \tau = d]) \underbrace{\Pr(\tau = d)}_{0} \\
&= (E[Y \mid Z = 1, \tau = c] - E[Y \mid Z = 0, \tau = c]) \Pr(\tau = c) \\
&= E[Y_1 - Y_0 \mid \tau = c] \Pr(\tau = c)
\end{aligned}$$

NONPARAMETRIC IDENTIFICATION
$$\begin{aligned}
E[Y_1 - Y_0 \mid \tau = c]
&= \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{\Pr(\tau = c)} \\
&= \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{\Pr(\tau = a) + \Pr(\tau = c) - \Pr(\tau = a)} \\
&= \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{\Pr(D = 1 \mid Z = 1) - \Pr(D = 1 \mid Z = 0)} \\
&= \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}
\end{aligned}$$
Identifies the so-called local average treatment effect (LATE), which is the effect for the compliers (those who respond to the instrument). If the effect is not constant, it is identified only for this population. Policy relevant?
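The ratio above can be checked numerically. The following is a minimal sketch on simulated data, not an analysis of any real program: the function names, the 60% complier share, and the true complier effect of 2.0 are all assumptions made for illustration. Take-up is only possible with an offer, as in the training example, so the population consists of compliers and never-takers.

```python
import numpy as np


def wald_late(y, d, z):
    """Wald ratio: reduced-form difference over first-stage difference."""
    y, d, z = (np.asarray(a) for a in (y, d, z))
    reduced_form = y[z == 1].mean() - y[z == 0].mean()  # E[Y|Z=1] - E[Y|Z=0]
    first_stage = d[z == 1].mean() - d[z == 0].mean()   # E[D|Z=1] - E[D|Z=0]
    return reduced_form / first_stage


def simulate_training(n=200_000, seed=0):
    """Hypothetical data: randomized offer Z, endogenous take-up D."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)            # offer is randomized
    complier = rng.random(n) < 0.6            # assumed: 60% compliers, 40% never-takers
    d = z * complier                          # take-up requires an offer
    y0 = rng.normal(size=n) + 1.0 * complier  # compliers differ in Y0 (selection)
    y = y0 + 2.0 * d                          # assumed true complier effect: 2.0
    return y, d, z


y, d, z = simulate_training()
late_hat = wald_late(y, d, z)
naive = y[d == 1].mean() - y[d == 0].mean()   # compares participants with non-participants
print(f"Wald/LATE: {late_hat:.2f}, naive comparison: {naive:.2f}")
```

In this setup the naive comparison of participants and non-participants is biased upward because compliers have higher untreated outcomes, while the Wald ratio should be close to the assumed complier effect.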

ESTIMATION
$$\frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}$$
- Each component can be estimated nonparametrically by the cell average.
- If A2 and/or A3 only hold conditional on $X$, estimate in cells defined by $Z$ and $X$, or use propensity score methods for high-dimensional $X$ (see Frölich 2007).
- Note that $E[D \mid Z = z, X = x] = \Pr(D = 1 \mid Z = z, X = x)$, which can be estimated using a probit model for $\Pr(D = 1 \mid X = x)$ within the sub-sample with $Z = z$.
- Estimate the other components as $E[Y \mid Z = z] = \int E[Y \mid p(x, z)] \, f_{X \mid Z = z}(x) \, dx$, where $p(x, z) \equiv \Pr(D = 1 \mid X = x, Z = z)$.
- Note that different instruments imply different complier populations, so estimated effects may differ. Differing estimates are therefore no evidence against the validity of an instrument: validity is an identifying assumption that is not testable unless you assume effect homogeneity.
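To illustrate estimation in cells defined by $Z$ and $X$, here is a rough sketch for a single discrete covariate. The aggregation used (averaging the cell-wise reduced-form and first-stage differences over the empirical distribution of $X$ and then taking the ratio) is one common choice and is an assumption here, as are all function names and simulation parameters.

```python
import numpy as np


def conditional_wald(y, d, z, x):
    """Average cell-wise reduced-form and first-stage differences over X."""
    y, d, z, x = (np.asarray(a) for a in (y, d, z, x))
    numerator, denominator = 0.0, 0.0
    for cell in np.unique(x):
        in_cell = x == cell
        weight = in_cell.mean()                     # empirical Pr(X = cell)
        offered = in_cell & (z == 1)
        not_offered = in_cell & (z == 0)
        numerator += weight * (y[offered].mean() - y[not_offered].mean())
        denominator += weight * (d[offered].mean() - d[not_offered].mean())
    return numerator / denominator


def simulate_conditional_offer(n=200_000, seed=0):
    """Hypothetical data where the offer is random only conditional on X."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)                  # binary covariate
    offer_prob = np.where(x == 1, 0.7, 0.3)         # offer probability depends on X
    z = (rng.random(n) < offer_prob).astype(int)
    complier = rng.random(n) < 0.5                  # assumed complier share
    d = z * complier
    y = 2.0 * x + rng.normal(size=n) + 1.0 * d      # assumed true complier effect: 1.0
    return y, d, z, x


y, d, z, x = simulate_conditional_offer()
unconditional = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(f"conditional on X: {conditional_wald(y, d, z, x):.2f}, ignoring X: {unconditional:.2f}")
```

Because $X$ shifts both the offer probability and the outcome, the unconditional ratio picks up the effect of $X$, whereas the cell-based version should stay close to the assumed effect.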

RECOMMENDED READINGS (other than surveys):
- Imbens, G.W. and J.D. Angrist (1994). Identification and Estimation of Local Average Treatment Effects, Econometrica, 62(2), 467-475.
- Frölich, M. (2007). Nonparametric IV Estimation of Local Average Treatment Effects with Covariates, Journal of Econometrics, 139(1), 35-75.
GOOD APPLICATIONS:
- Frölich, M. and M. Lechner (2014). Combining Matching and Nonparametric IV Estimation: Theory and an Application to the Evaluation of Active Labour Market Policies, Journal of Applied Econometrics, DOI: 10.1002/jae.2417.

REGRESSION DISCONTINUITY DESIGN (RDD)

INTRODUCTION
You want to estimate the effect of the generosity of unemployment insurance (UI) on unemployment duration.
1. Unemployed workers who are 50 or older at the time of becoming unemployed ($D = 1$) are eligible for longer maximum UI benefit durations than younger unemployed workers ($D = 0$).
2. You have data for a large sample of unemployed workers and you observe the exact date of birth and the exact date when workers became unemployed.
RDD is based on the idea that workers who turned 50 just before becoming unemployed are essentially identical to workers who turned 50 just after becoming unemployed. Hence, the latter can be used as a control group to estimate the effect of interest. Crossing the age threshold can be regarded as a locally valid instrument for maximum UI benefit duration.

GENERAL SETUP
- Interest lies in the effect of some intervention on some outcome $Y$.
- Institutional rules imply that the treatment probability jumps at a cut-off value $\bar{x}$ of some quasi-continuous covariate $X$.
- $X$ is called the assignment, running or forcing variable.
- Sharp RDD: the cut-off is strictly enforced, so everyone on one side of the cut-off is subject to the intervention and everyone on the other side is not.
- Fuzzy RDD: there are persons subject to the intervention on both sides of the cut-off, but the probability of being subject to the intervention jumps at the cut-off.


SHARP RDD
$$D_i = D(X_i) = 1(X_i \ge \bar{x})$$
$$\Pr(D_i = 1 \mid X_i < \bar{x}) = 0$$
$$\Pr(D_i = 1 \mid X_i \ge \bar{x}) = 1$$
Note: no overlap in $X_i$ (no common support) between treated and nontreated.

ASSUMPTIONS
A1 Stable unit treatment value assumption (SUTVA):
$$Y_i = D_i Y_{1,i} + (1 - D_i) Y_{0,i}$$
A2 Local continuity (LC): $E[Y_{0,i} \mid X_i = x]$ and $E[Y_{1,i} \mid X_i = x]$ are continuous in $x$ at $\bar{x}$.

SUTVA and local continuity

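IDENTIFICATION
$$E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}] = \lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x] \tag{1}$$
Identifies the effect at the threshold $\bar{x}$: if the effect is not constant, it is identified only for this population (but highly policy relevant).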

What if the forcing variable is discrete?
- Need to choose a functional form for the relationship between the treatment variable and the outcomes of interest.
- Specification errors can lead to biased results.

PARAMETRIC ESTIMATION
Without covariates:
$$Y_i = \alpha + \theta D_i + U_i \tag{2}$$
Allowing for a direct effect of the assignment variable on $Y_i$:
$$Y_i = \alpha + \theta D_i + \beta_0 (X_i - \bar{x}) + \beta_1 D_i (X_i - \bar{x}) + U_i \tag{3}$$
Include higher order polynomials of $(X_i - \bar{x})$ to relax the functional form assumption:
$$Y_i = \alpha + \theta D_i + \sum_{p=1}^{P} \beta_{0,p} (X_i - \bar{x})^p + \sum_{p=1}^{P} \beta_{1,p} D_i (X_i - \bar{x})^p + U_i \tag{4}$$
Include other covariates $X_k$ to increase precision:
$$Y_i = \alpha + \theta D_i + \beta_0 (X_i - \bar{x}) + \beta_1 D_i (X_i - \bar{x}) + \sum_{k=1}^{K} \beta_{2,k} X_{k,i} + U_i \tag{5}$$
$\theta$ is the parameter of interest.

- If RDD is valid, estimating (2) using observations in a very small neighborhood around $\bar{x}$ is sufficient.
- If observations further away from $\bar{x}$ are used, controlling for the direct effects of the assignment variable is crucial to avoid bias.
- If all observations are used, global continuity is assumed and using the correct functional form for the direct effect of the assignment variable is crucial.
- Controlling for covariates that are correlated with the potential outcomes may improve precision because residuals become smaller.
- Choosing the caliper around the cutoff is a tradeoff between efficiency (using more observations to increase precision) and consistency (getting the functional form of the direct effect right).
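The points above can be illustrated with a small simulation. This is only a sketch under assumed values: the data mimic the age-50 example with a hypothetical true jump of 3 weeks, and the function names are made up. It contrasts equation (2) estimated on a wide window with equation (3), which controls for the direct effect of the assignment variable.

```python
import numpy as np


def sharp_rdd_ols(y, x, cutoff, bandwidth=None):
    """OLS estimate of theta in equation (3), optionally within a caliper."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    if bandwidth is not None:
        keep = np.abs(x - cutoff) <= bandwidth
        y, x = y[keep], x[keep]
    xc = x - cutoff                                  # centred assignment variable
    d = (xc >= 0).astype(float)                      # sharp design: D = 1(X >= cutoff)
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]                                   # coefficient on D


def simulate_sharp(n=20_000, seed=0):
    """Hypothetical durations with an assumed jump of 3.0 at age 50."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(46.0, 54.0, size=n)
    xc = age - 50.0
    d = (xc >= 0).astype(float)
    duration = 20.0 + 3.0 * d + 0.5 * xc + 0.2 * d * xc + rng.normal(0.0, 2.0, size=n)
    return duration, age


duration, age = simulate_sharp()
eq2_wide = duration[age >= 50].mean() - duration[age < 50].mean()  # equation (2), all data
eq3_all = sharp_rdd_ols(duration, age, cutoff=50.0)                 # equation (3), all data
eq3_local = sharp_rdd_ols(duration, age, cutoff=50.0, bandwidth=1.0)
print(f"(2) wide window: {eq2_wide:.2f}, (3) all data: {eq3_all:.2f}, (3) caliper 1: {eq3_local:.2f}")
```

With a wide window, equation (2) attributes the age trend to the treatment, while equation (3) should recover a value near the assumed jump. The narrower caliper trades precision for robustness to functional form.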

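NONPARAMETRIC ESTIMATION
Standard kernel using all observations:
$$\frac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Y_i D_i}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) D_i} - \frac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Y_i (1 - D_i)}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) (1 - D_i)} \tag{6}$$
Using only observations in a neighborhood of $\bar{x}$:
$$\frac{\sum_{i \in I_h} Y_i D_i}{\sum_{i \in I_h} D_i} - \frac{\sum_{i \in I_h} Y_i (1 - D_i)}{\sum_{i \in I_h} (1 - D_i)}, \qquad I_h = \{i : \bar{x} - h \le X_i \le \bar{x} + h\} \tag{7}$$
But convergence rates can be bad at the boundary $\bar{x}$.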
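One way to see the boundary problem is to apply equation (7) to data with a sloped regression function. The sketch below uses simulated data with assumed values (slope 2, jump 1); the function names are hypothetical. With a linear trend and a uniform assignment variable, the window mean difference is off by roughly the slope times $h$, so the bias only vanishes as the window shrinks.

```python
import numpy as np


def window_mean_difference(y, x, cutoff, h):
    """Equation (7): difference in mean outcomes within [cutoff - h, cutoff + h]."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    in_window = np.abs(x - cutoff) <= h
    treated = in_window & (x >= cutoff)
    control = in_window & (x < cutoff)
    return y[treated].mean() - y[control].mean()


def simulate_sloped(n=50_000, seed=0):
    """Hypothetical sharp design: assumed slope 2.0 and jump 1.0 at zero."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    d = (x >= 0).astype(float)
    y = 1.0 + 2.0 * x + 1.0 * d + rng.normal(0.0, 0.5, size=n)
    return y, x


y, x = simulate_sloped()
for h in (0.5, 0.1, 0.02):
    print(f"h = {h:.2f}: estimate = {window_mean_difference(y, x, 0.0, h):.2f}")
```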

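Local linear regression using all observations:
$$\min \sum_{i=1}^{N} K_h(X_i - \bar{x}) \left[ Y_i - \alpha - \theta D_i - \beta_0 (X_i - \bar{x}) - \beta_1 D_i (X_i - \bar{x}) \right]^2 \tag{8}$$
Using only observations in a neighborhood of $\bar{x}$:
$$\min \sum_{i=1}^{N} 1(\bar{x} - h \le X_i \le \bar{x} + h) \left[ Y_i - \alpha - \theta D_i - \beta_0 (X_i - \bar{x}) - \beta_1 D_i (X_i - \bar{x}) \right]^2 \tag{9}$$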
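The minimization problems (8) and (9) are weighted least squares problems, which can be sketched as follows. The kernel choice (Epanechnikov), the cubic regression function, and the assumed true jump of 1.5 are illustrative assumptions, and the function names are hypothetical. A rectangular kernel with a bandwidth that covers the whole sample reproduces a global linear fit for comparison.

```python
import numpy as np


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def uniform(u):
    return np.where(np.abs(u) <= 1.0, 0.5, 0.0)


def local_linear_rdd(y, x, cutoff, h, kernel=epanechnikov):
    """Kernel-weighted least squares for (8)/(9); returns the estimate of theta."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    xc = x - cutoff
    w = kernel(xc / h)                       # the 1/h scaling of K_h does not affect the argmin
    keep = w > 0
    xc, yk, sw = xc[keep], y[keep], np.sqrt(w[keep])
    d = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    # Scaling rows by sqrt(weight) turns weighted into ordinary least squares.
    coef, *_ = np.linalg.lstsq(design * sw[:, None], yk * sw, rcond=None)
    return coef[1]


def simulate_cubic(n=20_000, seed=0):
    """Hypothetical sharp design with a cubic trend and an assumed jump of 1.5."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=n)
    d = (x >= 0).astype(float)
    y = 1.0 + 0.3 * x**3 + 1.5 * d + rng.normal(0.0, 0.5, size=n)
    return y, x


y, x = simulate_cubic()
local = local_linear_rdd(y, x, cutoff=0.0, h=0.75)
global_linear = local_linear_rdd(y, x, cutoff=0.0, h=3.0, kernel=uniform)
print(f"local linear (h = 0.75): {local:.2f}, global linear: {global_linear:.2f}")
```

In this example the global linear fit is badly misspecified, while the local fit should stay close to the assumed jump because the regression function is approximately linear near the cutoff.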

FUZZY RDD
$$\lim_{x \downarrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \ne \lim_{x \uparrow \bar{x}} \Pr(D_i = 1 \mid X_i = x)$$
$$0 < \Pr(D_i = 1 \mid X_i = x) < 1$$
Note: overlap in $X_i$ (common support) between treated and nontreated.

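ASSUMPTIONS
A1 Stable unit treatment value assumption (SUTVA):
$$Y_i = D_i Y_{1,i} + (1 - D_i) Y_{0,i}$$
A2 Local continuity (LC): $E[Y_{0,i} \mid X_i = x]$ and $E[Y_{1,i} \mid X_i = x]$ are continuous in $x$ at $\bar{x}$.
A3 Local monotonicity (LM):
$$\lim_{x \downarrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \ge \lim_{x \uparrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \ \forall i \text{ and } > \text{ for some } i$$
or
$$\lim_{x \downarrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \le \lim_{x \uparrow \bar{x}} \Pr(D_i = 1 \mid X_i = x) \ \forall i \text{ and } < \text{ for some } i$$
This can be regarded as a local IV.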


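FUZZY RDD
Never-takers ($\tau = n$): $\Pr(D_i = 1 \mid X_i < \bar{x}) = \Pr(D_i = 1 \mid X_i \ge \bar{x}) = 0$
$$\lim_{x \downarrow \bar{x}} E(Y_i \mid X_i = x) - \lim_{x \uparrow \bar{x}} E(Y_i \mid X_i = x) = 0$$
Always-takers ($\tau = a$): $\Pr(D_i = 1 \mid X_i < \bar{x}) = \Pr(D_i = 1 \mid X_i \ge \bar{x}) = 1$
$$\lim_{x \downarrow \bar{x}} E(Y_i \mid X_i = x) - \lim_{x \uparrow \bar{x}} E(Y_i \mid X_i = x) = 0$$
Compliers ($\tau = c$): $\Pr(D_i = 1 \mid X_i < \bar{x}) = 0 < \Pr(D_i = 1 \mid X_i \ge \bar{x}) = 1$
$$\lim_{x \downarrow \bar{x}} E(Y_i \mid X_i = x) - \lim_{x \uparrow \bar{x}} E(Y_i \mid X_i = x) = E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}]$$
Defiers ($\tau = d$), ruled out by local monotonicity: $\Pr(D_i = 1 \mid X_i < \bar{x}) = 1 > \Pr(D_i = 1 \mid X_i \ge \bar{x}) = 0$
$$\lim_{x \downarrow \bar{x}} E(Y_i \mid X_i = x) - \lim_{x \uparrow \bar{x}} E(Y_i \mid X_i = x) = E[Y_{0,i} - Y_{1,i} \mid X_i = \bar{x}]$$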

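IDENTIFICATION
$$\begin{aligned}
&\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x] \\
&\quad = 0 \cdot \Pr(\tau = n \mid X_i = \bar{x}) + 0 \cdot \Pr(\tau = a \mid X_i = \bar{x}) \\
&\qquad + E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}, \tau = c] \Pr(\tau = c \mid X_i = \bar{x}) \\
&\qquad + E[Y_{0,i} - Y_{1,i} \mid X_i = \bar{x}, \tau = d] \cdot 0 \\
&\quad = E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}, \tau = c] \Pr(\tau = c \mid X_i = \bar{x})
\end{aligned} \tag{10}$$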

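$$\begin{aligned}
E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}, \tau = c]
&= \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]}{\Pr(\tau = c \mid X_i = \bar{x})} \\
&= \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]}{\lim_{x \downarrow \bar{x}} E[D_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[D_i \mid X_i = x]}
\end{aligned} \tag{11}$$
Identifies the effect for compliers at the threshold $\bar{x}$: if the effect is not constant, it is identified only for this population (but highly policy relevant).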

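Let $Z_i \equiv 1(X_i \ge \bar{x})$.
$$\begin{aligned}
E[Y_{1,i} - Y_{0,i} \mid X_i = \bar{x}, \tau = c]
&= \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x]}{\lim_{x \downarrow \bar{x}} E[D_i \mid X_i = x] - \lim_{x \uparrow \bar{x}} E[D_i \mid X_i = x]} \\
&= \frac{\lim_{x \downarrow \bar{x}} E[Y_i \mid X_i = x, Z_i = 1] - \lim_{x \uparrow \bar{x}} E[Y_i \mid X_i = x, Z_i = 0]}{\lim_{x \downarrow \bar{x}} E[D_i \mid X_i = x, Z_i = 1] - \lim_{x \uparrow \bar{x}} E[D_i \mid X_i = x, Z_i = 0]}
\end{aligned} \tag{12}$$
This looks very much like LATE with the threshold indicator $Z_i \equiv 1(X_i \ge \bar{x})$ as instrument. But the instrument $Z_i$ is only valid locally at the threshold $\bar{x}$.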

PARAMETRIC ESTIMATION (2SLS)
First stage:
$$E(D_i \mid X_i = x) = \Pr(D_i = 1 \mid X_i = x) = f(X_i - \bar{x}) + \gamma \, 1(X_i \ge \bar{x}) \tag{13}$$
Second stage:
$$Y_i = \alpha + \theta \Pr(D_i = 1 \mid X_i = x) + g(X_i - \bar{x}) + U_i \tag{14}$$
which is estimated by replacing $\Pr(D_i = 1 \mid X_i = x)$ with $\widehat{\Pr}(D_i = 1 \mid X_i = x)$ obtained from the first stage. This is essentially IV with the threshold indicator $Z_i \equiv 1(X_i \ge \bar{x})$ as instrument.

- If the sample is restricted to a neighborhood of $\bar{x}$, a local effect is estimated.
- If the sample is not restricted, all data points (even far from $\bar{x}$) are used, which assumes global continuity and global monotonicity.
- Add functions of $(X_i - \bar{x})$ (e.g. higher order polynomials) and covariates for the same reasons as before.
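As a rough illustration of (13) and (14), the sketch below runs both stages by ordinary least squares on simulated data. It assumes that $f$ and $g$ are linear with slopes that may differ on each side of the cutoff. The data-generating process, the assumed true effect of 2.0, and all function names are hypothetical. Standard errors from a manual second stage are not valid, so this only illustrates the point estimate.

```python
import numpy as np


def ols(design, target):
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def fuzzy_rdd_2sls(y, d, x, cutoff):
    """Two-stage least squares for (13)-(14) with side-specific linear f and g."""
    y, d, x = (np.asarray(a, dtype=float) for a in (y, d, x))
    xc = x - cutoff
    z = (xc >= 0).astype(float)                      # threshold indicator as instrument
    ones = np.ones_like(xc)
    first_design = np.column_stack([ones, z, xc, z * xc])
    d_hat = first_design @ ols(first_design, d)      # fitted Pr(D = 1 | X)
    second_design = np.column_stack([ones, d_hat, xc, z * xc])
    return ols(second_design, y)[1]                  # coefficient on fitted take-up


def simulate_fuzzy(n=100_000, seed=0):
    """Hypothetical fuzzy design: take-up probability jumps by 0.5 at zero."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    z = (x >= 0).astype(float)
    u = rng.random(n)                                # unobserved, drives take-up and outcome
    d = (u < 0.2 + 0.1 * x + 0.5 * z).astype(float)
    y = 1.0 + 0.5 * x + 2.0 * d - 1.0 * u + rng.normal(0.0, 0.5, size=n)
    return y, d, x


y, d, x = simulate_fuzzy()
theta_2sls = fuzzy_rdd_2sls(y, d, x, cutoff=0.0)
theta_ols = ols(np.column_stack([np.ones_like(x), d, x]), y)[1]  # ignores endogeneity of D
print(f"2SLS: {theta_2sls:.2f}, OLS of Y on D: {theta_ols:.2f}")
```

Because the unobserved `u` affects both take-up and the outcome, the OLS coefficient on `d` is biased, while the 2SLS estimate should be close to the assumed effect.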

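NONPARAMETRIC ESTIMATION
Standard kernel using all observations:
$$\frac{\dfrac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Y_i Z_i}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Z_i} - \dfrac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Y_i (1 - Z_i)}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) (1 - Z_i)}}{\dfrac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) D_i Z_i}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) Z_i} - \dfrac{\sum_{i=1}^{N} K_h(X_i - \bar{x}) D_i (1 - Z_i)}{\sum_{i=1}^{N} K_h(X_i - \bar{x}) (1 - Z_i)}} \tag{15}$$
Using only observations in a neighborhood of $\bar{x}$, with $I_h = \{i : \bar{x} - h \le X_i \le \bar{x} + h\}$:
$$\frac{\dfrac{\sum_{i \in I_h} Y_i Z_i}{\sum_{i \in I_h} Z_i} - \dfrac{\sum_{i \in I_h} Y_i (1 - Z_i)}{\sum_{i \in I_h} (1 - Z_i)}}{\dfrac{\sum_{i \in I_h} D_i Z_i}{\sum_{i \in I_h} Z_i} - \dfrac{\sum_{i \in I_h} D_i (1 - Z_i)}{\sum_{i \in I_h} (1 - Z_i)}} \tag{16}$$
But convergence rates can be bad at the boundary $\bar{x}$.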

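Local linear regression using all observations:
$$\min \sum_{i=1}^{N} K_h(X_i - \bar{x}) \left[ Y_i - \beta_0 - \beta_1 Z_i - \beta_2 (X_i - \bar{x}) - \beta_3 Z_i (X_i - \bar{x}) \right]^2$$
$$\min \sum_{i=1}^{N} K_h(X_i - \bar{x}) \left[ D_i - \alpha_0 - \alpha_1 Z_i - \alpha_2 (X_i - \bar{x}) - \alpha_3 Z_i (X_i - \bar{x}) \right]^2$$
Using only observations in a neighborhood of $\bar{x}$:
$$\min \sum_{i=1}^{N} 1(\bar{x} - h \le X_i \le \bar{x} + h) \left[ Y_i - \beta_0 - \beta_1 Z_i - \beta_2 (X_i - \bar{x}) - \beta_3 Z_i (X_i - \bar{x}) \right]^2$$
$$\min \sum_{i=1}^{N} 1(\bar{x} - h \le X_i \le \bar{x} + h) \left[ D_i - \alpha_0 - \alpha_1 Z_i - \alpha_2 (X_i - \bar{x}) - \alpha_3 Z_i (X_i - \bar{x}) \right]^2$$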

- add more flexible specifications like higher order polynomials
- include other covariates
- restrict sample around $\bar{x}$
- vary bandwidth or try alternative estimators which are more appropriate at boundary points
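The two local linear regressions for $Y_i$ and $D_i$ can be combined into a fuzzy RDD estimate by dividing the outcome jump $\beta_1$ by the treatment jump $\alpha_1$. The sketch below does this with a rectangular window and repeats the estimate for several bandwidths. The simulated data, the assumed true effect of 2.0, and the function names are illustrative assumptions.

```python
import numpy as np


def local_linear_jump(v, x, cutoff, h):
    """Jump at the cutoff from a local linear fit within [cutoff - h, cutoff + h]."""
    v, x = np.asarray(v, dtype=float), np.asarray(x, dtype=float)
    keep = np.abs(x - cutoff) <= h
    xc = x[keep] - cutoff
    z = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), z, xc, z * xc])
    coef, *_ = np.linalg.lstsq(design, v[keep], rcond=None)
    return coef[1]                                   # coefficient on Z


def fuzzy_rdd_local(y, d, x, cutoff, h):
    """Ratio of the outcome jump (beta_1) to the treatment jump (alpha_1)."""
    return local_linear_jump(y, x, cutoff, h) / local_linear_jump(d, x, cutoff, h)


def simulate_fuzzy_nonlinear(n=200_000, seed=0):
    """Hypothetical fuzzy design with a nonlinear direct effect of X."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    u = rng.random(n)                                # unobserved, drives take-up and outcome
    d = (u < 0.2 + 0.1 * x + 0.5 * (x >= 0)).astype(float)
    y = 1.0 + np.sin(2.0 * x) + 2.0 * d - u + rng.normal(0.0, 0.5, size=n)
    return y, d, x


y, d, x = simulate_fuzzy_nonlinear()
for h in (0.1, 0.2, 0.3):
    print(f"h = {h:.1f}: theta_hat = {fuzzy_rdd_local(y, d, x, 0.0, h):.2f}")
```

Estimates that are stable across bandwidths are reassuring. Large movements would suggest that the functional form matters within the chosen window.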

Lalive, R. (2008): How Do Extended Benefits Affect Unemployment Duration? A Regression Discontinuity Approach, Journal of Econometrics, 142(2), 785-806.
Austrian UI since August 1989:
- up to age 39: 30 weeks if employed 3 out of past 5 years
- age 40-49: 39 weeks if employed 6 out of past 10 years
- from age 50: 52 weeks (1 year) if employed 9 out of past 15 years
July 1988-August 1993: 209 weeks (4 years) from age 50 if
- employed 15 out of past 25 years
- resident of selected region for at least 6 months
- new unemployment spell after June 1988 or ongoing spell in June 1988

Interactions with other policies:
- statutory retirement age: 60/65 for women/men
- early retirement age: 55/60 if worked for at least 35 years
- special income support:
  - age 54/59 and employed 15 out of past 25 years
  - for one year
  - min(1.25*ub, pension benefit)
- from age 50 women are covered until early retirement

Sample definition
- entries into unemployment from non-steel sector
- 1/1986-12/1987 (pre REBP), 8/1989-7/1991 (REBP)
- ratio of actual to potential work experience since 1972 of at least 0.7 to ensure eligibility
- age 46-53 at beginning of unemployment
- no farther than 70 min car drive from border
- Vienna excluded
- social insurance records
- full population
- exact date of birth and beginning of unemployment observed

[Figures not reproduced: discontinuity at age 50, discontinuity at region, and validity of RDD, each shown separately for men and women.]

Sharp RDD
(1) Baseline:
$$Y_i = \alpha_0 + \alpha_1 D_i + \upsilon_i$$
(2) Linear regression:
$$Y_i = \alpha_0 + \alpha_1 D_i + \beta_0 (S_i - S_0) + \beta_1 D_i (S_i - S_0) + \varepsilon_i$$
(4) Local linear regression:
$$\min_{\alpha_0, \alpha_1, \beta_0, \beta_1} \sum_{i=1}^{N} \left[ Y_i - \alpha_0 - \alpha_1 D_i - \beta_0 (S_i - S_0) - \beta_1 D_i (S_i - S_0) \right]^2 K_h(S_i - S_0)$$
where $K_h(\cdot)$ is the Epanechnikov kernel.
Variants:
(3) quadratic and cubic terms in $(S_i - S_0)$
(5) control for pre-reform differences (BD-RDD): include pre-reform observations and estimate a fully interacted model with a period indicator
(6) include additional covariates

[Results slides not reproduced: men; women.]

A RECIPE
- existence of discontinuity: plot $D_i$ against $X_i$
- descriptive evidence of effect: plot $Y_i$ against $X_i$
- no sorting in $X_i$: plot density of $X_i$
- no sorting in $X_i$: test for discontinuities in density
- no sorting in covariates: plot covariates against $X_i$
- no sorting in covariates: covariates as outcome
- no sorting in covariates: test for discontinuities
- repeat estimation in period without treatment
- test for jumps at non-discontinuity points

- present both parametric and non-parametric estimates
- with and without different polynomials of $X_i - \bar{x}$
- with and without other covariates
- vary sample around $\bar{x}$
- comparisons to estimates based on unconfoundedness for fuzzy RDD
- make sure no other policies use the same threshold
- if there are other policies, a combined RDD and DiD might be an option
- if the assignment variable is not continuous: need a parametric function to get rid of the direct effect
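The "no sorting in $X_i$" step of the recipe can be sketched with a very crude count comparison. This is a simple stand-in, not the density test of McCrary (2008): it assumes the density of $X_i$ is roughly flat within the window, so that about half of the observations near the cutoff should lie above it. The function names, window width, and the simulated manipulation (30% of units just below the cutoff move just above it) are assumptions for illustration.

```python
import numpy as np


def density_jump_zscore(x, cutoff, h):
    """z-score for the share of observations above the cutoff among those within h.

    Assumes a locally flat density, so the expected share is 0.5 without sorting.
    """
    x = np.asarray(x, dtype=float)
    n_above = np.sum((x >= cutoff) & (x <= cutoff + h))
    n_below = np.sum((x < cutoff) & (x >= cutoff - h))
    n = n_above + n_below
    return (n_above - 0.5 * n) / np.sqrt(0.25 * n)


def simulate_running_variable(n=20_000, seed=0, share_sorting=0.0):
    """Hypothetical assignment variable; some units just below zero may move above."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    just_below = (x >= -0.1) & (x < 0.0)
    movers = just_below & (rng.random(n) < share_sorting)
    return np.where(movers, -x, x)                   # movers are mirrored to just above zero


z_clean = density_jump_zscore(simulate_running_variable(), cutoff=0.0, h=0.2)
z_sorted = density_jump_zscore(simulate_running_variable(share_sorting=0.3), cutoff=0.0, h=0.2)
print(f"z-score without sorting: {z_clean:.1f}, with sorting: {z_sorted:.1f}")
```

A large absolute z-score points to bunching on one side of the threshold. In applied work a formal density test and a plot of the density would complement this check.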

SUMMARY: Exploit institutional features for credible identification. Usually much more credible than standard IV (and other strategies). Biggest threats to validity are sorting and other policies at same threshold. Large toolkit to assess internal validity empirically.

RECOMMENDED READINGS (other than surveys):
- Imbens, G. and T. Lemieux (2008). Regression Discontinuity Designs: A Guide to Practice, Journal of Econometrics, 142(2), 615-635.
- Lee, D.S. and D. Card (2008). Regression Discontinuity Inference with Specification Error, Journal of Econometrics, 142(2), 655-674.
- McCrary, J. (2008). Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test, Journal of Econometrics, 142(2), 698-714.

GOOD APPLICATIONS:
- all applications in the Journal of Econometrics special issue on RDD, 142(2), 2008
- Lee, D.S. and J. McCrary (2009). The Deterrence Effect of Prison: Dynamic Theory and Evidence, Working Paper, Princeton University, Department of Economics, Center for Economic Policy Studies.
- Gormley, W.T., T. Gayer, D. Phillips, and B. Dawson (2005). The Effects of Universal Pre-K on Cognitive Development, Developmental Psychology, 41(6), 872-884.