Structural Equation Modeling

Transcription

1 Structural Equation Modeling Kosuke Imai Princeton University POL572 Quantitative Analysis II Spring 2016 Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

2 Causal Mediation Analysis Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

3 Quantitative Research and Causal Mechanisms Causal inference is a central goal of scientific research Scientists care about causal mechanisms, not just causal effects Randomized experiments often only determine whether the treatment causes changes in the outcome Not how and why the treatment affects the outcome Common criticism of experiments and statistics: black box view of causality Qualitative research uses process tracing Question: How can quantitative research be used to identify causal mechanisms? Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

4 Direct and Indirect Effects Causal mediation analysis: Mediator, M Indrect (Mediation) Effect Treatment, T Outcome, Y Direct Effect Direct and indirect effects; intermediate and intervening variables READING: Imai, Keele, Tingley, and Yamamoto (2012). APSR Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

5 Causal Mediation Analysis in American Politics Media framing experiment in Nelson et al. (APSR, 1998) Path analysis, structural equation modeling Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

6 Causal Mediation Analysis in Comparative Politics Resource curse thesis Authoritarian government civil war Natural resources Slow growth Causes of civil war: Fearon and Laitin (APSR, 2003) Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

7 Causal Mediation Analysis in International Relations The literature on international regimes and institutions Krasner (International Organization, 1982) Power and interests are mediated by regimes Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

8 Potential Outcomes Framework for Mediation Binary treatment: T i Pre-treatment covariates: X i Potential mediators: M i (t) Observed mediator: M i = M i (T i ) Potential outcomes: Y i (t, m) Observed outcome: Y i = Y i (T i, M i (T i )) Again, only one potential outcome can be observed per unit Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

9 Causal Mediation Effects Total causal effect: τ i Y i (1, M i (1)) Y i (0, M i (0)) Causal mediation (Indirect) effects: δ i (t) Y i (t, M i (1)) Y i (t, M i (0)) Causal effect of the treatment-induced change in M i on Y i Change the mediator from M i (0) to M i (1) while holding the treatment constant at t Represents the mechanism through M i Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

10 Total Effect = Indirect Effect + Direct Effect Direct effects: ζ i (t) Y i (1, M i (t)) Y i (0, M i (t)) Causal effect of T i on Y i, holding mediator constant at its potential value that would be realized when T i = t Change the treatment from 0 to 1 while holding the mediator constant at M i (t) Represents all mechanisms other than through M i Total effect = mediation (indirect) effect + direct effect: τ i = δ i (t) + ζ i (1 t) Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

11 What Does the Observed Data Tell Us? Quantity of Interest: Average causal mediation effects (ACME) δ(t) E(δ i (t)) = E{Y i (t, M i (1)) Y i (t, M i (0))} Average direct effects ( ζ(t)) are defined similarly Y i (t, M i (t)) is observed but Y i (t, M i (t )) can never be observed We have an identification problem = Need additional assumptions to make progress Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

12 Identification under Sequential Ignorability Proposed identification assumption: Sequential Ignorability (SI) {Y i (t, m), M i (t)} T i X i = x, (1) Y i (t, m) M i (t) T i = t, X i = x (2) (1) is guaranteed to hold in a standard experiment (2) does not hold unless X i includes all confounders Like any ignorability assumption, the assumption is very strong 1 No unmeasured pre-treatment confounder 2 No measured and unmeasured post-treatment confounder Under SI, ACME is nonparametrically identified: E(Y i M i, T i = t, X i ) {dp(m i T i = 1, X i ) dp(m i T i = 0, X i )} dp(x i ) Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

13 The Linear Structural Equation Model Y i = α 1 + β 1 T i + Xi ξ 1 + ɛ 1i, (1) M i = α 2 + β 2 T i + Xi ξ 2 + ɛ 2i, (2) Y i = α 3 + β 3 T i + γm i + Xi ξ 3 + ɛ 3i. (3) Equation (1) is redundant given equations (2) and (3): α 1 = α 3 + α 2 γ, β 1 = β 2 γ + β 3, ξ 1 = ξ 2 γ + ξ 3, ɛ 1i = γɛ 2i + ɛ 3i Potential outcomes notation: M i (T i ) = α 2 + β 2 T i + Xi ξ 2 + ɛ 2i, Y i (T i, M i (T i )) = α 3 + β 3 T i + γm i (T i ) + Xi ξ 3 + ɛ 3i. Under SI and linearity: 1 E(ɛ 2i T i, X i ) = E(ɛ 2i X i ) = f 2 (X i ) = X i ξ 2 2 E(ɛ 3i T i, M i, X i ) = E(ɛ 3i X i ) = f 3 (X i ) = X i ξ 3 Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

14 Estimation of Average Causal Mediation Effects Redefine: 1 ɛ 2i = Xi ξ2 + ɛ 2i where E(ɛ 2i X i ) = 0 2 ɛ 3i = Xi ξ3 + ɛ 3i where E(ɛ 3i X i ) = 0 Then, SI implies Cor(ɛ 2i, ɛ 3i ) = 0 Non-zero correlation = M i is correlated with ɛ 3i given T i and X i Under the LSEM and SI, 1 δ(1) = δ(0) = β2 γ 2 ζ(1) = ζ(0) = β3 3 τ = β 2 γ + β 3 Exact variance: V( ˆβ 2ˆγ) = β 2 2 V(ˆγ) + γ2 V( ˆβ 2 ) + V(ˆγ)V( ˆβ 2 ) Asymptotic variance: V( ˆβ 2ˆγ) β 2 2 V(ˆγ) + γ2 V( ˆβ 2 ) No interaction assumption: δ(1) = δ(0) Can be relaxed via an interaction term in equation (3): T i M i Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

15 The Delta Method Taylor s Theorem: If f (x) is n + 1 times differentiable and f (n) is continuous, then for each x there exists c (x 0, x) such that, f (x) = f (x 0 ) + n f (k) (x 0 ) (x x 0 ) k + f (n+1) (c) k! (n + 1)! (x x 0) (n+1) k=1 }{{}}{{} Lagrange remainder Taylor series Univariate: Suppose that d n(x n θ) N (0, σ 2 ). Then, d n(f (Xn ) f (θ)) N (0, σ 2 [f (1) (θ)] 2 ) Multivariate: Suppose that n(x n θ) n(f (Xn ) f (θ)) d N (0, [f (1) (θ)] Σf (1) (θ)) d N (0, Σ). Then, Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

16 General Estimation Algorithm Based on the nonparametric identification result under SI: 1 Model outcome and mediator Outcome model: p(y i T i, M i, X i ) Mediator model: p(m i T i, X i ) These models can be of any form (linear or nonlinear, semi- or nonparametric, with or without interactions) 2 Predict mediator for both treatment values (M i (1), M i (0)) 3 Predict outcome by first setting T i = 1 and M i = M i (0), and then T i = 1 and M i = M i (1) 4 Compute the average difference between two outcomes to obtain a consistent estimate of ACME 5 Monte Carlo simulation or bootstrap to estimate uncertainty (more on these methods later) Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

17 Need for Sensitivity Analysis The sequential ignorability assumption is often too strong Need to assess the robustness of findings via sensitivity analysis Question: How large a departure from the key assumption must occur for the conclusions to no longer hold? Sensitivity analysis by assuming {Y i (t, m), M i (t)} T i X i = x but not Y i (t, m) M i T i = t, X i = x Possible existence of unobserved pre-treatment confounder Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

18 Sensitivity Analysis Sensitivity parameter: ρ Corr(ɛ 2i, ɛ 3i ) Sequential ignorability implies ρ = 0 Set ρ to different values and see how mediation effects change Identification with a given error correlation δ(0) = δ(1) = β 2 σ 12 σ2 2 ρ ( ) 1 σ 2 1 ρ 2 σ1 2 σ2 12 σ2 2, where σ 2 j V(ɛ ij ) for j = 1, 2 and σ 12 Cov(ɛ 1i, ɛ 2i ). When do my results go away completely? δ(t) = 0 if and only if ρ = Corr(ɛ 1i, ɛ 2i ) (easy to compute!) Interpretation of ρ can be based on R 2 statistics Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

19 An Example: Nelson et al. (1998) APSR Average Mediation Effect: δ Sensitivity Parameter: ρ Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

20 Beyond Sequential Ignorability Sensitivity analysis may be unsatisfactory Existence of post-treatment confounders: multiple mediators What if we get rid of the assumption altogether? Under a standard design, even the sign of ACME is unidentified Can we do any better? Develop alternative designs for more credible and powerful inference Designs feasible when the mediator can be directly or indirectly manipulated Statistical methods for handling multiple mediators Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

21 Instrumental Variables Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

22 Instrumental Variables The Model: Y = Xβ + ɛ with E(ɛ i ) = 0 and V(ɛ i ) = σ 2 Endogeneity: X ɛ where X is n K Instruments: Z ɛ where Z is n L Rank condition: Z X and Z Z have full rank Identification 1 K = L: just-identified 2 K < L: over-identified 3 K > L: under-identified The IV estimator: ˆβ IV (X Z(Z Z) 1 Z X) 1 X Z(Z Z) 1 Z Y Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

23 Geometry of Instrumental Variables Projection matrix (onto S(Z)): P Z = Z(Z Z) 1 Z Purge endogeneity: X = P Z X Since P Z = P Z and P ZP Z = P Z, we have ˆβ IV = ( X X) 1 X Y Two stage least squares: 1 Regress X on Z and obtain the fitted values X 2 Regress Y on X We do not assume the linearity of X in Z Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

24 Asymptotic Inference A familiar (by now!) trick: ˆβ IV β = (X Z(Z Z) 1 Z X) 1 X Z(Z Z) 1 Z ɛ ( ) ( ) 1 ( 1 n = X i Zi 1 n Z i Zi 1 n n n ( 1 n i=1 n i=1 X i Z i ) ( 1 n i=1 n i=1 Z i Z i p Thus, ˆβ IV β Under the homoskedasticity, V(ɛ Z) = σ 2 I n : ) 1 ( 1 n n i=1 Z i X i ) n Z i ɛ i i=1 ) d n( ˆβ IV β) N (0, σ 2 [E(X i Zi ){E(Z i Zi )} 1 E(Z i Xi )] 1 ) 1 Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

25 Residuals and Robust Standard Error ˆɛ = Y X ˆβ IV and not ˆɛ Y X ˆβ IV Under homoskedasticity: ˆσ 2 ˆɛ 2 n K p σ 2 V( ˆβ IV ) = ˆσ 2 {X Z(Z Z) 1 Z X} 1 = ˆσ 2 ( X X) 1 Sandwich heteroskedasticity consistent estimator: bread = {X Z(Z Z i ) 1 Z X} 1 X Z(Z Z) 1 ( ) n meat = Z diag(ˆɛ 2 i )Z = ˆɛ 2 i Z izi i=1 bread meat bread = ( X X) 1 X diag(ˆɛ 2 i ) X( X X) 1 Robust standard error for clustering and auto-correlation Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

26 Relationship with Linear Structural Equation Modeling The model: T i = α 2 + β 2 Z i + Xi ξ 2 + ɛ i2, Y i = α 3 + β }{{} 3 Z i + γt i + Xi ξ 3 + ɛ i3. =0 Exclusion restriction (no direct effect): β 3 = 0 ɛ i2 ɛ i3 X i = x for some x Ignorability is assumed for Z i Sequential ignorability is not assumed for T i What is the causal interpretation of the IV method? Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

27 Partial Compliance in Randomized Experiments Unable to force all experimental subjects to take the (randomly) assigned treatment/control Intention-to-Treat (ITT) effect treatment effect Selection bias: self-selection into the treatment/control groups Political information bias: effects of campaign on voting behavior Ability bias: effects of education on wages Healthy-user bias: effects of exercises on blood pressure Encouragement design: randomize the encouragement to receive the treatment rather than the receipt of the treatment itself Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

28 Potential Outcomes Notation Randomized encouragement: Z i {0, 1} Potential treatment variables: (T i (1), T i (0)) 1 T i (z) = 1: would receive the treatment if Z i = z 2 T i (z) = 0: would not receive the treatment if Z i = z Observed treatment receipt indicator: T i = T i (Z i ) Observed and potential outcomes: Y i = Y i (Z i, T i (Z i )) Can be written as Y i = Y i (Z i ) No interference assumption for T i (Z i ) and Y i (Z i, T i ) Randomization of encouragement: But (Y i (1), Y i (0)) T i Z i = z (Y i (1), Y i (0), T i (1), T i (0)) Z i Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

29 Principal Stratification Framework Four principal strata (latent types): compliers (T i (1), T i (0)) = (1, 0), always takers (T i (1), T i (0)) = (1, 1), non-compliers never takers (T i (1), T i (0)) = (0, 0), defiers (T i (1), T i (0)) = (0, 1) Observed and principal strata: Z i = 1 Z i = 0 T i = 1 Complier/Always-taker Defier/Always-taker T i = 0 Defier/Never-taker Complier/Never-taker Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

30 Instrumental Variables Randomized encouragement as an instrument for the treatment Two additional assumptions 1 Monotonicity: No defiers T i (1) T i (0) for all i. 2 Exclusion restriction: Instrument (encouragement) affects outcome only through treatment Y i (1, t) = Y i (0, t) for t = 0, 1 Zero ITT effect for always-takers and never-takers ITT effect decomposition: ITT = ITT c Pr(compliers) + ITT a Pr(always takers) +ITT n Pr(never takers) = ITT c Pr(compliers) Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

31 IV Estimand and Interpretation IV estimand: ITT c = ITT Pr(compliers) = E(Y i Z i = 1) E(Y i Z i = 0) E(T i Z i = 1) E(T i Z i = 0) = Cov(Y i, Z i ) Cov(T i, Z i ) ITT c = Complier Average Treatment Effect (CATE) CATE ATE unless ATE for noncompliers equals CATE Different encouragement (instrument) yields different compliers Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

32 Asymptotic Inference Wald estimator: ÎV W Ĉov(Y i,z i ) = ÎTT Y Ĉov(T i,z i ) ÎTT T p Consistency: ÎV W CATE = ITT c Asymptotic variance: V(ÎV W ) 1 { ITT 4 ITT 2 T V(ÎTT Y ) + ITT 2 Y V(ÎTT T ) T } 2 ITT Y ITT T Cov(ÎTT Y, ÎTT T ). Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

33 Violations of IV Assumptions Violation of exclusion restriction: Large sample bias = ITT noncomplier Pr(noncomplier) Pr(complier) Weak encouragement (instruments) Direct effects of encouragement; failure of randomization, alternative causal paths Violation of monotonicity: Large sample bias = {CATE + ITT defier} Pr(defier) Pr(complier) Pr(defier) Proportion of defiers Heterogeneity of causal effects Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

34 An Example: Testing Habitual Voting Randomized encouragement to vote in an election Treatment: turnout in the election Outcome: turnout in the next election Monotonicity: Being contacted by a canvasser would never discourage anyone from voting Exclusion restriction: being contacted by a canvasser in this election has no effect on turnout in the next election other than through turnout in this election CATE: Habitual voting for those who would vote if and only if they are contacted by a canvasser in this election Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

35 Multi-valued Treatment Two stage least squares regression: T i = α 2 + β 2 Z i + η i, Y i = α 3 + γt i + ɛ i. Binary encouragement and binary treatment, ˆγ = ĈATE (no covariate) ˆγ p CATE (with covariates) Binary encouragement multi-valued treatment Monotonicity: T i (1) T i (0) Exclusion restriction: Y i (1, t) = Y i (0, t) for each t = 0, 1,..., K Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

36 Estimator ˆγ TSLS p Cov(Y i, Z i ) Cov(T i, Z i ) = E(Y i(1) Y i (0)) E(T i (1) T i (0)) K K ( Yi (1) Y i (0) ) = w jk E T i (1) = j, T i (0) = k j k k=0 j=k+1 where w jk is the weight, which sums up to one, defined as, w jk = (j k) Pr(T i (1) = j, T i (0) = k) K k =0 K j =k +1 (j k ) Pr(T i (1) = j, T i (0) = k ). Easy interpretation under the constant additive effect assumption for every complier type Assume encouragement induces at most only one additional dose Then, w k = Pr(T i (1) = k, T i (0) = k 1) Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

37 Fuzzy Regression Discontinuity Design Sharp regression discontinuity design: T i = 1{X i c} What happens if we have noncompliance? Forcing variable as an instrument: Z i = 1{X i c} Potential outcomes: T i (z) and Y i (z, t) Monotonicity: T i (1) T i (0) Exclusion restriction: Y i (0, t) = Y i (1, t) E(T i (z) X i = x) and E(Y i (z, T i (z)) X i = x) are continuous in x Estimand: E(Y i (1, T i (1)) Y i (0, T i (0)) Complier, X i = c) Estimator: lim x c E(Y i X i = x) lim x c E(Y i X i = x) lim x c E(T i X i = x) lim x c E(T i X i = x) Disadvantage: external validity Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

38 An Example: Class Size Effect (Angrist and Lavy) Effect of class-size on student test scores Maimonides Rule: Maximum class size = 40 Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39

39 Concluding Remarks Causal mediation analysis: exploring causal mechanisms Even in randomized experiments, an additional (untestable) assumption is required Importance of sensitivity analysis Instrumental variables in randomized experiments: dealing with partial compliance Additional (untestable) assumptions are required ITT vs. CATE Instrumental variables in observational studies: dealing with selection bias Validity of instrumental variables requires rigorous justification Tradeoff between internal and external validity Kosuke Imai (Princeton) Structural Equation Modeling POL572 Spring / 39