Binary Outcome Models: Endogeneity and Panel Data

1 Binary Outcome Models: Endogeneity and Panel Data ECMT 676 (Econometric II) Lecture Notes TAMU April 14, 2014 ECMT 676 (TAMU) Binary Outcomes: Endogeneity and Panel April 14, / 40

2 Topics
Issues in binary response models.
Endogeneity: (i) continuous endogenous x, handled by the control function approach or the IV probit approach; (ii) binary endogenous x.
Panel data: RE, CRE, and FE.

3 Binary response model with endogeneity
Endogeneity arises naturally in the latent variable model.
The LPM (linear probability model) with 2SLS is a handy solution for a binary response with endogeneity, but it does not allow for individual-specific marginal effects.
We will deal with probit/logit with endogeneity.
Why does endogeneity cause a problem in the binary model? The answer is not as transparent as in the linear model.

4 The binary outcome model:
$$y_i = I\{y_i^* > 0\}, \qquad y_i^* = x_i'\beta - u_i.$$
Now we allow $E(x_i u_i) \neq 0$ by assuming $u_i \mid x_i \sim N(q(x_i), 1)$. Then
$$P(y_i = 1 \mid x_i) = P(u_i < x_i'\beta \mid x_i) = P(u_i - q(x_i) < x_i'\beta - q(x_i) \mid x_i) = \Phi(x_i'\beta - q(x_i)).$$
A probit of $y_i$ on $x_i$ is therefore not consistent.
We will see that general endogeneity ($\sigma_{uv} \neq 0$) introduces not only a non-constant mean but also a non-unit variance.

5 When the endogenous variable is continuous: a control function (CF) approach
This part is based on Blundell and Powell (2004, REStud) and on Wooldridge (Section 15.7.2). Unanswered question: why not the traditional two stages?
The model:
$$y_{1i} = I\{y_{1i}^* > 0\}, \qquad y_{1i}^* = x_i'\beta_0 + u_i = z_{1i}'\beta + \alpha y_{2i} + u_i, \qquad \beta_0 = (\beta', \alpha)'.$$
Now suppose part of $x_i$ (called $y_{2i}$, assumed to be a scalar) is endogenous: $x_i = (z_{1i}', y_{2i})'$. The vector of IVs is $z_i = (z_{1i}', z_{2i}')'$.
The reduced form (RF):
$$y_{2i} = z_i'\delta + v_i = z_{1i}'\delta_1 + z_{2i}'\delta_2 + v_i.$$

6 The parameters $(\alpha, \beta)$ index the average structural function (ASF), or the response probability. Again, this is defined by fixing all observed explanatory variables and integrating out the unobservable $u$:
$$ASF(y_2, z_1) = E_u\{1[z_1'\beta + \alpha y_2 + u > 0]\} = \Phi(z_1'\beta + \alpha y_2).$$
Thus $\alpha$ and $\beta$ are the parameters that appear in the APEs (derivatives of the ASF).

7 Assume
$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \Big|\, z \sim N\left(0, \begin{pmatrix} 1 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix}\right).$$
Endogeneity comes from $\sigma_{uv} \neq 0$.
Normality of $v$ is not realistic if $y_{2i}$ is binary, so for now we assume the endogenous variable is continuous.
Probit has some advantage here over logit, because we can then assume joint normality.

8 Rivers and Vuong (1988) proposed a "control function (CF)" approach, "which introduces residuals from the reduced form for the regressors as covariates in the binary response model to account for endogeneity" (BP, 2004).
Linear projection: $u = \theta v + e$, where $\theta = \sigma_{uv}/\sigma_v^2$ and $e$ is also normal.
With the linear projection, we get
$$I(y_{1i} = 1) = I(z_{1i}'\beta + \alpha y_{2i} + u_i > 0) = I(\underbrace{z_{1i}'\beta + \alpha y_{2i} + \theta v_i}_{\text{the mean part}} + e_i > 0).$$
We will need $D(e \mid z_1, y_2, v)$.

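9 First we have, given $z$,
$$\begin{pmatrix} e \\ v \end{pmatrix} = \begin{pmatrix} u - \theta v \\ v \end{pmatrix} = \begin{pmatrix} 1 & -\theta \\ 0 & 1 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix} \sim N\left(0, \begin{pmatrix} 1 - \rho^2 & 0 \\ 0 & \sigma_v^2 \end{pmatrix}\right),$$
where $\rho = \mathrm{corr}(u, v) = \sigma_{uv}/\sigma_v$.
Then $e$ is independent of $v$: uncorrelatedness of $e$ and $v$ (which also follows from the linear projection) implies independence, since they are jointly normal.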

10 Let $D(\cdot \mid \cdot)$ denote a conditional distribution. Then
$$D(e \mid z_1, y_2, v) = D(e \mid z, v) \ \text{(since $y_2$ is generated by $z$ and $v$)} = D(e \mid z) \ \text{(since $e$ is independent of $v$)} = N(0, 1 - \rho^2).$$
So the standard assumptions hold: conditional normality and homoskedasticity.
But the identification condition fails: $\mathrm{Var}(e) \neq 1$.
The outcome equation becomes (CF):
$$y_{1i}^* = z_{1i}'\beta + \alpha y_{2i} + \theta v_i + e_i.$$

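11 Then
$$P(y_1 = 1 \mid x, v) = P(y_1 = 1 \mid z_1, y_2, v) = P(z_{1i}'\beta + \alpha y_{2i} + \theta v_i + e_i > 0 \mid z_1, y_2, v) = P(e_i > -(z_{1i}'\beta + \alpha y_{2i} + \theta v_i) \mid z_1, y_2, v) = \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \theta v_i}{\sqrt{1 - \rho^2}}\right). \quad (1)$$
$v_i$ can be estimated from the RF, as $\hat v_i$. A probit of $y_1$ on $z_1$, $y_2$ and $\hat v$ gives the estimates, called $\tilde\beta$, $\tilde\alpha$ and $\tilde\theta$. So
$$\tilde\beta \to \beta_\rho = \beta/\sqrt{1 - \rho^2}, \qquad \tilde\alpha \to \alpha_\rho = \alpha/\sqrt{1 - \rho^2}, \qquad \tilde\theta \to \theta_\rho = \theta/\sqrt{1 - \rho^2}.$$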

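12 Recall that we are interested in $\beta$ and $\alpha$ (especially $\alpha$). The problem is now how to estimate $\rho$. Note that the probit limit of $\tilde\theta$ and the projection coefficient satisfy
$$\theta_\rho = \theta/\sqrt{1 - \rho^2}, \qquad \theta = \sigma_{uv}/\sigma_v^2 = \rho\sigma_v/\sigma_v^2 = \rho/\sigma_v.$$
Solve this system for $\rho$ and $\theta$, since $\sigma_v$ and $\theta_\rho$ can be estimated. So $\sqrt{1 - \rho^2}$ can be estimated:
$$\sqrt{1 - \rho^2} = 1/\sqrt{1 + \theta_\rho^2\sigma_v^2}.$$
So
$$\hat\beta = \tilde\beta/\sqrt{1 + \tilde\theta^2\hat\sigma_v^2}, \qquad \hat\alpha = \tilde\alpha/\sqrt{1 + \tilde\theta^2\hat\sigma_v^2}.$$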
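
The two-step procedure on slides 11 and 12 can be sketched on simulated data as follows. This is a minimal illustration, not the course's own code: the function names `probit_fit` and `cf_probit`, the Newton-Raphson probit routine, and all simulated parameter values are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm


def probit_fit(X, y, iters=25):
    """Probit MLE by Newton-Raphson (hypothetical helper for this sketch)."""
    q = 2.0 * y - 1.0                      # +1 if y = 1, -1 if y = 0
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        xb = X @ b
        lam = q * np.exp(norm.logpdf(q * xb) - norm.logcdf(q * xb))
        score = X.T @ lam
        hess = -(X * (lam * (lam + xb))[:, None]).T @ X
        b = b - np.linalg.solve(hess, score)
    return b


def cf_probit(y1, z1, y2, z):
    """Reduced-form OLS, probit with v_hat added, then rescale."""
    delta_hat = np.linalg.lstsq(z, y2, rcond=None)[0]
    v_hat = y2 - z @ delta_hat
    sigma_v2 = v_hat @ v_hat / (len(y2) - z.shape[1])
    coef = probit_fit(np.column_stack([z1, y2, v_hat]), y1)
    k = z1.shape[1]
    theta_tilde = coef[-1]
    scale = np.sqrt(1.0 + theta_tilde**2 * sigma_v2)   # = 1 / sqrt(1 - rho^2)
    return coef[:k] / scale, coef[k] / scale


# Simulated design (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, beta, alpha, rho, sigma_v = 20_000, np.array([0.2, -0.5]), 0.8, 0.6, 1.5
z1 = np.column_stack([np.ones(n), rng.normal(size=n)])
z = np.column_stack([z1, rng.normal(size=n)])          # z2 is the excluded IV
v = sigma_v * rng.normal(size=n)
u = (rho / sigma_v) * v + np.sqrt(1.0 - rho**2) * rng.normal(size=n)  # Var(u) = 1
y2 = z @ np.array([0.3, 0.4, 1.0]) + v
y1 = (z1 @ beta + alpha * y2 + u > 0).astype(float)

beta_hat, alpha_hat = cf_probit(y1, z1, y2, z)
print(beta_hat, alpha_hat)
```

Standard errors from the second-stage probit are not valid as they stand, because $\hat v_i$ is a generated regressor; the sketch returns point estimates only.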

13 An alternative derivation of (1)
By Fact 1 (see below), $u_i \mid v_i \sim N(\theta v_i, 1 - \rho^2)$. This conditional normality is essentially all we need (we don't really need joint normality). So, writing $x_i'\beta_0 = z_{1i}'\beta + \alpha y_{2i}$,
$$E(y_{1i} \mid x_i, v_i) = P(u_i > -x_i'\beta_0 \mid x_i, v_i) = 1 - P(u_i < -x_i'\beta_0 \mid x_i, v_i) = 1 - P\left(\frac{u_i - \theta v_i}{\sqrt{1 - \rho^2}} < \frac{-x_i'\beta_0 - \theta v_i}{\sqrt{1 - \rho^2}} \,\Big|\, x_i, v_i\right) = 1 - \Phi\left(\frac{-x_i'\beta_0 - \theta v_i}{\sqrt{1 - \rho^2}}\right) = \Phi\left(\frac{x_i'\beta_0 + \theta v_i}{\sqrt{1 - \rho^2}}\right).$$

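14 Fact 1
This result is also crucial in the Kalman filter. If
$$\begin{pmatrix} u \\ v \end{pmatrix} \sim N\left(\begin{pmatrix} \mu_u \\ \mu_v \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{vu} & \Sigma_{vv} \end{pmatrix}\right),$$
then $u \mid v \sim N(\mu_{u|v}, \Sigma_{u|v})$, where
$$\mu_{u|v} = \mu_u + \Sigma_{uv}\Sigma_{vv}^{-1}(v - \mu_v), \qquad \Sigma_{u|v} = \Sigma_{uu} - \Sigma_{uv}\Sigma_{vv}^{-1}\Sigma_{vu}.$$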
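
As a quick numerical check of Fact 1, the sketch below computes the conditional mean and variance for the bivariate case used in these notes. The function name `conditional_normal` and the values of $\rho$, $\sigma_v$ and $v$ are illustrative assumptions.

```python
import numpy as np


def conditional_normal(mu_u, mu_v, S_uu, S_uv, S_vv, v):
    """Mean and covariance of u | v for jointly normal (u, v)."""
    mu_u, mu_v, v = (np.atleast_1d(a).astype(float) for a in (mu_u, mu_v, v))
    S_uu, S_uv, S_vv = (np.atleast_2d(a).astype(float) for a in (S_uu, S_uv, S_vv))
    gain = S_uv @ np.linalg.inv(S_vv)      # S_uv S_vv^{-1}
    mean = mu_u + gain @ (v - mu_v)
    cov = S_uu - gain @ S_uv.T             # S_vu = S_uv'
    return mean, cov


# Var(u) = 1, Var(v) = sigma_v^2, Cov(u, v) = rho * sigma_v.
rho, sigma_v = 0.6, 1.5
mean, cov = conditional_normal(0.0, 0.0, 1.0, rho * sigma_v, sigma_v**2, v=2.0)
print(mean, cov)   # theta * v with theta = rho / sigma_v, and 1 - rho^2
```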

15 Since $v_i$ has to be estimated, the CF regression involves a generated regressor, so getting the standard errors is hard.
An alternative approach: MLE (also called IV probit).

16 Control function approach: in the linear model, it leads to the same estimate as 2SLS (Wooldridge, Section 6.2, p. 127, and Problem 5.1).
Exercise: show CF = 2SLS (using OLS algebra).
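
The equality of CF and 2SLS in the linear model can be checked numerically before proving it. The following sketch uses simulated data; the helper `ols` and all coefficient values are assumptions chosen for illustration.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(1)
n = 500
z1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # included exogenous
z = np.column_stack([z1, rng.normal(size=(n, 2))])       # all instruments
v = rng.normal(size=n)
u = 0.7 * v + rng.normal(size=n)                         # endogeneity via v
y2 = z @ np.array([0.5, -0.3, 1.0, 0.8]) + v
y1 = z1 @ np.array([1.0, 2.0]) + 1.5 * y2 + u

# 2SLS: replace y2 by its first-stage fitted value.
y2_fit = z @ ols(z, y2)
b_2sls = ols(np.column_stack([z1, y2_fit]), y1)

# CF: keep y2 and add the first-stage residual as an extra regressor.
v_hat = y2 - y2_fit
b_cf = ols(np.column_stack([z1, y2, v_hat]), y1)[:-1]

print(b_2sls, b_cf)
```

The two coefficient vectors agree up to floating-point error, which is the algebraic identity the exercise asks for.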

17 When the endogenous variable is continuous: IV probit approach
IV probit: the key is to obtain $f(y_1, y_2 \mid z)$:
$$f(y_1, y_2 \mid z) = f(y_1 \mid y_2, z)\, f(y_2 \mid z).$$
First, since $y_2 = z_i'\delta + v_i$, we have $y_2 \mid z \sim N(z_i'\delta, \sigma_v^2)$, so
$$f(y_2 \mid z) = \phi((y_2 - z_i'\delta)/\sigma_v)/\sigma_v.$$
Second, to get $f(y_1 \mid y_2, z)$, essentially $P(y_1 = 1 \mid y_2, z)$, we need the distribution of $u$ given $y_2, z$, where
$$y_{1i}^* = z_{1i}'\beta + \alpha y_{2i} + u_i.$$

18 By Fact 1, given $z$,
$$f(u \mid y_2, z) = f(u \mid z, v) = N\left(\frac{\sigma_{uv}}{\sigma_v^2} v,\ 1 - \frac{\sigma_{uv}^2}{\sigma_v^2}\right).$$
This can be compared with the exogenous probit, in which case, because $u \perp v$,
$$f(u \mid y_2, z) = f(u \mid z, v) = f(u \mid z) = N(0, 1).$$
So endogeneity introduces not only a non-zero mean but also a non-unit variance in the conditional distribution of $u$.

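19 So, conditioning on $(y_2, z)$ throughout,
$$P(y_1 = 1 \mid y_2, z) = P(z_{1i}'\beta + \alpha y_{2i} + u_i > 0) = P(u_i > -(z_{1i}'\beta + \alpha y_{2i})) = P\left(\frac{u_i - \sigma_{uv}\sigma_v^{-2} v}{\sqrt{1 - \rho^2}} > \frac{-z_{1i}'\beta - \alpha y_{2i} - \sigma_{uv}\sigma_v^{-2} v}{\sqrt{1 - \rho^2}}\right) = \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho\sigma_v^{-1} v}{\sqrt{1 - \rho^2}}\right) = \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \overbrace{\rho\sigma_v^{-1}}^{=\theta}(y_2 - z_i'\delta)}{\sqrt{1 - \rho^2}}\right) \equiv \Phi(w).$$
The likelihood for the $i$-th observation:
$$l_i = \Phi(w)^{y_1}[1 - \Phi(w)]^{1 - y_1}\,\phi((y_2 - z_i'\delta)/\sigma_v)/\sigma_v.$$
Log likelihood for the sample:
$$L(\beta, \alpha, \rho, \sigma_v, \delta) = \sum_{i=1}^n \log l_i.$$
Maximization is over $\beta, \alpha, \rho, \sigma_v, \delta$. Standard errors: from the estimated Hessian, or from the outer product of the score.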
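
The sample log likelihood on slide 19 can be written as a short function. This is a sketch under the notation of the notes; the function name `ivprobit_loglik`, the argument order, and the simulated data are assumptions. It evaluates the objective only and leaves the maximization to a numerical optimizer.

```python
import numpy as np
from scipy.stats import norm


def ivprobit_loglik(beta, alpha, rho, sigma_v, delta, y1, y2, z1, z):
    """Sum over i of log l_i, with l_i = f(y1 | y2, z) * f(y2 | z)."""
    v = y2 - z @ delta                                   # reduced-form error
    w = (z1 @ beta + alpha * y2 + (rho / sigma_v) * v) / np.sqrt(1.0 - rho**2)
    q = 2.0 * y1 - 1.0
    ll_y1 = norm.logcdf(q * w)        # y1 log Phi(w) + (1 - y1) log(1 - Phi(w))
    ll_y2 = norm.logpdf(v / sigma_v) - np.log(sigma_v)
    return (ll_y1 + ll_y2).sum()


# Illustrative data generated with rho = 0 (values are assumptions).
rng = np.random.default_rng(2)
n = 200
z1 = np.column_stack([np.ones(n), rng.normal(size=n)])
z = np.column_stack([z1, rng.normal(size=n)])
beta, alpha, sigma_v = np.array([0.2, -0.5]), 0.8, 1.5
delta = np.array([0.3, 0.4, 1.0])
y2 = z @ delta + sigma_v * rng.normal(size=n)
y1 = (z1 @ beta + alpha * y2 + rng.normal(size=n) > 0).astype(float)

ll0 = ivprobit_loglik(beta, alpha, 0.0, sigma_v, delta, y1, y2, z1, z)
print(ll0)
```

At $\rho = 0$ the function separates into an ordinary probit log likelihood plus a normal regression log likelihood, which is a convenient sanity check.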

20 A simple test of exogeneity: $H_0: \rho = 0$.
The control function approach focuses only on $f(y_1 \mid y_2, z)$ and is thus a limited information procedure.
The CF approach replaces $v$ by $\hat v$.

21 When the endogenous variable is binary
A binary endogenous variable needs special treatment because the linear model in the first stage is no longer reasonable.
The model:
$$y_{1i} = I\{y_{1i}^* > 0\}, \qquad y_{1i}^* = x_i'\beta_0 + u_i = z_{1i}'\beta + \alpha y_{2i} + u_i, \qquad y_{2i} = I\{z_i'\delta + v_i > 0\}.$$
As before, we assume $(u, v)$ is jointly normal given $z$, but here $\mathrm{Var}(v) = 1$ for identification. Assume, given $z$,
$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \Big|\, z_i \sim N\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$

22 The likelihood of observation $i$:
$$l_i = f(y_{1i}, y_{2i} \mid z_i) = f(y_{1i} \mid y_{2i}, z_i)\, f(y_{2i} \mid z_i).$$
The second factor is easy to obtain:
$$f(y_2 \mid z) = P(y_2 = 1)^{y_2}[1 - P(y_2 = 1)]^{1 - y_2} = \Phi(z'\delta)^{y_2}[1 - \Phi(z'\delta)]^{1 - y_2}.$$
The first factor is one of four cases:
$P(y_1 = 1 \mid y_2 = 1, z)$;
$P(y_1 = 0 \mid y_2 = 1, z) = 1 - P(y_1 = 1 \mid y_2 = 1, z)$;
$P(y_1 = 1 \mid y_2 = 0, z)$;
$P(y_1 = 0 \mid y_2 = 0, z) = 1 - P(y_1 = 1 \mid y_2 = 0, z)$.
Thus the MLE is obtained by maximizing $L(\beta, \alpha, \rho, \delta) = \sum_{i=1}^n \log l_i$.
We will calculate these four cases now.

23 Fact 2
If $v$ is standard normal, then for $v > a$
$$\mathrm{PDF}(v \mid v > a) = \phi(v)/P(v > a) = \phi(v)/\Phi(-a),$$
and for $v < a$
$$\mathrm{PDF}(v \mid v < a) = \phi(v)/P(v < a) = \phi(v)/[1 - \Phi(-a)].$$
(Proof?)

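24 When the endogenous variable is binary
$$P(y_1 = 1 \mid y_2 = 1, z) = E(I(y_1 = 1) \mid y_2 = 1, z)$$
$$= E\{E[I(y_1 = 1) \mid v, z] \mid y_2 = 1, z\} \quad \text{(the smaller conditioning set dominates)}$$
$$= E(P(y_1 = 1 \mid v, z) \mid y_2 = 1, z)$$
$$\overset{(*)}{=} E\left(\Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) \Big|\, y_2 = 1, z\right)$$
$$= E\left(\Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) \Big|\, v_i > -z_i'\delta, z\right)$$
$$= \int_{\text{support of } v} \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) f(v \mid v_i > -z_i'\delta, z)\, dv$$
$$= \int_{-z_i'\delta}^{\infty} \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) \frac{\phi(v)}{\Phi(z_i'\delta)}\, dv \quad \text{(by Fact 2)},$$
where (*) uses the projection $u = \rho v + e$.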

25 Similarly,
$$P(y_1 = 1 \mid y_2 = 0, z) = \int_{-\infty}^{-z_i'\delta} \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) \frac{\phi(v)}{1 - \Phi(z_i'\delta)}\, dv.$$
In the same way we can compute $P(y_1 = 0 \mid y_2 = 1, z)$ and $P(y_1 = 0 \mid y_2 = 0, z)$.
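
The two integrals on slides 24 and 25 are one-dimensional and can be evaluated by numerical quadrature. The sketch below does this with `scipy.integrate.quad`; the function name `prob_y1_given_y2` and the index values `a` (standing for $z_{1i}'\beta + \alpha y_{2i}$) and `b` (standing for $z_i'\delta$) are assumptions for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def prob_y1_given_y2(a, b, rho, y2):
    """P(y1 = 1 | y2, z) with a = z1'beta + alpha*y2 and b = z'delta."""
    s = np.sqrt(1.0 - rho**2)

    def integrand(v):
        return norm.cdf((a + rho * v) / s) * norm.pdf(v)

    if y2 == 1:                      # v > -b, truncated density phi(v)/Phi(b)
        num, _ = quad(integrand, -b, np.inf)
        return num / norm.cdf(b)
    num, _ = quad(integrand, -np.inf, -b)   # v < -b
    return num / (1.0 - norm.cdf(b))


a, b, rho = 0.4, -0.3, 0.5
p11 = prob_y1_given_y2(a, b, rho, 1)
p10 = prob_y1_given_y2(a, b, rho, 0)
print(p11, p10)
```

For $y_2 = 1$ the result should coincide with the bivariate normal expression $\Phi_2(a, b; \rho)/\Phi(b)$, which gives an independent check on the integral.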

26 EXAMPLE: Effects of Children on Labor Force Participation, LABSUP.txt
$y_1$ = worked, $y_2$ = morekids (a dummy for having more than two children). The population is women with at least two children.
$$worked = 1[\alpha\, morekids + \beta_0 + \beta_1 nonmomi + \beta_2 educ + \beta_3 age + \beta_4 age^2 + \beta_5 black + \beta_6 hispan + u > 0]$$
The binary variable samesex is the IV for morekids.

27 Binary panel data model: pooled probit/logit
Panel model without unobserved effects: $P(y_{it} = 1 \mid x_{it}) = \Phi(x_{it}'\beta)$.
Strict exogeneity (SE): $D(y_{it} \mid x_i) = D(y_{it} \mid x_{it})$ for $t = 1, \dots, T$, where $x_i = (x_{i1}, \dots, x_{iT})'$ and $D$ denotes a distribution. This is stronger than the corresponding assumption in the linear model, which is stated in terms of expectations.
One difficulty of using MLE (which is necessary for binary models) is that $y_{it}$ is not independent over $t$, which is inconvenient when constructing the likelihood function.
Conditional independence (CI): $y_{i1}, y_{i2}, \dots, y_{iT}$ are independent conditional on $x_i$.

28 Under these two assumptions:
$$f(y_{i1}, y_{i2}, \dots, y_{iT} \mid x_i) \overset{CI}{=} \prod_{t=1}^T f(y_{it} \mid x_i) \overset{SE}{=} \prod_{t=1}^T f(y_{it} \mid x_{it}),$$
where $f(y_{it} \mid x_{it}) = \Phi(x_{it}'\beta)^{y_{it}}[1 - \Phi(x_{it}'\beta)]^{1 - y_{it}}$.
Pooled log likelihood:
$$\sum_{i=1}^n \sum_{t=1}^T [y_{it}\log\Phi(x_{it}'\beta) + (1 - y_{it})\log(1 - \Phi(x_{it}'\beta))].$$
Maximizer: the pooled probit estimator.
Partial likelihood theory says it is still consistent if CI doesn't hold.

29 Binary panel data model: RE model
Now we add unobserved effects.
Response probability: $P(y_{it} = 1 \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i)$.
SE: $D(y_{it} \mid x_i, c_i) = D(y_{it} \mid x_{it}, c_i)$ for $t = 1, \dots, T$.
CI: $y_{i1}, y_{i2}, \dots, y_{iT}$ are independent conditional on $(x_i, c_i)$.
$y_{i1}, y_{i2}, \dots, y_{iT}$ are unconditionally dependent because of $c_i$.

30 CI and SE imply:
$$f(y_{i1}, y_{i2}, \dots, y_{iT} \mid x_i, c_i) \overset{CI}{=} \prod_{t=1}^T f(y_{it} \mid x_i, c_i) \overset{SE}{=} \prod_{t=1}^T f(y_{it} \mid x_{it}, c_i),$$
where $f(y_{it} \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i)^{y_{it}}[1 - \Phi(x_{it}'\beta + c_i)]^{1 - y_{it}}$.
But $c_i$ is unobservable, so $c_i$ shouldn't appear in the likelihood function.
Viewing the $c_i$ as parameters along with $\beta$ leads to an incidental parameters problem: with $T$ fixed, the MLEs $\hat c_i$ and $\hat\beta$ are inconsistent (unlike in the linear case, in which $\hat\beta$ is $\sqrt{n}$-consistent).

31 Hallmark of FE analysis (in nonlinear models): no specification of a distribution for $c_i$ given $x_i$.
RE (random effects) probit: assume (the RE assumption)
$$c_i \mid x_i \sim N(0, \sigma_c^2).$$
Since the first element of $x_i$ is 1, it implies $c_i \sim N(0, \sigma_c^2)$.
Then a conditional ML can be applied to $\beta$ and $\sigma_c^2$ as follows.

32 Dealing with $c_i$: integrate out $c_i$,
$$f(y_{i1}, y_{i2}, \dots, y_{iT} \mid x_i) = \int \prod_{t=1}^T f(y_{it} \mid x_{it}, c)\,\phi(c/\sigma_c)/\sigma_c\, dc.$$
The likelihood function for the entire sample is then straightforward: take the product over $i$.
The maximizer is called the RE probit estimator. The estimate is also called population averaged.
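
The integral over $c_i$ has no closed form for $T > 1$, and one common way to approximate it is Gauss-Hermite quadrature. The sketch below is one such approximation, not necessarily the method used in the course; the function name `re_probit_lik_i`, the number of nodes, and the parameter values are assumptions.

```python
import numpy as np
from scipy.stats import norm


def re_probit_lik_i(y_i, X_i, beta, sigma_c, n_nodes=40):
    """f(y_i1, ..., y_iT | x_i) with c_i ~ N(0, sigma_c^2) integrated out."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)   # weight function exp(-t^2)
    c = np.sqrt(2.0) * sigma_c * t                    # change of variable
    q = 2.0 * y_i - 1.0
    # T x n_nodes matrix of Phi(q_t * (x_it'beta + c_k)) = f(y_it | x_it, c_k)
    probs = norm.cdf(q[:, None] * ((X_i @ beta)[:, None] + c[None, :]))
    return (w * probs.prod(axis=0)).sum() / np.sqrt(np.pi)


beta, sigma_c = np.array([0.3, -0.2]), 1.2
X_i = np.array([[1.0, 0.5], [1.0, -1.0]])             # T = 2 periods

lik_11 = re_probit_lik_i(np.array([1.0, 1.0]), X_i, beta, sigma_c)
total = sum(
    re_probit_lik_i(np.array([a, b]), X_i, beta, sigma_c)
    for a in (0.0, 1.0) for b in (0.0, 1.0)
)
single = re_probit_lik_i(np.array([1.0]), X_i[:1], beta, sigma_c)
print(lik_11, total, single)
```

The sample log likelihood is the sum over $i$ of the logs of these contributions, maximized jointly over $\beta$ and $\sigma_c$. With $T = 1$ the integral reduces to $\Phi(x'\beta/\sqrt{1 + \sigma_c^2})$.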

33 As shown before, under the RE assumption, viewing $c_i + u_{it}$ as the composite error in the latent variable model and applying pooled probit gives an attenuation bias. This is in contrast to the linear model.
Latent variable model:
$$y_{it}^* = x_{it}'\beta + c_i + u_{it} \equiv x_{it}'\beta + v_{it}, \qquad y_{it} = I(y_{it}^* > 0).$$
Using $u_{it} \perp c_i$,
$$\mathrm{Var}(v_{it} \mid x_{it}) = \mathrm{Var}(c_i + u_{it} \mid x_{it}) = \mathrm{Var}(u_{it} \mid x_{it}) + \sigma_c^2 = \mathrm{Var}(u_{it} \mid x_{it}, c_i) + \sigma_c^2 = 1 + \sigma_c^2.$$
Response probability:
$$P(y_{it} = 1 \mid x_{it}) = \Phi\left(x_{it}'\beta/\sqrt{1 + \sigma_c^2}\right).$$
So the pooled probit estimator satisfies $\hat\beta \overset{p}{\to} \beta/\sqrt{1 + \sigma_c^2}$.
This result also motivates the necessity of specifying the distribution of $c$ (in contrast to the linear RE model).

34 Binary panel data model: CRE model
RE and CI might be too strong.
Correlated RE (CRE) probit (Chamberlain's): a relaxation of RE,
$$c_i \mid x_i \sim N(x_i'\xi, \sigma_c^2),$$
allowing dependence between $c_i$ and $x_i$.
Estimation is straightforward and only needs a modification of the density of $c$:
$$f(y_{i1}, y_{i2}, \dots, y_{iT} \mid x_i) = \int \prod_{t=1}^T f(y_{it} \mid x_{it}, c)\,\phi((c - x_i'\xi)/\sigma_c)/\sigma_c\, dc.$$

35 Binary panel data model: FE logit
Panel logit model:
$$P(y_{it} = 1 \mid x_{it}, c_i) = G(x_{it}'\beta + c_i), \qquad \text{where } G(x) = e^x/(1 + e^x).$$
An important advantage of panel logit is that we can obtain a $\sqrt{N}$-consistent and asymptotically normal estimator of $\beta$ without any assumptions on $D(c_i \mid x_i)$, thereby achieving fixed effects (FE) estimation.
Assumptions: SE, CI, and that each element of $x_{it}$ is time-varying.
Define $N_i = \sum_{t=1}^T y_{it}$. It turns out that $D(y_i \mid x_i, c_i, N_i)$ does not depend on $c_i$.
The functional form of logit helps. We can't do this in probit.

36 Consider $T = 2$. Then $N_i \in \{0, 1, 2\}$. Note that
$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 0) = 0, \qquad P(y_{i2} = 1 \mid x_i, c_i, N_i = 0) = 0,$$
$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 2) = 1, \qquad P(y_{i2} = 1 \mid x_i, c_i, N_i = 2) = 1,$$
neither of which is informative for $\beta$. So we only consider $N_i = 1$.
As we will show, consistent estimation of $\beta$ can be obtained by a standard logit of $y_{i2}$ on $x_{i2} - x_{i1}$ using the observations for which $N_i = 1$.

37
$$P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) = \frac{P(y_{i2} = 1, N_i = 1 \mid x_i, c_i)}{P(N_i = 1 \mid x_i, c_i)},$$
where
$$\text{Top} = P(y_{i2} = 1, y_{i1} = 0 \mid x_i, c_i) \overset{CI}{=} P(y_{i2} = 1 \mid x_i, c_i)\,P(y_{i1} = 0 \mid x_i, c_i) \overset{SE}{=} G(x_{i2}'\beta + c_i)[1 - G(x_{i1}'\beta + c_i)] = \frac{e^{x_{i2}'\beta + c_i}}{(1 + e^{x_{i2}'\beta + c_i})(1 + e^{x_{i1}'\beta + c_i})},$$
$$\text{Bottom} = P(y_{i2} = 1, y_{i1} = 0 \mid x_i, c_i) + P(y_{i2} = 0, y_{i1} = 1 \mid x_i, c_i) = G(x_{i2}'\beta + c_i)[1 - G(x_{i1}'\beta + c_i)] + [1 - G(x_{i2}'\beta + c_i)]G(x_{i1}'\beta + c_i) = \frac{e^{x_{i1}'\beta + c_i} + e^{x_{i2}'\beta + c_i}}{(1 + e^{x_{i2}'\beta + c_i})(1 + e^{x_{i1}'\beta + c_i})}.$$

38 So
$$P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) = \frac{e^{x_{i2}'\beta + c_i}}{e^{x_{i1}'\beta + c_i} + e^{x_{i2}'\beta + c_i}} = \frac{e^{(x_{i2} - x_{i1})'\beta}}{1 + e^{(x_{i2} - x_{i1})'\beta}} = G((x_{i2} - x_{i1})'\beta).$$
$c_i$ is canceled!
On the other hand,
$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 1) = P(y_{i2} = 0 \mid x_i, c_i, N_i = 1) = 1 - G((x_{i2} - x_{i1})'\beta).$$
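
The cancellation of $c_i$ can be confirmed numerically by computing the ratio Top/Bottom directly for several values of $c_i$. The function name `p_y2_given_one_success` and the values of $\beta$, $x_{i1}$, $x_{i2}$ below are illustrative assumptions.

```python
import numpy as np


def G(a):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-a))


def p_y2_given_one_success(x1, x2, beta, c):
    """P(y_i2 = 1 | x_i, c_i, N_i = 1) computed as Top / Bottom."""
    top = G(x2 @ beta + c) * (1.0 - G(x1 @ beta + c))
    bottom = top + (1.0 - G(x2 @ beta + c)) * G(x1 @ beta + c)
    return top / bottom


beta = np.array([0.7, -1.1])
x1, x2 = np.array([0.2, 1.0]), np.array([1.5, 0.3])

values = [p_y2_given_one_success(x1, x2, beta, c) for c in (-3.0, 0.0, 2.5)]
closed_form = G((x2 - x1) @ beta)
print(values, closed_form)
```

All three values coincide with $G((x_{i2} - x_{i1})'\beta)$, whatever value of $c_i$ is plugged in.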

39
$$f(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 1) = \begin{cases} P(y_{i1} = 1 \mid x_i, c_i, N_i = 1) & \text{if } y_{i1} = 1, \\ P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) & \text{if } y_{i2} = 1, \end{cases}$$
$$= P(y_{i1} = 1 \mid x_i, c_i, N_i = 1)^{y_{i1}}\, P(y_{i2} = 1 \mid x_i, c_i, N_i = 1)^{y_{i2}}.$$
So the conditional log likelihood function for observation $i$ is
$$l_i(\beta) = I(N_i = 1)[y_{i1}\log(1 - G((x_{i2} - x_{i1})'\beta)) + y_{i2}\log G((x_{i2} - x_{i1})'\beta)] = I(N_i = 1)[(1 - y_{i2})\log(1 - G((x_{i2} - x_{i1})'\beta)) + y_{i2}\log G((x_{i2} - x_{i1})'\beta)],$$
where the second equality uses $y_{i1} = 1 - y_{i2}$ when $N_i = 1$.
We select out the observations for which $N_i = 1$. Computationally, it is just a standard logit of $y_{i2}$ on $x_{i2} - x_{i1}$ using the observations for which $N_i = 1$.
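
The computational recipe for $T = 2$ can be sketched on simulated data in which $c_i$ is correlated with the regressors. The helper names `logit_fit` and `fe_logit_T2`, the Newton-Raphson routine, and the simulated design are assumptions made for this example.

```python
import numpy as np


def G(a):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-a))


def logit_fit(X, y, iters=25):
    """Logit MLE by Newton-Raphson (no intercept is added)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = G(X @ b)
        score = X.T @ (y - p)
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X
        b = b - np.linalg.solve(hess, score)
    return b


def fe_logit_T2(y1, y2, x1, x2):
    """Conditional MLE for T = 2: logit of y2 on x2 - x1 where y1 + y2 = 1."""
    keep = (y1 + y2) == 1
    return logit_fit((x2 - x1)[keep], y2[keep])


# Simulated panel (illustrative values); x_it depends on c_i by construction.
rng = np.random.default_rng(3)
n, beta = 20_000, np.array([1.0, -0.5])
c = rng.normal(size=n)
x1 = rng.normal(size=(n, 2)) + c[:, None]
x2 = rng.normal(size=(n, 2)) + c[:, None]
y1 = (rng.uniform(size=n) < G(x1 @ beta + c)).astype(float)
y2 = (rng.uniform(size=n) < G(x2 @ beta + c)).astype(float)

beta_hat = fe_logit_T2(y1, y2, x1, x2)
print(beta_hat)
```

No distribution for $c_i$ is specified in the estimation step, even though the simulated $c_i$ is correlated with $x_{it}$.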

40 It is called the conditional MLE (CMLE), as it is different from a full MLE. To get a full MLE:
$$P(y_{i1}, y_{i2} \mid x_i, c_i) = \underbrace{P(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 0)}_{\text{deterministic}} P(N_i = 0 \mid x_i, c_i) + \underbrace{P(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 1)}_{\text{CMLE}} P(N_i = 1 \mid x_i, c_i) + \underbrace{P(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 2)}_{\text{deterministic}} P(N_i = 2 \mid x_i, c_i).$$
Here we used the rule of total probability: $P(A) = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2)$ if $B_1$ and $B_2$ form a partition of the sample space.
An aside: this rule was used to show the selection bias, which mistakenly assumes $P(A) = P(A \mid B_1)$.
CMLE doesn't consider $P(N_i = 0 \mid x_i, c_i)$, $P(N_i = 1 \mid x_i, c_i)$ or $P(N_i = 2 \mid x_i, c_i)$, which may depend on $\beta$.