Binary Outcome Models: Endogeneity and Panel Data




ECMT 676 (Econometrics II) Lecture Notes, TAMU, April 14, 2014

Topics

Issues in binary response models:
Endogeneity
    Continuous endogenous x
        Control function approach
        IV probit approach
    Binary endogenous x
Panel data
    RE
    CRE
    FE

Binary response model with endogeneity

Endogeneity arises naturally in the latent variable model.
The LPM (linear probability model) estimated by 2SLS is a handy solution for a binary response with endogeneity, but it does not allow for individual-specific marginal effects.
We will deal with probit/logit with endogeneity.
Why does endogeneity cause a problem in a binary model? The answer is not as transparent as in the linear model.

The binary outcome model:
$$y_i = I\{y_i^* > 0\}, \qquad y_i^* = x_i'\beta - u_i.$$
Now we allow $E(x_i u_i) \neq 0$ by assuming $u_i \mid x_i \sim N(q(x_i), 1)$. Then
$$P(y_i = 1 \mid x_i) = P(u_i < x_i'\beta \mid x_i) = P(u_i - q(x_i) < x_i'\beta - q(x_i) \mid x_i) = \Phi(x_i'\beta - q(x_i)).$$
A probit of $y_i$ on $x_i$ is therefore not consistent for $\beta$. We will see that general endogeneity ($\sigma_{uv} \neq 0$) introduces not only a non-constant mean but also a non-unit variance.

When the endogenous variable is continuous: a control function (CF) approach

This part is based on Blundell and Powell (2004, REStud) and also Wooldridge (Section 15.7.2). Unanswered question: why not the traditional two stages?

The model:
$$y_{1i} = I\{y_{1i}^* > 0\}, \qquad y_{1i}^* = x_i'\beta_0 + u_i = z_{1i}'\beta + \alpha y_{2i} + u_i.$$
Now suppose part of $x_i$ (called $y_{2i}$, assumed to be a scalar) is endogenous: $x_i = (z_{1i}', y_{2i})'$. The vector of IVs is $z_i = (z_{1i}', z_{2i}')'$. The reduced form (RF) is
$$y_{2i} = z_i'\delta + v_i = z_{1i}'\delta_1 + z_{2i}'\delta_2 + v_i.$$

The parameters $(\alpha, \beta)$ index the average structural function (ASF), that is, the response probability. Again, this is defined by fixing all observed explanatory variables and integrating out the unobservable $u$:
$$\mathrm{ASF}(y_2, z_1) = E_u\{1[z_1'\beta + \alpha y_2 + u > 0]\} = \Phi(z_1'\beta + \alpha y_2).$$
Thus, $\alpha$ and $\beta$ are the parameters that appear in the APEs (derivatives of the ASF).

Assume
$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \Bigm| z \sim N\left(0, \begin{pmatrix} 1 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix}\right).$$
Endogeneity comes from $\sigma_{uv} \neq 0$.
Normality of $v$ is not realistic if $y_{2i}$ is binary, so for now we assume the endogenous variable is continuous.
Probit has some advantage over logit here, because we can then assume joint normality.

Rivers and Vuong (1988) proposed a "control function (CF)" approach, "which introduces residuals from the reduced form for the regressors as covariates in the binary response model to account for endogeneity" (BP, 2004).

Linear projection: $u = \theta v + e$, where $\theta = \sigma_{uv}/\sigma_v^2$ and $e$ is also normal.
Having the linear projection, we get
$$I(y_1 = 1) = I(z_{1i}'\beta + \alpha y_{2i} + u_i > 0) = I(\underbrace{z_{1i}'\beta + \alpha y_{2i} + \theta v_i}_{\text{the mean part}} + e_i > 0).$$
We will need $D(e \mid z_1, y_2, v)$.

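First we have
$$\begin{pmatrix} e \\ v \end{pmatrix} = \begin{pmatrix} u - \theta v \\ v \end{pmatrix} = \begin{pmatrix} 1 & -\theta \\ 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} \sim N\left(0, \begin{pmatrix} 1 - \rho^2 & 0 \\ 0 & \sigma_v^2 \end{pmatrix}\right), \quad \text{given } z,$$
where $\rho = \mathrm{corr}(u, v) = \sigma_{uv}/\sigma_v$.
Then $e$ is independent of $v$: uncorrelatedness of $e$ and $v$ (which also follows from the linear projection) implies independence, since they are jointly normal.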

Let $D(\cdot \mid \cdot)$ denote a conditional distribution. Then
$$D(e \mid z_1, y_2, v) = D(e \mid z, v) \quad (\text{since } y_2 \text{ is generated by } z \text{ and } v)$$
$$= D(e \mid z) \quad (\text{since } e \text{ is independent of } v)$$
$$\sim N(0, 1 - \rho^2).$$
So the standard assumptions hold: conditional normality and homoskedasticity. But the identification condition (unit error variance) fails: $\mathrm{Var}(e) \neq 1$.
The outcome equation becomes (CF)
$$y_{1i}^* = z_{1i}'\beta + \alpha y_{2i} + \theta v_i + e_i.$$

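Then
$$\begin{aligned} P(y_1 = 1 \mid x, v) &= P(y_1 = 1 \mid z_1, y_2, v) \\ &= P(z_{1i}'\beta + \alpha y_{2i} + \theta v_i + e_i > 0 \mid z_1, y_2, v) \\ &= P(e_i > -(z_{1i}'\beta + \alpha y_{2i} + \theta v_i) \mid z_1, y_2, v) \\ &= \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \theta v_i}{\sqrt{1 - \rho^2}}\right). \end{aligned} \tag{1}$$
$v_i$ can be estimated from the RF, as $\hat v_i$.
A probit of $y_1$ on $z_1$, $y_2$ and $\hat v$ gives the estimates, called $\tilde\beta$, $\tilde\alpha$ and $\tilde\theta$. So
$$\tilde\beta \to \beta_\rho \equiv \beta/\sqrt{1 - \rho^2}, \qquad \tilde\alpha \to \alpha_\rho \equiv \alpha/\sqrt{1 - \rho^2}, \qquad \tilde\theta \to \theta_\rho \equiv \theta/\sqrt{1 - \rho^2}.$$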

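Recall we are interested in $\beta$ and $\alpha$ (especially $\alpha$). The problem is now how to estimate $\rho$. Note that
$$\theta_\rho = \theta/\sqrt{1 - \rho^2}, \qquad \theta = \sigma_{uv}/\sigma_v^2 = \rho\sigma_v/\sigma_v^2 = \rho/\sigma_v,$$
where $\theta_\rho$ is the probability limit of the probit coefficient $\tilde\theta$ on $\hat v$. Solve this system for $\rho$ and $\theta$, since $\sigma_v$ and $\theta_\rho$ can be estimated. Substituting $\theta = \rho/\sigma_v$ gives $\theta_\rho^2\sigma_v^2 = \rho^2/(1 - \rho^2)$, so $\sqrt{1 - \rho^2}$ can be estimated from
$$\frac{1}{\sqrt{1 - \rho^2}} = \sqrt{1 + \theta_\rho^2\sigma_v^2}.$$
So
$$\hat\beta = \tilde\beta\Big/\sqrt{1 + \tilde\theta^2\hat\sigma_v^2}, \qquad \hat\alpha = \tilde\alpha\Big/\sqrt{1 + \tilde\theta^2\hat\sigma_v^2}.$$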
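The rescaling step can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `rescale_cf_probit` and all numerical values are hypothetical. It starts from made-up structural values, forms the probit-scale limits (each divided by $\sqrt{1-\rho^2}$), and confirms that dividing by $\sqrt{1 + \tilde\theta^2\sigma_v^2}$ recovers the structural parameters.

```python
import math

def rescale_cf_probit(beta_tilde, alpha_tilde, theta_tilde, sigma_v):
    """Undo the 1/sqrt(1 - rho^2) scaling of two-step (CF) probit estimates."""
    # sqrt(1 + theta_tilde^2 * sigma_v^2) equals 1 / sqrt(1 - rho^2)
    scale = math.sqrt(1.0 + theta_tilde ** 2 * sigma_v ** 2)
    beta_hat = [b / scale for b in beta_tilde]
    alpha_hat = alpha_tilde / scale
    # implied correlation between u and v
    rho_hat = theta_tilde * sigma_v / scale
    return beta_hat, alpha_hat, rho_hat

# Hypothetical structural values, for illustration only
rho, sigma_v = 0.5, 2.0
beta, alpha = [1.0, -0.3], 0.8
theta = rho / sigma_v

# Probit-scale limits of the second-step estimates
s = math.sqrt(1.0 - rho ** 2)
beta_tilde = [b / s for b in beta]
alpha_tilde = alpha / s
theta_tilde = theta / s

beta_hat, alpha_hat, rho_hat = rescale_cf_probit(
    beta_tilde, alpha_tilde, theta_tilde, sigma_v
)
print(beta_hat, alpha_hat, rho_hat)
```

In practice `beta_tilde`, `alpha_tilde`, `theta_tilde` would come from the second-step probit and `sigma_v` from the reduced-form residual variance.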

An alternative derivation of (1)

By Fact 1 (see below), $u_i \mid v_i \sim N(\theta v_i, 1 - \rho^2)$. This conditional normality is essentially all we need (we don't really need joint normality). So
$$\begin{aligned} E(y_{1i} \mid x_i, v_i) &= P(u_i > -x_i'\beta_0 \mid x_i, v_i) \\ &= 1 - P(u_i < -x_i'\beta_0 \mid x_i, v_i) \\ &= 1 - P\left(\frac{u_i - \theta v_i}{\sqrt{1 - \rho^2}} < \frac{-x_i'\beta_0 - \theta v_i}{\sqrt{1 - \rho^2}} \Bigm| x_i, v_i\right) \\ &= 1 - \Phi\left(\frac{-x_i'\beta_0 - \theta v_i}{\sqrt{1 - \rho^2}}\right) \\ &= \Phi\left(\frac{x_i'\beta_0 + \theta v_i}{\sqrt{1 - \rho^2}}\right). \end{aligned}$$

Fact 1

This result is also crucial in the Kalman filter. If
$$\begin{pmatrix} u \\ v \end{pmatrix} \sim N\left(\begin{pmatrix} m_u \\ m_v \end{pmatrix}, \begin{pmatrix} S_{uu} & S_{uv} \\ S_{vu} & S_{vv} \end{pmatrix}\right),$$
then
$$u \mid v \sim N(m_{u|v}, S_{u|v}),$$
where
$$m_{u|v} = m_u + S_{uv} S_{vv}^{-1}(v - m_v), \qquad S_{u|v} = S_{uu} - S_{uv} S_{vv}^{-1} S_{vu}.$$
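As a small illustration of Fact 1 in the scalar case used in these notes, the sketch below computes the conditional mean and variance of $u$ given $v$. The function name `conditional_normal` and the numbers are hypothetical; with $\mathrm{Var}(u) = 1$ and $\mathrm{Cov}(u, v) = \rho\sigma_v$ the output should match $N(\theta v, 1 - \rho^2)$ with $\theta = \rho/\sigma_v$.

```python
def conditional_normal(m_u, m_v, s_uu, s_uv, s_vv, v):
    """Mean and variance of u | v for a bivariate normal (scalar u and v)."""
    mean = m_u + s_uv / s_vv * (v - m_v)
    var = s_uu - s_uv ** 2 / s_vv
    return mean, var

# Hypothetical values: Var(u) = 1, Var(v) = sigma_v^2, Cov(u, v) = rho * sigma_v
rho, sigma_v = 0.5, 2.0
mean, var = conditional_normal(
    m_u=0.0, m_v=0.0, s_uu=1.0, s_uv=rho * sigma_v, s_vv=sigma_v ** 2, v=1.2
)
print(mean, var)  # theta * v with theta = rho / sigma_v, and 1 - rho^2
```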

Since $v_i$ has to be estimated, the CF regression involves a generated regressor, which makes it hard to obtain the standard errors.
An alternative approach is MLE (also called IV probit).

Control function approach: in the linear model, it leads to the same estimate as 2SLS.
See Wooldridge, Section 6.2, p. 127, and Problem 5.1.
Exercise: show CF = 2SLS (using OLS algebra).
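A numerical check is not a proof, but it can make the CF = 2SLS claim concrete before attempting the algebra. The sketch below is an illustration under simplifying assumptions that are not in the notes: simulated data, one endogenous regressor, one instrument, no exogenous regressors and no intercept. In that just-identified case 2SLS reduces to the simple IV ratio, and the CF estimate is the OLS coefficient on $y_2$ when the first-stage residual is added as a regressor.

```python
import random

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Simulated data (all coefficients are made up for illustration)
random.seed(0)
n = 500
z = [random.gauss(0.0, 1.0) for _ in range(n)]
v = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [0.6 * vi + random.gauss(0.0, 1.0) for vi in v]   # u correlated with v
y2 = [0.8 * zi + vi for zi, vi in zip(z, v)]           # reduced form
y1 = [1.5 * y2i + ui for y2i, ui in zip(y2, u)]        # linear outcome equation

# First stage: OLS of y2 on z, then residuals v_hat
delta_hat = dot(z, y2) / dot(z, z)
v_hat = [y2i - delta_hat * zi for y2i, zi in zip(y2, z)]

# 2SLS (just-identified case): IV ratio
alpha_2sls = dot(z, y1) / dot(z, y2)

# CF: OLS of y1 on (y2, v_hat), solving the 2x2 normal equations by Cramer's rule
a11, a12, a22 = dot(y2, y2), dot(y2, v_hat), dot(v_hat, v_hat)
b1, b2 = dot(y2, y1), dot(v_hat, y1)
det = a11 * a22 - a12 ** 2
alpha_cf = (b1 * a22 - a12 * b2) / det
theta_cf = (a11 * b2 - a12 * b1) / det

print(alpha_2sls, alpha_cf, theta_cf)
```

The two estimates of $\alpha$ should agree up to floating-point error; the coefficient `theta_cf` on the residual is what a test of exogeneity would examine.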

When the endogenous variable is continuous: IV probit approach

IV probit: the key is to obtain $f(y_1, y_2 \mid z)$:
$$f(y_1, y_2 \mid z) = f(y_1 \mid y_2, z)\, f(y_2 \mid z).$$
First,
$$f(y_2 \mid z) = f(z_i'\delta + v_i \mid z) \sim N(z_i'\delta, \sigma_v^2), \quad \text{with density } \varphi((y_2 - z_i'\delta)/\sigma_v)/\sigma_v.$$
Second, to get $f(y_1 \mid y_2, z)$, essentially $P(y_1 = 1 \mid y_2, z)$, we need the distribution of $u$ given $y_2, z$, where
$$y_{1i}^* = z_{1i}'\beta + \alpha y_{2i} + u_i.$$

By Fact 1, given $z$,
$$f(u \mid y_2, z) = f(u \mid z, v) \sim N\left(\sigma_{uv}\sigma_v^{-2} v,\; 1 - \sigma_{uv}^2\sigma_v^{-2}\right).$$
This can be compared with the exogenous probit, in which case
$$f(u \mid y_2, z) = f(u \mid z, v) \overset{u \perp v}{=} f(u \mid z) \sim N(0, 1).$$
So endogeneity introduces not only a non-zero mean but also a non-unit variance in the conditional distribution of $u$.

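So
$$\begin{aligned} P(y_1 = 1 \mid y_2, z) &= P(z_{1i}'\beta + \alpha y_{2i} + u_i > 0 \mid y_2, z) \\ &= P(u_i > -(z_{1i}'\beta + \alpha y_{2i}) \mid y_2, z) \\ &= P\left(\frac{u_i - \sigma_{uv}\sigma_v^{-2} v}{\sqrt{1 - \rho^2}} > \frac{-z_{1i}'\beta - \alpha y_{2i} - \sigma_{uv}\sigma_v^{-2} v}{\sqrt{1 - \rho^2}} \Bigm| y_2, z\right) \\ &= \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho\sigma_v^{-1} v}{\sqrt{1 - \rho^2}}\right) \\ &= \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho\sigma_v^{-1}(y_2 - z'\delta)}{\sqrt{1 - \rho^2}}\right) \equiv \Phi(w), \end{aligned}$$
where $\rho\sigma_v^{-1} = \theta$.
The likelihood for the $i$th observation:
$$l_i = \Phi(w)^{y_1}[1 - \Phi(w)]^{1 - y_1}\, \varphi((y_2 - z_i'\delta)/\sigma_v)/\sigma_v.$$
Log-likelihood for the sample:
$$L(\beta, \alpha, \rho, \sigma_v, \delta) = \sum_{i=1}^{n} \log l_i.$$
Maximize over $\beta, \alpha, \rho, \sigma_v, \delta$.
Standard errors: from the estimated Hessian, or from the outer product of the score.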
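The per-observation log-likelihood can be written down directly from the two factors. The following is a minimal sketch, assuming a scalar $y_2$ and list-valued regressors; the function name `ivprobit_loglik_i` and the example numbers are hypothetical, and a real application would sum this over $i$ and pass it to a numerical optimizer.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def ivprobit_loglik_i(y1, y2, z1, z, beta, alpha, rho, sigma_v, delta):
    """log l_i = log f(y1 | y2, z) + log f(y2 | z) for one observation."""
    v = y2 - sum(zj * dj for zj, dj in zip(z, delta))         # reduced-form error
    index = sum(zj * bj for zj, bj in zip(z1, beta)) + alpha * y2
    w = (index + rho / sigma_v * v) / math.sqrt(1.0 - rho ** 2)
    p = STD.cdf(w)
    ll_y1 = math.log(p) if y1 == 1 else math.log(1.0 - p)      # probit part
    ll_y2 = math.log(STD.pdf(v / sigma_v) / sigma_v)           # normal density of y2
    return ll_y1 + ll_y2

# Hypothetical observation and parameter values
obs = dict(y1=1, y2=1.0, z1=[1.0], z=[1.0, 0.5])
par = dict(beta=[0.2], alpha=0.5, sigma_v=1.5, delta=[0.1, 1.0])

ll_exog = ivprobit_loglik_i(rho=0.0, **obs, **par)   # rho = 0: ordinary probit part
ll_endog = ivprobit_loglik_i(rho=0.4, **obs, **par)
print(ll_exog, ll_endog)
```

With $\rho = 0$ the first term collapses to the usual probit log-likelihood, which is the sense in which $\rho = 0$ corresponds to exogeneity.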

A simple test of exogeneity: $\rho = 0$.
The control function approach focuses only on $f(y_1 \mid y_2, z)$ and is thus a limited information procedure.
The CF approach replaces $v$ by $\hat v$.

When the endogenous variable is binary

A binary endogenous variable needs special treatment, because the linear model in the first stage is no longer reasonable.
The model:
$$y_{1i} = I\{y_{1i}^* > 0\}, \qquad y_{1i}^* = x_i'\beta_0 + u_i = z_{1i}'\beta + \alpha y_{2i} + u_i,$$
$$y_{2i} = I\{z_i'\delta + v_i > 0\}.$$
As before, we assume $(u, v)$ is jointly normal given $z$, but here $\mathrm{Var}(v) = 1$ for identification. Assume, given $z$,
$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \Bigm| z_i \sim N\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$

The likelihood of observation $i$:
$$l_i = f(y_{1i}, y_{2i} \mid z_i) = f(y_{1i} \mid y_{2i}, z_i)\, f(y_{2i} \mid z_i).$$
The second factor is easy to obtain:
$$f(y_2 \mid z) = P(y_2 = 1 \mid z)^{y_2}[1 - P(y_2 = 1 \mid z)]^{1 - y_2} = \Phi(z'\delta)^{y_2}[1 - \Phi(z'\delta)]^{1 - y_2}.$$
The first factor is one of four cases:
$P(y_1 = 1 \mid y_2 = 1, z)$
$P(y_1 = 0 \mid y_2 = 1, z) = 1 - P(y_1 = 1 \mid y_2 = 1, z)$
$P(y_1 = 1 \mid y_2 = 0, z)$
$P(y_1 = 0 \mid y_2 = 0, z) = 1 - P(y_1 = 1 \mid y_2 = 0, z)$
Thus the MLE maximizes
$$L(\beta, \alpha, \rho, \delta) = \sum_{i=1}^{n} \log l_i.$$
We will calculate these four cases now.

Fact 2

If $v$ is standard normal, then for $v > a$
$$\mathrm{PDF}(v \mid v > a) = \varphi(v)/P(v > a) = \varphi(v)/\Phi(-a),$$
and for $v < a$
$$\mathrm{PDF}(v \mid v < a) = \varphi(v)/P(v < a) = \varphi(v)/[1 - \Phi(-a)].$$
(Proof?)

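When the endogenous variable is binary

$$\begin{aligned} P(y_1 = 1 \mid y_2 = 1, z) &= E(I(y_1 = 1) \mid y_2 = 1, z) \\ &= E\{E[I(y_1 = 1) \mid v, z] \mid y_2 = 1, z\} \quad (\text{the smaller conditioning set dominates}) \\ &= E(P(y_1 = 1 \mid v, z) \mid y_2 = 1, z) \\ &\overset{(*)}{=} E\left(\Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) \Bigm| y_2 = 1, z\right) \\ &= E\left(\Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) \Bigm| v_i > -z_i'\delta, z\right) \\ &= \int_{\text{support of } v} \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) f(v \mid v_i > -z_i'\delta, z)\, dv \\ &= \int_{-z_i'\delta}^{\infty} \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) \frac{\varphi(v)}{\Phi(z_i'\delta)}\, dv \quad (\text{by Fact 2}), \end{aligned}$$
where (*) uses a projection.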

Similarly,
$$P(y_1 = 1 \mid y_2 = 0, z) = \int_{-\infty}^{-z_i'\delta} \Phi\left(\frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}}\right) \frac{\varphi(v)}{1 - \Phi(z_i'\delta)}\, dv.$$
In the same way we can compute $P(y_1 = 0 \mid y_2 = 1, z)$ and $P(y_1 = 0 \mid y_2 = 0, z)$.
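These truncated-normal integrals have no closed form, but they are easy to approximate. The sketch below uses a plain trapezoid rule as an illustration; the function name `prob_y1_given_y2`, the grid settings and the numbers are assumptions, and the infinite limit is truncated at `bound` (which must exceed $|z'\delta|$).

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def prob_y1_given_y2(index, zdelta, rho, y2, n_grid=4000, bound=8.0):
    """Approximate P(y1 = 1 | y2, z); index = z1'beta + alpha*y2, zdelta = z'delta."""
    if y2 == 1:
        lo, hi = -zdelta, bound           # y2 = 1 means v > -z'delta
        norm = STD.cdf(zdelta)
    else:
        lo, hi = -bound, -zdelta          # y2 = 0 means v <= -z'delta
        norm = 1.0 - STD.cdf(zdelta)
    s = math.sqrt(1.0 - rho ** 2)
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        v = lo + k * h
        weight = 0.5 if k in (0, n_grid) else 1.0   # trapezoid end weights
        total += weight * STD.cdf((index + rho * v) / s) * STD.pdf(v)
    return total * h / norm

# Hypothetical values; the same index is used for both y2 values (i.e. alpha = 0)
index, zdelta, rho = 0.7, 0.3, 0.5
p11 = prob_y1_given_y2(index, zdelta, rho, y2=1)
p10 = prob_y1_given_y2(index, zdelta, rho, y2=0)
p_marginal = p11 * STD.cdf(zdelta) + p10 * (1.0 - STD.cdf(zdelta))
print(p11, p10, p_marginal)
```

Because $u$ is standard normal marginally, averaging the two conditional probabilities with weights $\Phi(z'\delta)$ and $1 - \Phi(z'\delta)$ should return $\Phi(\text{index})$ when the index does not depend on $y_2$, which gives a simple sanity check on the integrals.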

EXAMPLE: Effects of Children on Labor Force Participation, LABSUP.txt

$y_1$ = worked, $y_2$ = morekids (a dummy for having more than two children). The population is women with at least two children.
$$\text{worked} = 1[\alpha\,\text{morekids} + \beta_0 + \beta_1\,\text{nonmomi} + \beta_2\,\text{educ} + \beta_3\,\text{age} + \beta_4\,\text{age}^2 + \beta_5\,\text{black} + \beta_6\,\text{hispan} + u > 0]$$
The binary variable samesex is the IV for morekids.

Binary panel data model: pooled probit/logit

Panel model without unobserved effects:
$$P(y_{it} = 1 \mid x_{it}) = \Phi(x_{it}'\beta).$$
Strict exogeneity (SE): $D(y_{it} \mid x_i) = D(y_{it} \mid x_{it})$ for $t = 1, \dots, T$, where $x_i = (x_{i1}, \dots, x_{iT})'$ and $D$ means distribution. This is stronger than the corresponding assumption in the linear model, which is stated in terms of expectations.
One difficulty of using MLE (which is necessary for binary models) is that $y_{it}$ is not independent over $t$, which is inconvenient in constructing the likelihood function.
Conditional independence (CI): $y_{i1}, y_{i2}, \dots, y_{iT}$ are independent conditional on $x_i$.

Under these two assumptions:
$$f(y_{i1}, y_{i2}, \dots, y_{iT} \mid x_i) \overset{CI}{=} \prod_{t=1}^{T} f(y_{it} \mid x_i) \overset{SE}{=} \prod_{t=1}^{T} f(y_{it} \mid x_{it}),$$
where $f(y_{it} \mid x_{it}) = \Phi(x_{it}'\beta)^{y_{it}}[1 - \Phi(x_{it}'\beta)]^{1 - y_{it}}$.
Pooled log-likelihood:
$$\sum_{i=1}^{n}\sum_{t=1}^{T}\left[y_{it}\log\Phi(x_{it}'\beta) + (1 - y_{it})\log(1 - \Phi(x_{it}'\beta))\right].$$
Maximizer: the pooled probit estimator.
Partial likelihood theory says it is still consistent if CI doesn't hold.

Binary panel data model: RE model

Now we add unobserved effects.
Response probability:
$$P(y_{it} = 1 \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i).$$
SE: $D(y_{it} \mid x_i, c_i) = D(y_{it} \mid x_{it}, c_i)$ for $t = 1, \dots, T$.
CI: $y_{i1}, y_{i2}, \dots, y_{iT}$ are independent conditional on $x_i, c_i$.
$y_{i1}, y_{i2}, \dots, y_{iT}$ are unconditionally dependent because of $c_i$.

CI and SE imply:
$$f(y_{i1}, y_{i2}, \dots, y_{iT} \mid x_i, c_i) \overset{CI}{=} \prod_{t=1}^{T} f(y_{it} \mid x_i, c_i) \overset{SE}{=} \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c_i),$$
where $f(y_{it} \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i)^{y_{it}}[1 - \Phi(x_{it}'\beta + c_i)]^{1 - y_{it}}$.
But $c_i$ is unobservable, so $c_i$ shouldn't appear in the likelihood function.
Viewing the $c_i$ as parameters to be estimated along with $\beta$ leads to an incidental parameters problem: with $T$ fixed, the MLEs $\hat c_i$ and $\hat\beta$ are inconsistent (unlike in the linear case, in which we have $\sqrt{n}$-consistency for $\beta$).

Hallmark of FE analysis (in nonlinear models): no specification of a distribution for $c_i$ given $x_i$.
RE (random effects) probit: assume (the RE assumption)
$$c_i \mid x_i \sim N(0, \sigma_c^2).$$
Since the first element of $x_i$ is 1, it implies $c_i \sim N(0, \sigma_c^2)$.
Then a conditional ML can be applied to $\beta$ and $\sigma_c^2$ as follows.

Dealing with $c_i$: integrate out $c_i$,
$$f(y_{i1}, y_{i2}, \dots, y_{iT} \mid x_i) = \int \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c)\, \varphi(c/\sigma_c)/\sigma_c\, dc.$$
The likelihood function for the entire sample is then straightforward: the product over $i$.
The maximizer is called the RE probit estimator. The estimate is also called population averaged.
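The integral over $c$ is one-dimensional, so it can be approximated numerically for each $i$. The sketch below is an illustration using a simple trapezoid rule on $[-8\sigma_c, 8\sigma_c]$ rather than the Gauss-Hermite quadrature that software typically uses; the function name `re_probit_lik_i` and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def re_probit_lik_i(y, xb, sigma_c, n_grid=2000, bound=8.0):
    """Likelihood contribution of unit i: y and xb are lists over t = 1..T,
    with xb[t] = x_it'beta. The unobserved effect c is integrated out."""
    lo, hi = -bound * sigma_c, bound * sigma_c
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        c = lo + k * h
        prod = 1.0
        for y_t, xb_t in zip(y, xb):
            p = STD.cdf(xb_t + c)
            prod *= p if y_t == 1 else 1.0 - p
        weight = 0.5 if k in (0, n_grid) else 1.0   # trapezoid end weights
        total += weight * prod * STD.pdf(c / sigma_c) / sigma_c
    return total * h

# Hypothetical unit with T = 3
lik_i = re_probit_lik_i(y=[1, 0, 1], xb=[0.2, -0.1, 0.5], sigma_c=1.0)

# With T = 1 the integral has the closed form Phi(xb / sqrt(1 + sigma_c^2))
lik_one = re_probit_lik_i(y=[1], xb=[0.4], sigma_c=1.5)
closed_form = STD.cdf(0.4 / math.sqrt(1.0 + 1.5 ** 2))
print(lik_i, lik_one, closed_form)
```

The sample log-likelihood would be the sum of `math.log(re_probit_lik_i(...))` over units, maximized over $\beta$ and $\sigma_c$.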

As shown before, under the RE assumption, viewing $c_i + u_{it}$ as the composite error in the latent variable model and applying pooled probit gives an attenuation bias. This is in contrast to the linear model.
Latent variable model:
$$y_{it}^* = x_{it}'\beta + c_i + u_{it} \equiv x_{it}'\beta + v_{it}, \qquad y_{it} = I(y_{it}^* > 0).$$
$$\mathrm{Var}(v_{it} \mid x_{it}) = \mathrm{Var}(c_i + u_{it} \mid x_{it}) \overset{u_{it} \perp c_i}{=} \mathrm{Var}(u_{it} \mid x_{it}) + \sigma_c^2 \overset{u_{it} \perp c_i}{=} \mathrm{Var}(u_{it} \mid x_{it}, c_i) + \sigma_c^2 = 1 + \sigma_c^2.$$
Response probability:
$$P(y_{it} = 1 \mid x_{it}) = \Phi\left(x_{it}'\beta/\sqrt{1 + \sigma_c^2}\right).$$
So the pooled probit estimator satisfies $\hat\beta \overset{p}{\to} \beta/\sqrt{1 + \sigma_c^2}$.
This result also motivates the need to specify the distribution of $c$ (in contrast to the linear RE model).

Binary panel data model: CRE model

RE and CI might be too strong.
Correlated RE (CRE) probit (Chamberlain's): a relaxation of RE,
$$c_i \mid x_i \sim N(x_i'\xi, \sigma_c^2),$$
allowing dependence between $c_i$ and $x_i$.
Estimation is straightforward: it only needs a modification of the density of $c$,
$$f(y_{i1}, y_{i2}, \dots, y_{iT} \mid x_i) = \int \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c)\, \varphi((c - x_i'\xi)/\sigma_c)/\sigma_c\, dc.$$

Binary panel data model: FE logit

Panel logit model:
$$P(y_{it} = 1 \mid x_{it}, c_i) = G(x_{it}'\beta + c_i),$$
where $G(x) = e^x/(1 + e^x)$.
An important advantage of panel logit is that we can obtain a $\sqrt{N}$-consistent and asymptotically normal estimator of $\beta$ without any assumptions on $D(c_i \mid x_i)$, thereby achieving fixed effects (FE) estimation.
Assumptions: SE, CI, and that each element of $x_{it}$ is time-varying.
Define $N_i = \sum_{t=1}^{T} y_{it}$. It turns out that $D(y_i \mid x_i, c_i, N_i)$ does not depend on $c_i$.
The functional form of logit helps. We can't do this in probit.

Consider $T = 2$. Then $N_i \in \{0, 1, 2\}$. Note that
$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 0) = 0, \qquad P(y_{i2} = 1 \mid x_i, c_i, N_i = 0) = 0,$$
$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 2) = 1, \qquad P(y_{i2} = 1 \mid x_i, c_i, N_i = 2) = 1,$$
so neither $N_i = 0$ nor $N_i = 2$ is informative for $\beta$. We therefore only consider $N_i = 1$.
As we will show, consistent estimation of $\beta$ can be obtained by a standard logit of $y_{i2}$ on $x_{i2} - x_{i1}$ using the observations for which $N_i = 1$.

To show this, write
$$P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) = \frac{P(y_{i2} = 1, N_i = 1 \mid x_i, c_i)}{P(N_i = 1 \mid x_i, c_i)},$$
where
$$\begin{aligned} \text{Top} &= P(y_{i2} = 1, y_{i1} = 0 \mid x_i, c_i) \\ &\overset{CI}{=} P(y_{i2} = 1 \mid x_i, c_i)\, P(y_{i1} = 0 \mid x_i, c_i) \\ &\overset{SE}{=} G(x_{i2}'\beta + c_i)[1 - G(x_{i1}'\beta + c_i)] \\ &= \frac{e^{x_{i2}'\beta + c_i}}{(1 + e^{x_{i2}'\beta + c_i})(1 + e^{x_{i1}'\beta + c_i})}, \end{aligned}$$
$$\begin{aligned} \text{Bottom} &= P(y_{i2} = 1, y_{i1} = 0 \mid x_i, c_i) + P(y_{i2} = 0, y_{i1} = 1 \mid x_i, c_i) \\ &= G(x_{i2}'\beta + c_i)[1 - G(x_{i1}'\beta + c_i)] + [1 - G(x_{i2}'\beta + c_i)]\,G(x_{i1}'\beta + c_i) \\ &= \frac{e^{x_{i1}'\beta + c_i} + e^{x_{i2}'\beta + c_i}}{(1 + e^{x_{i2}'\beta + c_i})(1 + e^{x_{i1}'\beta + c_i})}. \end{aligned}$$

So
$$P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) = \frac{e^{x_{i2}'\beta + c_i}}{e^{x_{i1}'\beta + c_i} + e^{x_{i2}'\beta + c_i}} = \frac{e^{(x_{i2} - x_{i1})'\beta}}{1 + e^{(x_{i2} - x_{i1})'\beta}} = G((x_{i2} - x_{i1})'\beta).$$
$c_i$ cancels!
On the other hand,
$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 1) = P(y_{i2} = 0 \mid x_i, c_i, N_i = 1) = 1 - G((x_{i2} - x_{i1})'\beta).$$
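The cancellation of $c_i$ can also be seen numerically. The sketch below is an illustration with made-up index values; the function names `G` and `prob_y2_given_n1` are hypothetical. It evaluates Top / Bottom for several values of $c_i$ and compares each with $G((x_{i2} - x_{i1})'\beta)$.

```python
import math

def G(a):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-a))

def prob_y2_given_n1(xb1, xb2, c):
    """P(y_i2 = 1 | x_i, c_i, N_i = 1) for T = 2, with xb_t = x_it'beta."""
    top = G(xb2 + c) * (1.0 - G(xb1 + c))              # P(y_i2 = 1, y_i1 = 0)
    bottom = top + (1.0 - G(xb2 + c)) * G(xb1 + c)     # P(N_i = 1)
    return top / bottom

# Hypothetical index values; c varies but the conditional probability does not
xb1, xb2 = 0.3, 1.1
values = [prob_y2_given_n1(xb1, xb2, c) for c in (-2.0, 0.0, 3.5)]
target = G(xb2 - xb1)
print(values, target)
```

Replacing `G` with the standard normal cdf breaks the equality, which is one way to see why this conditioning argument works for logit but not for probit.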

$$\begin{aligned} f(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 1) &= P(y_{i1} = 1 \mid x_i, c_i, N_i = 1) \ \text{or}\ P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) \\ &= P(y_{i1} = 1 \mid x_i, c_i, N_i = 1)^{y_{i1}}\, P(y_{i2} = 1 \mid x_i, c_i, N_i = 1)^{y_{i2}}. \end{aligned}$$
So the conditional log-likelihood function for observation $i$ is
$$\begin{aligned} l_i(\beta) &= I(N_i = 1)\left[y_{i1}\log(1 - G((x_{i2} - x_{i1})'\beta)) + y_{i2}\log G((x_{i2} - x_{i1})'\beta)\right] \\ &\overset{y_{i1} = 1 - y_{i2}}{=} I(N_i = 1)\left[(1 - y_{i2})\log(1 - G((x_{i2} - x_{i1})'\beta)) + y_{i2}\log G((x_{i2} - x_{i1})'\beta)\right]. \end{aligned}$$
We select the observations for which $N_i = 1$.
Computationally, it is just a standard logit of $y_{i2}$ on $x_{i2} - x_{i1}$ using the observations for which $N_i = 1$.

It is called the conditional MLE (CMLE), as it is different from a full MLE. To get a full MLE:
$$\begin{aligned} P(y_{i1}, y_{i2} \mid x_i, c_i) &= \underbrace{P(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 0)}_{\text{deterministic}}\, P(N_i = 0 \mid x_i, c_i) \\ &\quad + \underbrace{P(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 1)}_{\text{CMLE}}\, P(N_i = 1 \mid x_i, c_i) \\ &\quad + \underbrace{P(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 2)}_{\text{deterministic}}\, P(N_i = 2 \mid x_i, c_i). \end{aligned}$$
Here we used the rule of total probability: $P(A) = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2)$ if $B_1$ and $B_2$ form a partition of the sample space. As an aside, this rule was used to show the selection bias, which arises from mistakenly assuming $P(A) = P(A \mid B_1)$.
CMLE doesn't consider $P(N_i = 0 \mid x_i, c_i)$, $P(N_i = 1 \mid x_i, c_i)$ or $P(N_i = 2 \mid x_i, c_i)$, which may depend on $\beta$.