Microeconometrics Blundell Lecture 1 Overview and Binary Response Models

Microeconometrics, Blundell Lecture 1: Overview and Binary Response Models
Richard Blundell (http://www.ucl.ac.uk/~uctp39a/), University College London
Blundell (University College London) ECONG107: Blundell Lecture 1, February-March 2016, 1 / 34

Overview. Subtitle: Models, Sampling Designs and Non/Semiparametric Estimation
1. discrete data: binary response models
2. censored and truncated data: censoring models
3. endogenously selected samples: selectivity models
4. experimental and quasi-experimental data: evaluation methods
   - social experiment methods
   - natural experiment methods
   - matching methods
   - instrumental variable methods
   - regression discontinuity and regression kink methods
   - control function methods

6. Discrete Choice: Binary Response Models
Let y_i = 1 if an action is taken (e.g. a person is employed) and y_i = 0 otherwise, for an individual or firm i = 1, 2, ..., N. We wish to model the probability that y_i = 1 given a k × 1 vector of explanatory characteristics x_i = (x_1i, x_2i, ..., x_ki). Write this conditional probability as
Pr[y_i = 1 | x_i] = F(x_i'β).
This is a single linear index specification; it is semiparametric if F is unknown. We need to recover both F and β to provide a complete guide to behaviour.

We often write the response probability as
p(x) = Pr(y = 1 | x) = Pr(y = 1 | x_1, x_2, ..., x_k)
for various values of x.
Bernoulli (zero-one) random variables: if Pr(y = 1 | x) = p(x) then Pr(y = 0 | x) = 1 − p(x), and
E(y | x) = 1·p(x) + 0·(1 − p(x)) = p(x)
Var(y | x) = p(x)(1 − p(x)).

The Linear Probability Model
Pr(y = 1 | x) = β_0 + β_1 x_1 + ... + β_k x_k = x'β
Unless x is severely restricted, the LPM cannot be a coherent model of the response probability Pr(y = 1 | x), since x'β can lie outside the zero-one interval. Note:
E(y | x) = β_0 + β_1 x_1 + ... + β_k x_k
Var(y | x) = x'β(1 − x'β),
which implies that the OLS estimator is unbiased but inefficient, the inefficiency being due to heteroskedasticity. Homework: develop a two-step estimator.
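The two-step homework suggestion can be sketched as follows: fit the LPM by OLS, form the implied Bernoulli variances p̂(1 − p̂), and reweight. This is a minimal numpy illustration on simulated data; the DGP, the clipping threshold used to keep the weights well defined, and the sample size are all arbitrary choices for illustration, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 1))
X = np.column_stack([np.ones(n), x])                 # include an intercept
p_true = 1 / (1 + np.exp(-(0.3 + 1.5 * x[:, 0])))    # true response probability
y = rng.binomial(1, p_true).astype(float)

# Step 1: OLS on the binary outcome (the LPM)
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ b_ols

# Step 2: reweight by the implied Bernoulli variance p(1 - p),
# clipping fitted values so the weights are well defined
p_clip = np.clip(p_hat, 0.01, 0.99)
w = 1.0 / (p_clip * (1.0 - p_clip))
Xw = X * np.sqrt(w)[:, None]
yw = y * np.sqrt(w)
b_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

print("OLS:", b_ols, "WLS:", b_wls)
```

The clipping step is exactly the incoherence problem the slide points to: fitted LPM probabilities can escape [0, 1], so the variance weights must be truncated.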

Typically we express binary response models as a latent variable model:
y_i* = x_i'β + u_i
where u is some continuously distributed random variable, distributed independently of x, whose variance we typically normalise. The observation rule for y is y = 1(y* > 0). Then
Pr[y_i* > 0 | x_i] = Pr[u_i > −x_i'β] = 1 − Pr[u_i ≤ −x_i'β] = 1 − G(−x_i'β),
where G is the cdf of u_i.

In the symmetric distribution case (Probit and Logit),
Pr[y_i* > 0 | x_i] = G(x_i'β)
where G is some (monotone increasing) cdf. (Make sure you can prove this.) This specification is the linear single index model. Show that linear utility together with normal unobserved heterogeneity implies the single index Probit model.

Random sample of observations on y_i and x_i, i = 1, 2, ..., N:
Pr[y_i = 1 | x_i] = F(x_i'β)
where F is some (monotone increasing) cdf. This is the linear single index model. Questions:
How do we find β given a choice of F(.) and a sample of observations on y_i and x_i?
How do we check that the choice of F(.) is correct?
Do we have to choose a parametric form for F(.)?
Do we need a random sample, or can we estimate with good properties from (endogenously) stratified samples?
What if the data are not binary: ordered, count, multiple discrete choices?

ML Estimation of the Binary Choice Model
Assume we have N independent observations on y_i and x_i. The probability density of y_i conditional on x_i is given by F(x_i'β) if y_i = 1, and 1 − F(x_i'β) if y_i = 0. Therefore the density of any y_i can be written
f(y_i | x_i; β) = F(x_i'β)^{y_i} (1 − F(x_i'β))^{1−y_i}.
The joint probability of this particular sequence of data is given by the product of these associated probabilities (under independence). Therefore the joint distribution of the particular sequence we observe in a sample of N observations is simply
f(y_1, y_2, ..., y_N) = ∏_{i=1}^N F(x_i'β)^{y_i} (1 − F(x_i'β))^{1−y_i}.
This depends on a particular β and, viewed as a function of β, is the likelihood of the sequence y_1, y_2, ..., y_N:
L(β; y_1, y_2, ..., y_N) = ∏_{i=1}^N F(x_i'β)^{y_i} (1 − F(x_i'β))^{1−y_i}.

ML Estimation of the Binary Choice Model
If the model is correctly specified then the MLE β̂_N will be consistent, efficient and asymptotically normal. log L is an easier expression to work with:
log L(β; y_1, y_2, ..., y_N) = ∑_{i=1}^N [ y_i log F(x_i'β) + (1 − y_i) log(1 − F(x_i'β)) ].
The derivative of log L with respect to β is given by
∂ log L/∂β = ∑_{i=1}^N [ y_i f(x_i'β)/F(x_i'β) − (1 − y_i) f(x_i'β)/(1 − F(x_i'β)) ] x_i
           = ∑_{i=1}^N [ (y_i − F(x_i'β)) / ( F(x_i'β)(1 − F(x_i'β)) ) ] f(x_i'β) x_i,
where f is the density corresponding to F.

ML Estimation of the Binary Choice Model
The MLE β̂_N refers to any root of the likelihood equation ∂ log L/∂β = 0 that corresponds to a local maximum. If log L is a concave function of β, as in the Probit and Logit cases (Exercise: prove this for the Probit using (1/N) ∂² ln L_N(β)/∂β∂β'), then this root is unique. Otherwise there exists a consistent root. We will consider the properties of the average log likelihood (1/N) log L, and assume that it converges to the true expected log likelihood and that this is maximised at the true value of β, given by β_0. Notice that ∂ log L/∂β is nonlinear in β. In general, no explicit solution can be found; we have to use iterative procedures to find the maximum.

ML Estimation of the Binary Choice Model
Iterative algorithms: choose an initial β^(0) and update to β^(1), etc.
Gradient method: β^(1) = β^(0) + ∂ log L/∂β |_{β^(0)}. Convergence is slow.
Deflected gradient methods: β^(1) = β^(0) + H^(0) ∂ log L/∂β |_{β^(0)}, with:
Newton: H^(0) = −[ (1/N) ∂² ln L_N(β)/∂β∂β' ]^{−1} |_{β^(0)}
Scoring method: H^(0) = −[ E ∂² ln L_N(β)/∂β∂β' ]^{−1} |_{β^(0)}
BHHH method: H^(0) = [ E ( ∂ ln L_N(β)/∂β )( ∂ ln L_N(β)/∂β )' ]^{−1} |_{β^(0)}
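A deflected gradient (scoring) iteration for the Probit can be sketched in a few lines: at each step, solve the linear system defined by the score and the expected Hessian. The simulated DGP, starting value, and convergence tolerance below are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def score_and_hessian(beta):
    xb = X @ beta
    F, f = norm.cdf(xb), norm.pdf(xb)
    # score: sum_i (y_i - F)/(F(1-F)) * f * x_i
    g = (y - F) * f / (F * (1 - F))
    score = X.T @ g
    # expected Hessian: -sum_i f^2/(F(1-F)) x_i x_i' = -X'DX
    d = f**2 / (F * (1 - F))
    H = -(X * d[:, None]).T @ X
    return score, H

beta = np.zeros(2)
for _ in range(50):                        # scoring iterations
    s, H = score_and_hessian(beta)
    step = np.linalg.solve(H, s)
    beta = beta - step                     # ascent: beta - H^{-1} s
    if np.max(np.abs(step)) < 1e-10:
        break

# variance approximation (X'DX)^{-1} at the final beta
se = np.sqrt(np.diag(np.linalg.inv(-H)))
print(beta, se)
```

Because the Probit log-likelihood is globally concave, the iteration converges from essentially any starting value; the final `-H` is the X'DX matrix used later for the variance approximation.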

ML Estimation of the Binary Choice Model
Theorem 1 (Consistency). If (i) the true parameter value β_0 is an interior point of the parameter space, (ii) ln L_N(β) is continuous, and (iii) there exists a neighbourhood of β_0 in which (1/N) ln L_N(β) converges to a limit ln L̄(β), and ln L̄(β) has a local maximum at β_0, then the MLE β̂_N is consistent, or there exists a consistent root.
Notes: 1. This requires the correct specification of ln L_N(β), in particular of Pr[y_i = 1 | x_i]. 2. Contrast with MLE in the linear model.

ML Estimation of the Binary Choice Model
Theorem 2 (Asymptotic Normality). If (i) ∂² ln L_N(β)/∂β∂β' exists and is continuous, (ii) (1/N) ∂² ln L_N(β)/∂β∂β' evaluated at β̂_N converges, and (iii) (1/√N) ∂ ln L_N(β)/∂β →_d N(0, H), then
√N(β̂_N − β_0) →_d N(0, H^{−1}),
where H = −lim_N E (1/N) ∂² ln L_N(β)/∂β∂β' |_{β_0}.
Note: by the information matrix equality,
−E (1/N) ∂² ln L_N(β)/∂β∂β' |_{β_0} = E (1/N) [ ∂ ln L_N(β)/∂β · ∂ ln L_N(β)/∂β' ] |_{β_0},
and [ −E ∂² ln L_N(β)/∂β∂β' ]^{−1} |_{β_0} is the Cramér-Rao lower bound.

ML Estimation of the Binary Choice Model
Note that for the Probit (and similarly the Logit),
−E ∂² ln L_N(β)/∂β∂β' = ∑_{i=1}^N [φ(x_i'β)]² / ( Φ(x_i'β)[1 − Φ(x_i'β)] ) x_i x_i' = ∑_{i=1}^N d_i x_i x_i' = X'DX,
so that var(β̂_N) can be approximated by (X'DX)^{−1}. This expression has a similar form to that in the heteroskedastic GLS model.

The EM Algorithm
In the case of the Probit there is another useful algorithm. Note that
y_i* = x_i'β + u_i with u_i ~ N(0, 1) and y_i = 1(y_i* > 0).
Then
E(y_i* | y_i = 1) = x_i'β + E(u_i | x_i'β + u_i > 0) = x_i'β + E(u_i | u_i > −x_i'β) = x_i'β + φ(x_i'β)/Φ(x_i'β)
and similarly
E(y_i* | y_i = 0) = x_i'β − φ(x_i'β)/(1 − Φ(x_i'β)).

The EM Algorithm
If we now define m_i = E(y_i* | y_i), then the derivative of the log likelihood can be written
∂ log L/∂β = ∑_{i=1}^N x_i (m_i − x_i'β).
Setting this to zero (to solve for β),
∑_{i=1}^N x_i m_i = ∑_{i=1}^N x_i x_i' β,
as in the OLS normal equations. We do not observe y_i*, but m_i is the best guess given the information we have.

The EM Algorithm
Solving for β we have
β = ( ∑_{i=1}^N x_i x_i' )^{−1} ∑_{i=1}^N x_i m_i.
Notice that m_i depends on β. This forms an EM (or Fair) algorithm:
1. Choose β^(0).
2. Form m_i^(0) and compute β^(1), etc.
This converges, but more slowly than deflected gradient methods.
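The EM (Fair) iteration above is easy to code: the E-step fills in m_i = E(y_i* | y_i) using the inverse Mills ratio, and the M-step is OLS of m on X. A sketch on simulated data (the DGP, iteration cap, and tolerance are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 0.8])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

XtX_inv = np.linalg.inv(X.T @ X)
beta = np.zeros(2)
for _ in range(500):                                   # EM (Fair) iterations
    xb = X @ beta
    mills_1 = norm.pdf(xb) / norm.cdf(xb)              # E(u | u > -x'b)
    mills_0 = norm.pdf(xb) / (1 - norm.cdf(xb))        # for the y = 0 branch
    m = np.where(y == 1, xb + mills_1, xb - mills_0)   # E-step: m_i = E(y*|y_i)
    beta_new = XtX_inv @ (X.T @ m)                     # M-step: OLS of m on X
    if np.max(np.abs(beta_new - beta)) < 1e-8:
        beta = beta_new
        break
    beta = beta_new

print(beta)
```

Each step only requires one OLS solve, which is the appeal of the algorithm; the price, as the slide notes, is its slow (linear) rate of convergence relative to Newton-type updates.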

Samples and Sampling
Let Pr(y | x, β) be the population conditional probability of y given x, let f(x) be the true marginal distribution of x, and let π(y | x, β) be the sample conditional probability.
Case 1: Random sampling. π(y, x) = π(y | x, β)π(x), with π(x) = f(x) and π(y | x, β) = Pr(y | x, β).
Case 2: Exogenous stratification. π(y, x) = Pr(y | x, β)π(x). Although π(x) ≠ f(x), the sample still replicates the conditional probability of interest in the population, which is the only term in the log likelihood that contains β.

Samples and Sampling
Case 3: Choice-based sampling (Manski and Lerman). Suppose Q is the population proportion that make choice y = 1, and let P represent the corresponding sample fraction. Then we can adjust the likelihood contribution of an observation with y_i = 1 by the weight Q/P applied to F(x_i'β) (and that of an observation with y_i = 0 by (1 − Q)/(1 − P)). If we know Q then the adjusted MLE is consistent for choice-based samples.
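The Manski-Lerman adjustment amounts to maximising a weighted log-likelihood. Below is a sketch: a population is simulated, a choice-based sample is drawn with equal numbers of ones and zeros, and a Probit is fitted with weights Q/P and (1 − Q)/(1 − P). The DGP, sample sizes, and the use of `scipy.optimize.minimize` are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
N = 20000
x = rng.normal(size=N)
Xp = np.column_stack([np.ones(N), x])
yp = (Xp @ np.array([-1.0, 1.0]) + rng.normal(size=N) > 0).astype(float)
Q = yp.mean()                       # population share with y = 1 (assumed known)

# choice-based sample: equal numbers of 1s and 0s, so P = 0.5
i1 = rng.choice(np.flatnonzero(yp == 1), 2000, replace=False)
i0 = rng.choice(np.flatnonzero(yp == 0), 2000, replace=False)
idx = np.concatenate([i1, i0])
X, y = Xp[idx], yp[idx]
P = y.mean()
w = np.where(y == 1, Q / P, (1 - Q) / (1 - P))   # Manski-Lerman weights

def neg_wll(b):
    F = np.clip(norm.cdf(X @ b), 1e-10, 1 - 1e-10)
    return -np.sum(w * (y * np.log(F) + (1 - y) * np.log(1 - F)))

res = minimize(neg_wll, np.zeros(2), method="BFGS")
print(res.x)    # close to the population values (-1, 1)
```

Dropping the weights here (setting w = 1) would leave the slope roughly intact but bias the intercept, which is the standard result for choice-based samples.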

Semiparametric Estimation in the Linear Index Case
(i) Semiparametric: E(y_i | x_i) = F(x_i'β); retain a finite parameter vector β in the linear index but relax the parametric form of F.
(ii) Nonparametric: E(y_i | x_i) = F(g(x_i)); both F and g are nonparametric.
As you would expect, route (i) has typically been followed in research. What is the parameter of interest: β alone? Notice that the function F(a + b·x_i'β) cannot be separately identified from F(x_i'β). Therefore β is only identified up to location and scale.

Semiparametric Estimation in the Linear Index Case
To motivate, imagine the index z_i = x_i'β were known but F(.) was not. It then seems obvious to run a general nonparametric (kernel, say) regression of y on z. (i) How do we find β? (ii) How do we guarantee a monotonically increasing F?

Semiparametric Estimation in the Linear Index Case
Semiparametric estimation of β (single index models): iterated least squares and quasi-likelihood estimation (Ichimura; Klein and Spady). Note that E(y_i | x_i) = F(x_i'β), so that y_i = F(x_i'β) + ε_i with E(ε_i | x_i) = 0. A semiparametric least squares estimator can be derived: choose β to minimise
S(β) = (1/N) ∑_i π(x_i)(y_i − F̂_h(x_i'β))²,
replacing F at each step with a kernel regression F̂_h with bandwidth h, simply a function of the scalar x_i'β for a given value of β. Here π(x_i) is a trimming function that downweights observations near the boundary of the support of x_i'β.

Semiparametric Estimation in the Linear Index Case
Typically F̂_h is estimated using a leave-one-out kernel. Ichimura (1993) shows that this estimator of β (up to scale) is √N-consistent and asymptotically normal. We have to assume F is differentiable, and the approach requires at least one continuous regressor with a non-zero coefficient. It extends naturally to some other semiparametric least squares cases, and it is also common to weight the elements in this regression to allow for heteroskedasticity.
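Ichimura's semiparametric least squares objective can be illustrated directly: normalise the first coefficient to one, and for each candidate second coefficient compute the leave-one-out Nadaraya-Watson fit of y on the index and the resulting trimmed sum of squares. The grid search, Gaussian kernel, fixed bandwidth, and crude trimming rule below are all simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, 2.0])       # normalise the first coefficient to 1
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def sls_objective(b2, h=0.3):
    z = X @ np.array([1.0, b2])
    # leave-one-out Nadaraya-Watson regression of y on the index z
    K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)           # leave one out
    denom = K.sum(axis=1)
    keep = denom > 1e-8                # crude stand-in for the trimming pi(x_i)
    F_hat = K[keep] @ y / denom[keep]
    return np.mean((y[keep] - F_hat) ** 2)

grid = np.linspace(0.5, 4.0, 71)
b2_hat = grid[int(np.argmin([sls_objective(b) for b in grid]))]
print(b2_hat)
```

In practice the minimisation is done over the full β vector with a data-driven bandwidth rather than a one-dimensional grid; the sketch only shows why the objective is informative about the index direction.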

Semiparametric Estimation in the Linear Index Case
Note that the average log-likelihood can be written
(1/N) log L_N(β) = (1/N) ∑_i π(x_i){ y_i ln F(x_i'β) + (1 − y_i) ln(1 − F(x_i'β)) }.
So maximise log L_N(β), replacing F(.) by a kernel-type nonparametric regression of y on z_i = x_i'β at each step. Klein and Spady (1993) show asymptotic normality, and that the outer product of the gradients of the quasi-loglikelihood is a consistent estimator of the variance-covariance matrix.

Semiparametric Estimation in the Linear Index Case
Maximum score estimation (Manski). Suppose F is unknown. Assume the conditional median of u given x is zero (note that this is weaker than independence between u and x). Then
Pr[y_i = 1 | x_i] > (<) 1/2 if x_i'β > (<) 0.
Maximum score algorithm: score 1 if y_i = 1 and x_i'β > 0, or y_i = 0 and x_i'β ≤ 0; score 0 otherwise. Choose the β that maximises the score, subject to some normalisation on β.

Semiparametric Estimation in the Linear Index Case
Note that the scoring algorithm can be written: choose β to maximise
S_N(β) = (1/N) ∑_{i=1}^N [2·1(y_i = 1) − 1]·1(x_i'β ≥ 0).
The complexity of the estimator is due to the discontinuity of the function S_N(β). Horowitz (1992) suggests a smoothed maximum score estimator:
S_N(β) = (1/N) ∑_{i=1}^N [2·1(y_i = 1) − 1]·K(x_i'β / h),
where K is some continuous kernel function with bandwidth h. The objective is no longer discontinuous, so convergence rates and asymptotic distribution properties can be derived.
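The smoothed score function is cheap to evaluate, so a one-dimensional version (first coefficient normalised to one) can be maximised by grid search. Using the normal cdf as the smooth kernel K, the fixed bandwidth, and the grid are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 1000
X = rng.normal(size=(n, 2))
y = (X @ np.array([1.0, 2.0]) + rng.normal(size=n) > 0).astype(float)

def smoothed_score(b2, h=0.2):
    # S_N(b) = mean of [2*1(y=1) - 1] * K(x'b / h), with K the normal cdf
    idx = X @ np.array([1.0, b2])      # first coefficient normalised to 1
    return np.mean((2 * y - 1) * norm.cdf(idx / h))

grid = np.linspace(0.0, 4.0, 81)
scores = [smoothed_score(b) for b in grid]
b2_hat = grid[int(np.argmax(scores))]
print(b2_hat)
```

Replacing `norm.cdf(idx / h)` with the indicator `(idx >= 0)` recovers Manski's original discontinuous score, for which this grid search would be the only practical optimiser.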

Endogenous Variables
Consider the following (triangular) model:
y_1i* = x_1i'β + γ y_2i + u_1i   (1)
y_2i = z_i'π_2 + v_2i            (2)
where y_1i = 1(y_1i* > 0) and z_i = (x_1i, x_2i). The x_2i are the instruments excluded from the equation for y_1. The first equation is the structural equation of interest and the second is the reduced form for y_2. y_2 is endogenous if u_1 and v_2 are correlated. If y_1* were fully observed we could use IV (or 2SLS).

Control Function Approach
Use the following orthogonal decomposition for u_1:
u_1i = ρ v_2i + ɛ_1i, where E(ɛ_1i v_2i) = 0.
Note that y_2 is uncorrelated with u_1i conditional on v_2. The variable v_2 is sometimes known as a control function. Under the assumption that u_1 and v_2 are jointly normally distributed, v_2 and ɛ_1 are uncorrelated by definition, and ɛ_1 also follows a normal distribution.

Control Function Estimator
Use this to define the augmented model:
y_1i* = x_1i'β + γ y_2i + ρ v_2i + ɛ_1i
y_2i = z_i'π_2 + v_2i
Two-step estimator:
Step 1: estimate π_2 by OLS and form the residuals v̂_2i = y_2i − π̂_2'z_i.
Step 2: use v̂_2i as a control function in the model for y_1* above, and estimate by standard methods.
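The two-step control function estimator can be sketched end to end: OLS on the reduced form, then a Probit of y_1 on (x_1, y_2, v̂_2). The simulated triangular DGP (with ɛ_1 given unit variance so the Probit scale matches the structural coefficients) and the use of BFGS for the Probit step are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 3000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # excluded instrument
v2 = rng.normal(size=n)
y2 = 0.5 * x1 + 1.0 * x2 + v2            # reduced form for y2
u1 = 0.8 * v2 + rng.normal(size=n)       # endogeneity: corr(u1, v2) != 0
y1 = (1.0 * x1 + 1.0 * y2 + u1 > 0).astype(float)

# Step 1: OLS reduced form, keep residuals v2_hat
Z = np.column_stack([np.ones(n), x1, x2])
pi_hat, *_ = np.linalg.lstsq(Z, y2, rcond=None)
v2_hat = y2 - Z @ pi_hat

def probit_nll(b, W, y):
    F = np.clip(norm.cdf(W @ b), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

# Step 2: Probit of y1 on (x1, y2, v2_hat); the v2_hat coefficient estimates rho
W = np.column_stack([np.ones(n), x1, y2, v2_hat])
res = minimize(probit_nll, np.zeros(4), args=(W, y1), method="BFGS")
b0, b_x1, gamma_hat, rho_hat = res.x
print(gamma_hat, rho_hat)
```

A Probit of y1 on (x1, y2) alone would be inconsistent here; including v̂_2 absorbs the part of u_1 correlated with y_2, and a t-test on ρ̂ doubles as an exogeneity test. (The second-step standard errors should in principle be corrected for the generated regressor v̂_2.)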

Semi-parametric Estimation with Endogeneity
Blundell and Powell (REStud, 2004) extend the control function approach to the semiparametric case. Define x_i = [x_1i, y_2i] and β_0 = [β, γ]. Recall that if x is independent of u_1, then E(y_1i | x_i) = G(x_i'β_0), where G is the distribution function of u_1. This is sometimes also known as the average structural function (ASF). With endogeneity, we can instead invoke the control function assumption
u_1 ⊥ x | v_2.
This is the conditional independence assumption derived from the triangularity assumption in the simultaneous equations model; see Blundell and Matzkin (2013).

Semi-parametric Estimation with Endogeneity
Using the control function assumption we have
E[y_1i | x_i, v_2i] = F(x_i'β_0, v_2i)
and
G(x_i'β_0) = ∫ F(x_i'β_0, v_2) dF_{v_2}.
Blundell and Powell (2003) show that β_0 and the average structural function G(x_i'β_0) = ∫ F(x_i'β_0, v_2) dF_{v_2} are point identified.

Semi-parametric Estimation with Endogeneity
Blundell and Powell (2004) develop a three-step control function estimator:
1. Generate v̂_2 and run a nonparametric regression of y_1i on x_i and v̂_2i. This provides a consistent nonparametric estimator of E[y_1i | x_i, v_2i].
2. Impose the linear index assumption x_i'β_0 in E[y_1i | x_i, v_2i] = F(x_i'β_0, v_2i). This generates F̂(x_i'β̂_0, v̂_2i).
3. Integrate over the empirical distribution of v̂_2 to estimate β_0 and the average structural function (ASF), Ĝ(x_i'β̂_0). This third step is implemented by taking the partial mean over v_2 of F̂(x_i'β̂_0, v̂_2i).
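The third step, the partial mean, is just an average of the fitted surface over the empirical distribution of the control function residuals. A stylised sketch: here a known probit-shaped function stands in for the step-2 estimate F̂ (an assumption purely for illustration; in the actual estimator F̂ is nonparametric), so the ASF can be checked against its closed form E[Φ(a + ρV)] = Φ(a/√(1 + ρ²)) for V ~ N(0, 1).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
v2 = rng.normal(size=5000)        # stand-in for the control-function residuals

def F_hat(index, v, rho=0.8):
    # stand-in for the step-2 estimate of E[y1 | x, v2]; a probit shape
    return norm.cdf(index + rho * v)

def asf(index):
    # step 3: partial mean over the empirical distribution of v2
    return np.mean(F_hat(index, v2))

print(asf(0.0), asf(1.0))
```

The key point the sketch makes concrete is that the ASF is flatter than F̂ at any fixed v_2: averaging over v_2 attenuates the index, exactly as the closed form Φ(a/√(1 + ρ²)) shows.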

Semi-parametric Estimation with Endogeneity
They are able to show √n-consistency for β̂_0, and the usual nonparametric rate for the ASF. Blundell and Matzkin (2013) discuss the ASF and alternative parameters of interest. Chesher and Rosen (2013) develop a new IV estimator in the binary choice and binary endogenous variable set-up.