Logit & Probit Models for Binary Response


Limited Dependent Variable Econometric Methods, ECON 370

We previously discussed the possibility of running regressions even when the dependent variable is dichotomous in nature (a binary dependent variable, which we can think of as a probability measure). The caveat we noted, however, is that nothing guarantees that the predicted dependent variable falls between 0 and 1, which is by definition the support of a probability measure.

1 Logit & Probit Models for Binary Response

As noted, the key complaints against the Linear Probability Model (LPM) are that:

1. The predicted dependent variable may not lie within the support.

2. The partial effects are constant across values of the explanatory variables.

In the binary response model, the principal concern is the response probability,

Pr(y = 1 | x) = Pr(y = 1 | x_1, x_2, ..., x_k)   (1)

Suppose we are examining the probability of high school graduation after a new compulsory education policy. Then y could be 1 if the child graduates and 0 otherwise, while x would include family characteristics as well as an indicator or dummy variable for whether the child lived through the new law.

To eliminate the limitations of the LPM, we will assume the following,

Pr(y = 1 | x) = F(β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k)   (2)

where F(.) is a function mapping R into [0, 1]. Various functions have been suggested for F(.), of which we will discuss the two most popular. First, the Logit Model is based on the assumption that F(.) follows the logistic (cumulative) distribution,

F(x) = exp(x) / (1 + exp(x)) = Λ(x)   (3)
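As a quick numerical check on equation (3), the sketch below (with made-up index values and a made-up linear rule for the LPM) shows that a linear index can stray outside [0, 1] while the logistic CDF cannot:

```python
import math

def logistic_cdf(z):
    """Logistic CDF, Lambda(z) = exp(z) / (1 + exp(z)), equation (3)."""
    return math.exp(z) / (1.0 + math.exp(z))

# A hypothetical linear index evaluated at several covariate values.
index_values = [-3.0, -0.5, 0.0, 1.2, 4.0]
lpm_like = [0.5 + 0.4 * z for z in index_values]       # LPM-style "probabilities", unbounded
logit_probs = [logistic_cdf(z) for z in index_values]  # always inside (0, 1)

print(any(p < 0 or p > 1 for p in lpm_like))   # True: the LPM can leave [0, 1]
print(all(0 < p < 1 for p in logit_probs))     # True: the logit link never does
```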

The other is the Probit Model, which assumes that F(.) follows the (standard) normal cumulative distribution,

F(x) = Φ(x) = ∫_{-∞}^x φ(z) dz   (4)

where φ(z) is the standard normal density function,

φ(z) = exp(-z²/2) / √(2π)   (5)

The logit and probit models can be derived from a latent variable model. Let y* be an unobserved or latent variable determined by,

y* = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + ε   (6)

The idea here is that the observed variable, y, takes on a value of 1 if y* is greater than 0, that is y = I(y* > 0), where I(.) is an indicator function that takes the value 1 if the statement in brackets is true and 0 otherwise. To estimate this model, we still assume that the expected value of the error term given the independent variables is 0, i.e. that the errors are uncorrelated with the regressors. The distribution of the error term depends, of course, on the underlying assumption made about F(.) (note that both the logistic and normal distribution functions are symmetric about 0).

Given the assumptions on the distribution function and the specification of the latent variable, we can derive the response probabilities:

Pr(y = 1 | x) = Pr(y* > 0 | x)
             = Pr(ε > -β_0 - β_1 x_1 - β_2 x_2 - ... - β_k x_k | x)
             = 1 - F(-β_0 - β_1 x_1 - β_2 x_2 - ... - β_k x_k)
             = F(β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k)

Note that the last equality, which uses the symmetry of F(.), is exactly what we want, as in equation (2). Because y* typically does not have an easily interpretable unit of measure, when examining the effect of an independent variable we examine the effect it has on Pr(y = 1 | x). To find the partial effect of a continuous variable x_j, just note that,

∂F(xβ)/∂x_j = f(xβ) β_j   (7)

where f(.) is the density function. If we assume a logistic or normal distribution function, F(.) is strictly increasing, so f(.) > 0 for all x, and therefore the partial effect always has the sign of the coefficient β_j.
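Equation (7) can be verified numerically. The sketch below uses hypothetical coefficients and a single observation; the function names are my own:

```python
import math

def logistic_pdf(z):
    """Logistic density: f(z) = Lambda(z) * (1 - Lambda(z))."""
    p = math.exp(z) / (1.0 + math.exp(z))
    return p * (1.0 - p)

def normal_pdf(z):
    """phi(z) = exp(-z^2 / 2) / sqrt(2*pi), equation (5)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

# Hypothetical coefficients (beta_0, beta_1, beta_2) and one observation's x
# (leading 1 for the intercept).
beta = [0.2, 0.8, -0.5]
x = [1.0, 1.5, 2.0]
xb = sum(b * v for b, v in zip(beta, x))

# Equation (7): the partial effect of x_j is f(x beta) * beta_j.
pe_logit = [logistic_pdf(xb) * b for b in beta[1:]]
pe_probit = [normal_pdf(xb) * b for b in beta[1:]]

# The partial effect carries the sign of its coefficient:
# positive for beta_1 > 0, negative for beta_2 < 0.
print(pe_logit, pe_probit)
```

Note also that the scale factor f(xβ) is common to every regressor, which is why the ratio of two partial effects reduces to the ratio of the coefficients.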

An interesting fact to note is that the relative effect of any two continuous independent variables does not depend on x, since the ratio of the partial effects between two independent variables x_j and x_k is just β_j / β_k.

If x_j were instead a dichotomous or binary variable, the partial effect is,

F(β_0 + β_1 x_1 + ... + β_{j-1} x_{j-1} + β_j + β_{j+1} x_{j+1} + ... + β_k x_k)
  - F(β_0 + β_1 x_1 + ... + β_{j-1} x_{j-1} + β_{j+1} x_{j+1} + ... + β_k x_k)

and as before, the effect depends on the values of x, which will affect your estimate of the partial effect. Most statistical programs calculate this at the means of the independent variables. However, when the researcher has a substantial number of dummy variables, it is more accurate to calculate the partial effect for each group or category, at the means of the continuous variables for the observations belonging to that category.

The principal difficulty is the scale factor, i.e. the fact that the partial effect depends on the independent variables, which consequently makes interpretation of partial effects difficult. A commonly used solution is to take the average of the partial effects estimated for each observation; consequently, the result is called the average partial effect. For a continuous variable x_j, this involves calculating,

n^{-1} Σ_i f(x_i β̂) β̂_j

Here, even if the independent variables include dummy variables, by averaging the partial effect over the entire sample you implicitly weight the partial effect by the proportion of observations falling in each category. The average partial effect for a dummy variable is similar in the sense that it is still an average across the sample, but we have to account for the discrete change (note that this approach works both for dummy variables and for discrete variables, such as years of education, which very often occur as independent variables, particularly in wage regressions). The formula for a dummy variable is,

n^{-1} Σ_i [ F(x_{i,-j} β̂_{-j} + β̂_j) - F(x_{i,-j} β̂_{-j}) ]

while that for a discrete variable increasing by the usual unit measure is,

n^{-1} Σ_i [ F(x_{i,-j} β̂_{-j} + (x_{ij} + 1) β̂_j) - F(x_{i,-j} β̂_{-j} + x_{ij} β̂_j) ]
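The average partial effect formulas can be sketched as follows for a logit model; the coefficient estimates and data below are hypothetical stand-ins for output from a fitted model:

```python
import math

def lam(z):
    """Logistic CDF."""
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical estimates: intercept, one continuous regressor x1, one dummy d.
b0, b1, b2 = -0.5, 0.3, 0.7
data = [(1.2, 0), (0.4, 1), (2.5, 1), (-0.8, 0)]  # (x1_i, d_i) pairs
n = len(data)

# APE of the continuous variable: average of f(x_i beta) * beta_1, where the
# logistic density is Lambda(z) * (1 - Lambda(z)).
ape_cont = sum(lam(b0 + b1*x1 + b2*d) * (1 - lam(b0 + b1*x1 + b2*d)) * b1
               for x1, d in data) / n

# APE of the dummy: average discrete change from switching d from 0 to 1,
# holding each observation's other covariates at their observed values.
ape_dummy = sum(lam(b0 + b1*x1 + b2) - lam(b0 + b1*x1)
                for x1, _ in data) / n

print(round(ape_cont, 4), round(ape_dummy, 4))
```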

where the subscript -j denotes all variables excluding the one indexed by j.

There is no restriction on the functional form of the independent variables the researcher can use; that is, you can use logged and quadratic values of the independent variables in the regression. Of course, the partial effects then have to be calculated for interpretation. Care in interpretation is especially crucial when examining interaction terms.

1.1 Maximum Likelihood Estimation

Since the regression equation is non-linear in both parameters and variables, OLS and WLS are not applicable. While we could always use Non-Linear Least Squares (which is not part of the curriculum), it is far easier to use Maximum Likelihood Estimation. Suppose we have on hand n observations on all the variables. Then the probability of observing any single outcome is just,

L_i(y_i | x_i; β) = [F(x_i β)]^{y_i} [1 - F(x_i β)]^{1 - y_i}   (8)

We call the above the likelihood function for observation i. Consequently, the log likelihood function for observation i is just,

l_i(β) = y_i log(F(x_i β)) + (1 - y_i) log(1 - F(x_i β))   (9)

For both the logistic and normal choices of F(.), this log likelihood is globally concave in β, so the maximization problem is well behaved. The log likelihood function for the entire sample is then just,

l(β) = Σ_i [ y_i log(F(x_i β)) + (1 - y_i) log(1 - F(x_i β)) ] = Σ_i l_i(β)   (10)

Under general assumptions, the MLE estimates of the coefficients are consistent, asymptotically normal, and efficient. The asymptotic variance is as follows,

Avar(β̂) = ( Σ_i [f(x_i β)]² x_i′x_i / {F(x_i β)[1 - F(x_i β)]} )^{-1}   (11)

Given the variance, tests of significance of the individual estimates are the usual t tests.
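A minimal sketch of maximizing the log likelihood in equation (10) for a logit on toy data, using gradient ascent (statistical packages use Newton-type methods; the data and step size here are my own):

```python
import math

def lam(z):
    """Logistic CDF."""
    return math.exp(z) / (1.0 + math.exp(z))

# Toy data: outcomes y_i and a single regressor x_i (intercept added below).
ys = [0, 0, 0, 1, 0, 1, 1, 1]
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

# Maximize equation (10) by gradient ascent; for the logit, the score is
# sum_i (y_i - Lambda(x_i beta)) * x_i, and concavity guarantees convergence.
b = [0.0, 0.0]  # (intercept, slope)
for _ in range(5000):
    g0 = sum(y - lam(b[0] + b[1]*x) for y, x in zip(ys, xs))
    g1 = sum((y - lam(b[0] + b[1]*x)) * x for y, x in zip(ys, xs))
    b[0] += 0.05 * g0
    b[1] += 0.05 * g1

loglik = sum(y * math.log(lam(b[0] + b[1]*x)) +
             (1 - y) * math.log(1 - lam(b[0] + b[1]*x))
             for y, x in zip(ys, xs))

# First-order condition at the maximum: the average residual y_i - p_hat_i
# is (numerically) zero because the intercept's score must vanish.
avg_resid = sum(y - lam(b[0] + b[1]*x) for y, x in zip(ys, xs)) / len(ys)
print(round(avg_resid, 6))
```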

1.2 Testing Multiple Hypotheses

There are three ways of testing multiple restrictions in probit and logit models, namely:

1. Lagrange Multiplier or Score Test

2. Wald Test

3. Likelihood Ratio Test

The most commonly used and most easily calculated is the last, the Likelihood Ratio Test. It is used when we wish to test exclusion restrictions, i.e. whether we should or should not exclude a variable or set of variables. The idea is a simple one: since we are maximizing the log likelihood function, excluding variables from the regression relationship lowers the maximized objective function. The question is then whether the fall in the log likelihood value is significant. The likelihood ratio statistic is just twice the difference in the log likelihoods:

LR = 2(L_ur - L_r)

where L_ur and L_r are the log likelihoods for the unrestricted and restricted models. It must be noted that since the values of the distribution function inside the logs lie strictly between 0 and 1, the logs are negative numbers. Nonetheless, no correction is needed; you simply take the difference between the two numbers. The LR statistic follows a χ²_q distribution asymptotically, where the degrees of freedom q corresponds to the number of exclusion restrictions. Finally, note that since L_ur ≥ L_r, the LR statistic is always nonnegative.

A possible goodness of fit measure in probit and logit models is the percent correctly predicted. Because it is typically not reported by statistical software, creating it requires additional calculations. The idea is to compare how many of the predicted probabilities correspond with the observed outcomes. Let the predicted probability for observation i be F(x_i β̂). You would then create a new variable ỹ that takes on the value of 1 if F(x_i β̂) ≥ 0.5, and 0 otherwise.
Next, create the difference between the observed outcome, y, and the prediction, ỹ. If the prediction is correct, this difference is 0; otherwise it is not. Counting the number of zeros and dividing the count by the total number of observations then gives you the goodness of fit measure you want here.
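The percent correctly predicted calculation can be sketched as follows, with hypothetical fitted index values x_i β̂:

```python
import math

def lam(z):
    """Logistic CDF."""
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical fitted index values x_i beta_hat and observed outcomes y_i.
fitted_index = [-1.2, -0.3, 0.1, 0.9, 2.0, -0.4]
ys = [0, 1, 0, 1, 1, 0]

# Classify with the 0.5 rule: y_tilde = 1 iff F(x_i beta_hat) >= 0.5, which
# for a symmetric F is the same as x_i beta_hat >= 0.
y_tilde = [1 if lam(xb) >= 0.5 else 0 for xb in fitted_index]

# Percent correctly predicted: share of observations with y_i == y_tilde_i.
pct_correct = sum(1 for y, yt in zip(ys, y_tilde) if y == yt) / len(ys)
print(pct_correct)  # 4 of 6 predictions match here
```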

Another common goodness of fit measure, which is often reported, is that proposed by McFadden (1974),

1 - L_ur / L_o

where, as before, L_ur is the log likelihood value you obtain from the regression, and L_o is that generated by a regression (either probit or logit; note that this goodness of fit measure is also reported by statistical packages for many other models estimated by MLE) with only the intercept. The key idea here is that 0 ≥ L_ur ≥ L_o, so the ratio L_ur / L_o, and hence the measure above, is always between 0 and 1. The proposed goodness of fit measure is close to 0 if the regression has no explanatory power, and close to 1 if the fit is good. This is called a Pseudo R². There are other versions of the Pseudo R², one of which is also touched on in your text, page 590.

For related concerns such as heteroskedasticity and endogeneity, it is recommended that you read Wooldridge's reference book, Econometric Analysis of Cross Section and Panel Data.
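The LR statistic and McFadden's pseudo R² can be computed together. In the sketch below the "unrestricted" coefficients are illustrative rather than true MLE estimates, so the numbers only demonstrate the arithmetic (with actual MLE estimates, L_ur ≥ L_o is guaranteed):

```python
import math

def lam(z):
    """Logistic CDF."""
    return math.exp(z) / (1.0 + math.exp(z))

def logit_loglik(b, ys, xs):
    """Equation (10) for a logit with intercept b[0] and slope b[1]."""
    return sum(y * math.log(lam(b[0] + b[1]*x)) +
               (1 - y) * math.log(1 - lam(b[0] + b[1]*x))
               for y, x in zip(ys, xs))

# Toy data.
ys = [0, 0, 1, 0, 1, 1]
xs = [-1.5, -0.5, 0.0, 0.5, 1.0, 2.0]

# Intercept-only ("restricted") model: its MLE sets Lambda(b0) = ybar, so
# b0_hat = log(ybar / (1 - ybar)), giving L_o in closed form.
ybar = sum(ys) / len(ys)
L_o = logit_loglik([math.log(ybar / (1 - ybar)), 0.0], ys, xs)

# An "unrestricted" fit with illustrative coefficients.
L_ur = logit_loglik([-0.1, 1.2], ys, xs)

LR = 2 * (L_ur - L_o)        # compare to chi-squared with q = 1 d.f.
pseudo_r2 = 1 - L_ur / L_o   # McFadden's pseudo R-squared
print(round(LR, 4), round(pseudo_r2, 4))
```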