LOGIT AND PROBIT ANALYSIS



Similar documents
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Regression with a Binary Dependent Variable

MULTIPLE REGRESSION WITH CATEGORICAL DATA

Ordinal Regression. Chapter

Logit and Probit. Brad Jones 1. April 21, University of California, Davis. Bradford S. Jones, UC-Davis, Dept. of Political Science

Module 5: Multiple Regression Analysis

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

11. Analysis of Case-control Studies Logistic Regression

Introduction to Quantitative Methods

Simple linear regression

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Multinomial and Ordinal Logistic Regression

Standard errors of marginal effects in the heteroskedastic probit model

Multiple Choice Models II

Simple Regression Theory II 2010 Samuel L. Baker

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Semi-Log Model. Semi-Log Model

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Regression III: Advanced Methods

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Generalized Linear Models

Elements of statistics (MATH0487-1)

The Cobb-Douglas Production Function

Nonlinear Regression Functions. SW Ch 8 1/54/

The Probit Link Function in Generalized Linear Models for Data Mining Applications

NPV Versus IRR. W.L. Silber We know that if the cost of capital is 18 percent we reject the project because the NPV

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

End User Satisfaction With a Food Manufacturing ERP

II. DISTRIBUTIONS distribution normal distribution. standard scores

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Factors affecting online sales

Logit Models for Binary Data

Nominal and ordinal logistic regression

Calculating the Probability of Returning a Loan with Binary Probability Models

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

CALCULATIONS & STATISTICS

Chapter 4 Online Appendix: The Mathematics of Utility Functions

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

LCs for Binary Classification

Review Jeopardy. Blue vs. Orange. Review Jeopardy

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

Predictability Study of ISIP Reading and STAAR Reading: Prediction Bands. March 2014

Review of Fundamental Mathematics

Week 4: Standard Error and Confidence Intervals

Causal Forecasting Models

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y X

VI. Introduction to Logistic Regression

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Linear functions Increasing Linear Functions. Decreasing Linear Functions

Introduction to Hypothesis Testing OPRE 6301

Multinomial Logistic Regression

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Premaster Statistics Tutorial 4 Full solutions

Interpretation of Somers D under four simple models

CAPM, Arbitrage, and Linear Factor Models

SAS Software to Fit the Generalized Linear Model

Regression III: Advanced Methods

Logistic regression modeling the probability of success

Statistics. Measurement. Scales of Measurement 7/18/2012

CURVE FITTING LEAST SQUARES APPROXIMATION

A Basic Introduction to Missing Data

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

An introduction to Value-at-Risk Learning Curve September 2003

Sample Size and Power in Clinical Trials

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Introduction to Regression and Data Analysis

2. Linear regression with multiple regressors

6.4 Normal Distribution

Statistics for Sports Medicine

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Statistical Confidence Calculations

Part 2: Analysis of Relationship Between Two Variables

Multiple logistic regression analysis of cigarette use among high school students

Determining distribution parameters from quantiles

Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques Page 1 of 11. EduPristine CMA - Part I

Simple Linear Regression

Does Financial Sophistication Matter in Retirement Preparedness of U.S Households? Evidence from the 2010 Survey of Consumer Finances

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Two Correlated Proportions (McNemar Test)

13. Poisson Regression Analysis

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy

Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Poisson Models for Count Data

Pricing I: Linear Demand

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Transcription:

LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y is quantitative whereas the explanatory variables are either quantitative or qualitative. There are certain type of regression models in which the dependent or response variable is dichotomous in nature, taking a 1 or 0 value. Suppose one wants to study the labor-force participation of adult males as a function of the unemployment rate, average wage rate, family income, education etc. A person is either in the labor force or not. Hence, the dependent variable, labor-force participation, can take only two values: 1 if the person is in the labor force and 0 if he or she is not. There are several examples where the dependent variable is dichotomous. Suppose on wants to study the union membership status of the college professors as a function of several quantitative and qualitative variables. A college professor either belongs to a union or does not. Therefore, the dependent variable, union membership status, is a dummy variable taking on values 0 or 1, 0 meaning no union membership and 1 meaning union membership. Similarly, other examples can be ownership of a house: a family own a house or it does not, it has disability insurance or it does not, a certain drug is effective in curing a illness or it is not, decision of a firm to declare a dividend or not, President decides to veto a bill or accept it, etc. A unique feature of all these examples is that the dependent variable is of the type which elicits a yes or no response. There are special estimation / inference problems associated with such models. The most commonly used approaches to estimating such models are the Linear Probability model, the Logit model and the Probit model. There are certain problems associated with the estimation of Linear Probability Models such as: i) Non-normality of the Disturbances (U s) ii) Heteroscedastic variances of the disturbances iii) Nonfulfillment of 0 E(Y X) 1 ( Possibility of Ŷ Y lying ouside the 0-1 range) iv) Questionable value of R 2 as a measure of goodness of fit; and Linear Probability Model is not logically a very attractive model because it assumes that P i = E(Y = 1 X ) increases linearly with X, that is, the marginal or incremental effect of X remains constant throughout. This seems sometimes very unrealistic. Therefore, there is a need of a probability model that has two features: (1) as X increases, P i = E(Y = 1 X ) increases but never steps outside the 0-1 interval, and (2) the relationship between P i and X i is non-linear, that is, approaches one which approaches zero at slower and slower rates as X i gets small and approaches one at slower and slower rates as X gets very large.

The Logit Model Logit regression (logit) analysis is a uni/multivariate technique which allows for estimating the probability that an event occurs or not, by predicting a binary dependent outcome from a set of independent variables. In an example of home ownership where the dependent variable is owning a house or not in relation to income, the linear probability Model depicted it as P i = E(Y = 1 X i ) = β 1 + β 2 X i where X is the income and Y =1 means that the family owns a house. Let us Consider the following representation of home ownership: 1 1 P i = E(Y = 1 X i ) = ------------------------------- = ------------------------ (1) 1 + exp[ - (β 1 + β 2 X i ) ] 1 + exp ( - Z i ) where Z i = β 1 + β 2 X i This equation (1) is known as the (cumulative) logistic distribution function. Here Z i ranges from - to + ; P i ranges between 0 and 1; P i is non-linearly related to Z i (i.e. X i ) thus satisfying the two conditions required for a probability model. In satisfying these requirements, an estimation problem has been created because P i is nonlinear not only in X but also in the β s. This means that one cannot use OLS procedure to estimate the parameters. Here, P i is the probability of owning a house and is given by 1 ----------------------- 1 + exp ( - Z i ) Then (1- P i ), the probability of not owning a house, is 1 (1- P i ) = ---------------------- 1 + exp ( Z i ) Therefore, one can write P i 1 + exp ( Z i ) ---------------------- = -------------------------- (2) (1- P i ) 1 + exp ( - Z i ) P i / ( 1 - P i ) is the odds ratio in favour of owning a house i.e; the ratio of the probability that a family will own a house to the probability that it will not own a house. Taking natural log of (2), we obtain L i = ln [P i / ( 1 - P i )] = Z i = β 1 + β 2 X i (3) VI-56

That is, the log of the odds ratio is not only linear in X, but also linear in the parameters. L is called the Logit. Features of the Logit Model 1. As P goes from 0 to 1, the logit L goes from - to +. That is, although the probabilities lie between 0 and 1, the logits are not so bounded. 2. Although L is linear in X, the probabilities themselves are not. 3. The interpretation of the logit model is as follows: β 2, the slope, measures the change in L for a unit change in X, i.e it tells how the log odds in favour of owning a house change as income changes by a unit. The intercept β 1 is the value of the log odds in favour of owning a house if income is zero. 4. Given a certain level of income, say X *, if we actually want to estimate not the odds in favour of owning a house but the probability of owning a house itself, this can be done directly (1) once the estimates of β 1 and β 2 are available. 5. The linear probability model assumes that P i is linearly related to X i, the logit model assumes that the log of odds ratio is linearly related to X i Estimation of the Logit Model In order to estimate the logit model, we need apart from X i, the values of logit L i. By having data at micro or individual level, one cannot estimate (3) by OLS technique. In such situations, one has to resort to Maximum Likelihood method of estimation. In case of grouped data, corresponding to each income level X i, there are families among which n i are possessing a house. Therefore, one needs to compute This relative frequency is an estimate of true P i corresponding to each X i. Using the estimated P i, one can obtain the estimated logit as ^ ^ ^ ^ ^ L i = ln [P i / ( 1 - P i )] = Z i = β 1 + β 2 X i Steps in Estimating Logit Regression 1. Compute the estimated probability of owning a house for each income level X i, as 2. For each X i, obtain the logit as ^ ^ ^ L i = ln [P i / ( 1 - P i )] 3. Transform the logit regression in order to resolve the problem of heteroscedasticity as follows: W i L i = β 1 W i + β 2 W i X i + W i U i (4) ^ ^ where the weights W i = P i / ( 1 - P i ) 4. Estimate (4) by OLS (WLS is OLS on transformed data) VI-57

5. Establish confidence intervals and / or test hypothesis in the usual OLS framework. All the conclusions will be valid strictly only when the sample is reasonably large. Merits of Logit Model 1. Logit analysis produces statistically sound results. By allowing for the transformation of a dichotomous dependent variable to a continuous variable ranging from - to +, the problem of out of range estimates is avoided. 2. The logit analysis provides results which can be easily interpreted and the method is simple to analyse. 3. It gives parameter estimates which are asymptotically consistent, efficient and normal, so that the analogue of the regression t-test can be applied. Demerits 1. As in the case of Linear Probability Model, the disturbance term in logit model is heteroscedastic and therefore, we should go for Weighted Least Squares. 2. has to be fairly large for all X i and hence in small sample; the estimated results should be interpreted carefully. 3. As in nay other regression, there may be problem of multicollinearity if the explanatory variables are related among themselves. 4. As in Linear Probability Models, the conventionally measured R 2 is of limited value to judge the goodness of fit. Application of Logit Model Analysis 1. It can be used to identify the factors that affect the adoption of a particular technology say, use of new varieties, fertilizers, pesticides etc, on a farm. 2. In the field of marketing, it can be used to test the brand preference and brand loyalty for any product. 3. Gender studies can use logit analysis to find out the factors which affect the decision making status of men/women in a family. The Probit Model In order to explain the behaviour of a dichotomous dependent variable we have to use a suitably chosen Cumulative Distribution Function (CDF). The logit model uses the cumulative logistic function. But this is not the only CDF that one can use. In some applications, the normal CDF has been found useful. The estimating model that emerges from the normal CDF is known as the Probit Model or Normit Model. Let us assume that in home ownership example, the decision of the ith family to own a house or not depends on unobservable utility index I i, that is determined by the explanatory variables in such a way that the larger the value of index I i, the greater the probability of the family owning a house. The index I i can be expressed as I i = β 1 + β 2 X i, (5) where X i is the income of the ith family. VI-58

Assumption of Probit Model For each family there is a critical or threshold level of the index (I i * ), such that if I i exceeds I i *, the family will own a house otherwise not. But the threshold level I i * is also not observable. If it is assumed that it is normally distributed with the same mean and variance, it is possible to estimate the parameters of (5) and thus get some information about the unobservable index itself. In Probit Analysis, the unobservable utility index (I i ) is known as normal equivalent deviate (n.e.d) or simply Normit. Since n.e.d. or I i will be negative whenever P i < 0.5, in practice the number 5 is added to the n.e.d. and the result so obtained is called the Probit i.e; Probit = n.e.d + 5 = I i + 5 In order to estimate β 1 and β 2, (5) can be written as I i = β 1 + β 2 X i + U i (6) Steps Involved in Estimation of Probit Model 1. Estimate P i from grouped data as in the case of Logit Model, i.e. ^ 2. Using P i, obtain n.e.d (I i ) from the standard normal CDF, i.e. I i = β 1 + β 2 X i, 3. Add 5 to the estimated I i to convert them into probits and use the probits thus obtained as the dependent variable in (6). 4. As in the case of Linear Probability Model and Logit Models, the disturbance term is heteroscedastic in Probit Model also. In order to get efficient estimates, one has to transform the model. 5. After transformation, estimate (6) by OLS. Logit versus Probit 1. The chief difference between logit and probit is that logistic has slightly flatter tails i.e; the normal or probit curve approaches the axes more quickly than the logistic curve. 2. Qualitatively, Logit and Probit Models give similar results, the estimates of parameters of the two models are not directly comparable. VI-59