Multiple Choice Models II




Multiple Choice Models II
Laura Magazzini, University of Verona
laura.magazzini@univr.it
http://dse.univr.it/magazzini
Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28

Categorical data
Categorical variable models: Y is the result of a single decision among more than 2 alternatives.
- Unordered choice set (categories / qualitative choices): multinomial logit, conditional logit, nested logit
- Ordered choice set (rankings): models for ordered data, e.g. ordered probit

Example: Education and Occupational Choice

Occupation   Primary/Secondary School   University or more   Total
Menial       23 (74.19%)                8 (25.81%)           31 (100%)
Blue Collar  60 (86.96%)                9 (13.04%)           69 (100%)
Craft        65 (77.38%)                19 (22.62%)          84 (100%)
WhiteCol     27 (65.85%)                14 (34.15%)          41 (100%)
Prof         27 (24.11%)                85 (75.89%)          112 (100%)
Total        202 (59.94%)               135 (40.06%)         337 (100%)

Multinomial distribution
Y_i: qualitative random variable with J categories.
P_ij = Pr(Y_i = j), j = 1, 2, ..., J: probability that individual i chooses alternative j.
Categories are mutually exclusive and exhaustive: Σ_j P_ij = 1, i = 1, 2, ..., N.
Let d_i = (d_i1, d_i2, ..., d_iJ), where d_ij = 1 if Y_i = j, so that Σ_j d_ij = 1, i = 1, 2, ..., N.

Multinomial logit model (MNL)
Y: result of a choice among J alternatives (J > 2).
d_i = (d_i1, d_i2, ..., d_iJ), where d_ij = 1 if Y_i = j.
P_ij = Pr(Y_i = j), with Σ_j P_ij = 1.
Logit model:
Pr(Y_i = j) = exp(η_ij) / Σ_{l=1}^{J} exp(η_il)
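The logit formula above is a softmax of the linear predictors. A minimal numpy sketch (the function name and the max-subtraction trick are illustrative, not part of the slides):

```python
import numpy as np

def mnl_probs(eta):
    """P_ij = exp(eta_ij) / sum_l exp(eta_il) for an (n, J) array of predictors."""
    eta = eta - eta.max(axis=1, keepdims=True)   # guard against overflow
    e = np.exp(eta)
    return e / e.sum(axis=1, keepdims=True)

# equal linear predictors give equal probabilities across the J alternatives
p = mnl_probs(np.zeros((1, 3)))   # each entry is 1/3
```

Each row of the result is a proper probability distribution over the J alternatives.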

Properties of MNL
0 ≤ P_ij ≤ 1 and Σ_j P_ij = 1 (by definition).
For every pair of alternatives (k, l), the probability ratio is
P_ik / P_il = exp(η_ik) / exp(η_il), so that log(P_ik / P_il) = η_ik - η_il.
The model can be motivated by a random utility model.

Random Utility Models (1), McFadden (1973, 2001)
J alternatives: a mutually exclusive, exhaustive, finite set.
Examples: competing brands, different means of transport, different occupations, ...
Categories can be ordered or unordered; different techniques are employed according to the nature of the alternatives.
Assume non-ordered alternatives.
A rational agent chooses the alternative that maximizes his/her utility: Y_i = j if U_ij > U_ik for each k ≠ j.

Random Utility Models (2), McFadden (1973, 2001)
Linear utility model: U_ij = η_ij + ε_ij, with η_ij = LC(z_ij, θ).
η_ij links the agent's utility to factors that can be observed; η_ij differs from U_ij because some factors cannot be observed by the researcher.
Pr(Y_i = j) = Pr(U_ij > U_ik, ∀k ≠ j)
            = Pr(η_ij + ε_ij > η_ik + ε_ik, ∀k ≠ j)
            = Pr(ε_ik - ε_ij < η_ij - η_ik, ∀k ≠ j)
            = ∫ I(ε_ik - ε_ij < η_ij - η_ik, ∀k ≠ j) f(ε) dε
with f the probability density function of ε.
The model is made operational by a particular choice of distribution for the disturbances.
Closed functional forms exist only for a few specifications (e.g. logit).
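When the ε_ij are iid type-I extreme value (Gumbel), the integral above has the closed logit form. A short simulation sketch (illustrative predictors and sample size) makes the link concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = np.array([1.0, 0.5, 0.0])          # linear predictors for J = 3 alternatives
n = 200_000

# U_ij = eta_j + eps_ij with iid type-I extreme value (Gumbel) disturbances
eps = rng.gumbel(size=(n, 3))
choices = np.argmax(eta + eps, axis=1)   # each agent picks the max-utility option
sim_shares = np.bincount(choices, minlength=3) / n

# closed-form logit probabilities implied by the Gumbel assumption
logit = np.exp(eta) / np.exp(eta).sum()
# sim_shares matches logit up to simulation noise
```

The simulated choice frequencies agree with the logit probabilities to roughly two decimal places at this sample size.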

How to specify η_ij?
Standard MNL: η_ij = x_i'β_j, with x_i individual characteristics, constant across the alternatives j.
Conditional logit model: η_ij = z_ij'γ, with z_ij characteristics of choice j for individual i.
- Datasets typically analyzed by economists do not contain mixtures of individual and choice-specific attributes.
- The conditional logit model is usually applied when the interest is in the effect of choice-specific attributes.
- A custom transformation is needed for variables containing individual-specific attributes.

Standard MNL
Pr(Y_i = j | x_i) = exp(x_i'β_j) / Σ_{l=1}^{J} exp(x_i'β_l)
It is not possible to estimate all of β_1, ..., β_J: adding the same constant to all the βs does not change the probabilities.
The indeterminacy in the model is removed by setting β_1 = 0, so that j = 1 is the reference category:
Pr(Y_i = j | x_i) = exp(x_i'β_j) / (1 + Σ_{l=2}^{J} exp(x_i'β_l))
An intercept is allowed for by setting the first element of x_i to 1 for every i.
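The identification point can be checked numerically: shifting every β_j by the same constant vector leaves the probabilities unchanged, which is why β_1 is pinned to 0. A sketch with illustrative numbers:

```python
import numpy as np

def mnl_probs_ref(x, B):
    """P(Y=j|x) with coefficient matrix B holding one column beta_j per alternative."""
    e = np.exp(x @ B)
    return e / e.sum()

x = np.array([1.0, 2.0])                 # first element 1 acts as the intercept
B = np.array([[0.0, 0.3, -0.5],          # first column is beta_1 = 0 (reference)
              [0.0, 1.1,  0.4]])

# adding the same constant vector c to every beta_j shifts every linear
# predictor by x'c, so the probabilities do not change
c = np.array([[2.0], [-1.0]])
p_original = mnl_probs_ref(x, B)
p_shifted = mnl_probs_ref(x, B + c)      # identical to p_original
```

Only differences between the β_j are identified; the normalization β_1 = 0 selects one representative from each equivalence class.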

Estimation: MLE
The log likelihood can be written as
ln L = Σ_{i=1}^{n} Σ_{j=1}^{J} d_ij ln Pr(Y_i = j)
with d_ij = 1 if Y_i = j, 0 otherwise.
The derivatives have a characteristically simple form:
∂ln L / ∂β_j = Σ_i (d_ij - P_ij) x_i = 0
As a consequence, if the model is estimated with an intercept, Σ_i d_ij = Σ_i P̂_ij.
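A sketch of the log likelihood and its score under the β_1 = 0 normalization (fabricated data; the function name is illustrative):

```python
import numpy as np

def mnl_loglik_grad(beta, X, d):
    """Log likelihood and score of the standard MNL with beta_1 = 0.

    beta: (K, J-1) coefficients of categories 2..J; X: (n, K); d: (n, J) indicators.
    """
    eta = np.column_stack([np.zeros(len(X)), X @ beta])   # eta_i1 = 0
    eta -= eta.max(axis=1, keepdims=True)                 # numerical stability
    P = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
    ll = np.sum(d * np.log(P))
    grad = X.T @ (d[:, 1:] - P[:, 1:])    # dlnL/dbeta_j = sum_i (d_ij - P_ij) x_i
    return ll, grad

# tiny illustration with fabricated data (n = 100, K = 2, J = 3)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
d = np.eye(3)[rng.integers(0, 3, size=100)]
ll, grad = mnl_loglik_grad(np.zeros((2, 2)), X, d)        # at beta = 0, P_ij = 1/3
```

At β = 0 every P_ij equals 1/J, so ln L = n ln(1/J); the score can be verified against a finite-difference derivative.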

Interpretation of the parameters
The partial effects for this model are complicated:
∂P_j / ∂x_i = P_j [β_j - Σ_{k=1}^{J} P_k β_k] = P_j [β_j - β̄]
The coefficients are therefore difficult to interpret: ∂P_j / ∂x_k need not have the same sign as β_jk.
A simpler interpretation comes from the odds ratios:
ln(P_ij / P_i1) = x_i'β_j
ln(P_ij / P_ik) = x_i'(β_j - β_k), if k ≠ 1
In the case of dummy variables (coded as 0 or 1):
ln[P_ij(x_i = 1) / P_i1(x_i = 1)] = β_j
ln[P_ij(x_i = 1) / P_ik(x_i = 1)] = β_j - β_k, if k ≠ 1
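The partial-effect formula P_j(β_j - β̄) can be verified against a numerical derivative; a sketch with illustrative coefficients:

```python
import numpy as np

def mnl_marg_eff(x, B):
    """dP_j/dx for the MNL: column j is P_j * (beta_j - sum_k P_k beta_k).

    B: (K, J) with beta_1 = 0 in the first column; returns a (K, J) array.
    """
    e = np.exp(x @ B)
    P = e / e.sum()
    beta_bar = B @ P                      # probability-weighted average coefficient
    return (B - beta_bar[:, None]) * P    # column j = P_j * (beta_j - beta_bar)

x = np.array([1.0, 0.5])
B = np.array([[0.0, 0.2, -0.4],
              [0.0, 1.5, -0.7]])
me = mnl_marg_eff(x, B)
# note: the effect of x on P_j need not share the sign of beta_j,
# and the effects sum to zero across alternatives (probabilities sum to 1)
```

Because the probabilities sum to one, each row of the marginal-effect matrix sums to zero.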

Conditional logit model
Pr(Y_i = j | z) = exp(z_ij'β) / Σ_{k=1}^{J} exp(z_ik'β)
The model contains choice-specific attributes.
The coefficients of individual-specific attributes (which do not vary across categories) are not identified; individual-specific variables can be included, but need to be properly transformed.
Alternative-specific constants cannot all be separately identified: adding the same constant to every alternative's linear predictor does not change the estimated probabilities, so one intercept is set to zero.

Marginal effects
∂P_j(z) / ∂z_k = β [P_j(z)(1{j=k} - P_k(z))]
Own effect: ∂P_j(z) / ∂z_j = β [P_j(z)(1 - P_j(z))]
Cross effect: ∂P_j(z) / ∂z_h = -β P_j(z) P_h(z), for j ≠ h
P_j changes monotonically with respect to z: the sign of the derivative depends on the sign of β, with opposite effects for z_j and z_h.
Symmetry: ∂P_j / ∂z_h = ∂P_h / ∂z_j
P_j does not change if all the z_k change in the same direction (the ranking of the U_ij is unchanged!).
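These own and cross effects can be checked numerically; a minimal sketch with one scalar attribute per alternative and an illustrative coefficient β:

```python
import numpy as np

# one choice-specific attribute z_j per alternative, scalar coefficient beta
beta, z = 0.8, np.array([1.0, 0.2, -0.5])
e = np.exp(beta * z)
P = e / e.sum()

own = beta * P * (1 - P)         # dP_j/dz_j
cross_01 = -beta * P[0] * P[1]   # dP_0/dz_1, equal to dP_1/dz_0 by symmetry

# shifting every z_k by the same amount leaves all probabilities unchanged,
# since the ranking of the utilities is preserved
P_shift = np.exp(beta * (z + 1.0))
P_shift = P_shift / P_shift.sum()
```

Both derivatives agree with finite-difference approximations, and P_shift equals P.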

Multinomial logit (MNL) vs conditional logit (CL)
Similar response probabilities, but the models differ in some important respects.
MNL: the conditioning variables do not change across alternatives.
- Used when the characteristics of the alternatives are unimportant, not of interest, or not available. Example: occupational choice, where we do not know how much someone could earn in every occupation, but we can collect data on factors affecting individual productivity and tastes (e.g. education, past experience).
- Factors can have different effects on the relative probabilities (a different β_j for each choice).
CL: choices are made on the basis of observable attributes of each alternative, with a common β.
MNL can be seen as a special case of CL.
Important limitation of both: the independence from irrelevant alternatives assumption.

Independence from irrelevant alternatives (logit)
For every pair of alternatives (k, l), the probability ratio (odds) is
ω = Pr(Y_i = k | x_ik) / Pr(Y_i = l | x_il) = exp(η_ik) / exp(η_il)
ω depends only on the linear predictors (η) of the two alternatives considered, not on the whole set of alternatives.
From the point of view of estimation, it is useful that the odds ratio does not depend on the other choices, but it is not a particularly appealing restriction to place on consumer behaviour.

IIA: example by McFadden (1984)
Commuters initially choose between cars and red buses with equal probabilities.
Suppose a third mode (blue buses) is added, and commuters do not care about the colour of the bus (i.e. they choose between the two buses with equal probability).
IIA implies that the fraction of commuters taking a car would fall from 1/2 to 1/3, a result that is not very realistic.
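The arithmetic of the red bus/blue bus example follows directly from the logit formula; a toy numerical check:

```python
import numpy as np

# car and red bus equally attractive: equal linear predictors
eta2 = np.array([0.0, 0.0])                  # [car, red bus]
p2 = np.exp(eta2) / np.exp(eta2).sum()       # [1/2, 1/2]

# add a blue bus identical to the red one; MNL re-splits all three equally
eta3 = np.array([0.0, 0.0, 0.0])             # [car, red bus, blue bus]
p3 = np.exp(eta3) / np.exp(eta3).sum()       # [1/3, 1/3, 1/3]: car share falls
```

Intuitively the car share should stay near 1/2 and the buses should split the other half, but the IIA restriction forces the 1/3 split.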

Testing IIA, Hausman and McFadden (1984)
If a subset of the choice set is truly irrelevant, omitting it from the model altogether will not change the parameter estimates systematically: excluding these choices will be inefficient, but will not lead to inconsistency.
But if the remaining odds are not truly independent of these alternatives, then the parameter estimates obtained when these choices are included will be inconsistent.
Therefore, Hausman's specification test can be applied.

The Hausman specification test
Consider two different estimators θ̂_E and θ̂_I.
Under H0, θ̂_E and θ̂_I are both consistent, and θ̂_E is efficient relative to θ̂_I.
Under H1, θ̂_I remains consistent while θ̂_E is inconsistent.
Then H0 can be tested using the Hausman statistic:
H = (θ̂_I - θ̂_E)' [Est.Asy.Var(θ̂_I - θ̂_E)]^{-1} (θ̂_I - θ̂_E)
  = (θ̂_I - θ̂_E)' [Est.Asy.Var(θ̂_I) - Est.Asy.Var(θ̂_E)]^{-1} (θ̂_I - θ̂_E) →d χ²_J
The appropriate degrees of freedom for the test depend on the context: in the case of MNL, J is the number of parameters in the estimating equation for the restricted choice set.
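The statistic is a quadratic form in the difference between the two estimators. A minimal sketch; the estimates and covariance matrices below are illustrative numbers, not output from a fitted model:

```python
import numpy as np

def hausman(theta_I, theta_E, V_I, V_E):
    """Hausman statistic H = d' [V_I - V_E]^{-1} d, with d = theta_I - theta_E.

    theta_E is efficient under H0; theta_I stays consistent under H1.
    """
    d = theta_I - theta_E
    return d @ np.linalg.inv(V_I - V_E) @ d

# illustrative numbers only (not from a fitted model)
theta_E = np.array([0.50, -1.20])        # full choice set (efficient under H0)
theta_I = np.array([0.55, -1.10])        # restricted choice set (robust)
V_E = np.diag([0.010, 0.020])
V_I = np.diag([0.015, 0.030])
H = hausman(theta_I, theta_E, V_I, V_E)  # compare to a chi-squared with 2 df
```

In practice the difference V_I - V_E may fail to be positive definite in finite samples, in which case a generalized inverse is commonly substituted.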

What if the IIA hypothesis is not satisfied? (1)
Multivariate probit model:
U_j = β'x_j + ε_j, j = 1, ..., J, with [ε_1, ε_2, ..., ε_J] ~ N(0, Σ)
Pr(Y_i = j) = Pr(U_j > U_k, for all k ≠ j)
Main obstacle: the difficulty of computing the multivariate normal probability for any dimensionality higher than 2.
Recent advances in the accurate simulation of multinormal integrals have made estimation of the MNP more feasible (simulation-based estimation).

What if the IIA hypothesis is not satisfied? (2)
Generalized extreme value: nested logit models.
Very appealing when it is possible to assume sequential choices.
The J alternatives are grouped into L subgroups:
(1) First, the group of alternatives is chosen
(2) Then, one alternative is chosen within the group
IIA is maintained within groups, but does not need to hold across groups.
Main limitations: results can depend on the way in which the groups are formed, and there is no specification test to discriminate among different groupings.

Treatment of rankings: ordered data
Y can assume a limited number of categories y_c, c = 0, 1, ..., C.
Categories are inherently ordered: y_0 < y_1 < y_2 < ... < y_C.
Examples:
- Bond rating: AAA to D
- Symptoms: none, minor, serious
- Drug effect: worsen, none, partial recovery, full recovery
- Customer satisfaction: very unsatisfied, unsatisfied, satisfied, very satisfied
Ordered probit and logit models are used:
- Multinomial models would fail to account for the ordinal nature of the dependent variable.
- OLS would attach a meaning to the differences between the category codings.

Latent regression
We consider a continuous latent variable y* (unobserved), a linear function of x and ε:
y* = x'β + ε
We observe y = c if and only if γ_c < y* ≤ γ_{c+1}, with γ_0 = -∞ and γ_{C+1} = +∞.
The latent response is specified by a linear regression model without the intercept.

Ordered probit model
y* = x'β + ε, with ε ~ N(0, 1)
Pr(y_i = 0 | x) = Pr(y*_i ≤ γ_1) = Pr(ε_i ≤ γ_1 - x'β | x) = Φ(γ_1 - x'β)
Pr(y_i = 1 | x) = Pr(γ_1 < y*_i ≤ γ_2) = Φ(γ_2 - x'β) - Φ(γ_1 - x'β)
...
Pr(y_i = C | x) = Pr(y*_i > γ_C) = 1 - Φ(γ_C - x'β)
Usually y* has no real meaning: the interest is in Pr(y | x) rather than E(y* | x).
To identify the parameters, x cannot contain the intercept; if you have to specify a model with an intercept, set γ_1 = 0.
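The category probabilities above are differences of normal CDFs evaluated at the thresholds. A sketch with illustrative values (note that x carries no intercept, matching the identification condition):

```python
import numpy as np
from scipy.stats import norm

def ordered_probit_probs(x, beta, gamma):
    """[Pr(y=0|x), ..., Pr(y=C|x)] for interior thresholds gamma_1 < ... < gamma_C."""
    cuts = np.concatenate(([-np.inf], gamma, [np.inf]))  # gamma_0, ..., gamma_{C+1}
    return np.diff(norm.cdf(cuts - x @ beta))            # differences of Phi

# illustrative values (no intercept in x)
x = np.array([0.5, -1.0])
beta = np.array([0.8, 0.3])
gamma = np.array([-0.5, 0.4, 1.2])       # C = 3 thresholds, hence 4 categories
p = ordered_probit_probs(x, beta, gamma)
```

Padding the thresholds with -∞ and +∞ makes the first and last categories come out of the same `np.diff` as the interior ones.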

Marginal effects
The coefficients are difficult to interpret:
∂Pr(y_i = 0 | x) / ∂x_j = -β_j φ(γ_1 - x'β): sign opposite to the sign of β_j
∂Pr(y_i = c | x) / ∂x_j = β_j [φ(γ_c - x'β) - φ(γ_{c+1} - x'β)]: ambiguous sign!
∂Pr(y_i = C | x) / ∂x_j = β_j φ(γ_C - x'β): same sign as β_j

Changes in y and y* in response to changes in x
Increasing one of the x's while holding β and γ constant is equivalent to shifting the distribution of y* to the right (solid to dashed curve in the figure).

Ordered logistic regression: ε_i logistic
Proportional odds model:
Pr(y_i > c) = exp(x_i'β - γ_c) / (1 + exp(x_i'β - γ_c))
log{ Pr(y_i > c) / [1 - Pr(y_i > c)] } = x_i'β - γ_c
The odds ratio between two individuals i and j,
{ Pr(y_i > c) / [1 - Pr(y_i > c)] } / { Pr(y_j > c) / [1 - Pr(y_j > c)] } = exp[(x_i - x_j)'β],
does not depend on the threshold c.
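The proportional-odds property can be verified numerically: the odds ratio between two individuals is the same at every threshold. A sketch with illustrative values:

```python
import numpy as np

def pr_greater(x, beta, gamma):
    """Pr(y > c | x) = exp(x'beta - gamma_c) / (1 + exp(x'beta - gamma_c))."""
    return 1.0 / (1.0 + np.exp(-(x @ beta - gamma)))

# illustrative values
beta = np.array([0.7, -1.1])
gamma = np.array([-0.8, 0.1, 1.4])       # thresholds gamma_c
x_i = np.array([1.0, 0.3])
x_j = np.array([0.2, -0.5])

pi = pr_greater(x_i, beta, gamma)
pj = pr_greater(x_j, beta, gamma)
odds_ratio = (pi / (1 - pi)) / (pj / (1 - pj))
# the same value exp[(x_i - x_j)'beta] appears at every threshold c
```

All three entries of `odds_ratio` coincide, which is exactly the proportional-odds restriction.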

Ordered probit vs. ordered logit
Coefficients and threshold parameters differ because of the different scale factors (σ²_probit = 1, whereas σ²_logit = π²/3).
Predicted probabilities are similar, and marginal effects are similar.
If the logit is chosen, the estimated coefficients can be interpreted in terms of odds.