Estimation of Discrete-Choice Models from Choice-Based Samples with Misclassification in the Response Variable


Steven B. Caudill
Department of Economics, Auburn University

Stephen R. Cosslett
Department of Economics, Ohio State University

November 1, 2004

Abstract

Discrete choice models with multiplicative intercepts can be estimated from choice-based samples using the random-sampling maximum likelihood estimator, even when response variables are misclassified, despite the fact that observed response probabilities no longer have a multiplicative-intercept form.

Keywords: Misclassification; Choice-based sampling; Discrete choice models; Endogenous stratification.

JEL classification: C13, C25

Corresponding author. Address: Department of Economics, Ohio State University, Columbus, Ohio 43210-1172, USA; tel.: 1-614-292-4106; fax: 1-614-292-3906; email: cosslett.1@osu.edu

1. Introduction

This note is about the estimation of discrete choice models from choice-based samples when the outcomes are subject to misclassification, in the special case where the discrete choice model has multiplicative intercept form. The leading example of such a model is the multinomial logit model with a full set of choice-specific dummy variables. When outcomes are correctly observed, it is well known that a multiplicative intercept model can be estimated without taking the choice-based nature of the sample into account: the model parameters other than the intercepts are consistently and efficiently estimated, while consistent estimates of the intercepts can be recovered if the sampling weights for the strata are known. We show that this result holds even when outcomes are subject to misclassification, assuming that there are no a priori restrictions on the misclassification probabilities. That is, the misclassification can be handled by a standard method, such as that of Hausman et al. (1998), treating the sample as if it were random.

This problem of misclassification in the estimation of discrete choice models from choice-based samples has been addressed in a recent paper by Ramalho (2002), who presents a general method of estimation that simultaneously corrects both for misclassification and for endogenous stratification. Ramalho's estimator is consistent and asymptotically efficient in general settings, including the special case considered here. The simplification presented here, however, should be of interest, especially as the multiple logit model is widely used in applied research.

2. Choice-based sampling

A discrete-choice model with multiplicative intercepts has outcome probabilities of the form

$$P(j \mid x) = \frac{\alpha_j V_j(x, \beta)}{\sum_{k=1}^{M} \alpha_k V_k(x, \beta)}, \qquad j = 1, \ldots, M, \tag{1}$$

for correctly observed outcomes, where $\alpha = (\alpha_1, \ldots, \alpha_M)$. A conventional normalization is $\alpha_M = 1$, together with some suitable restriction on $V$ that allows $\beta$ to be identified. For example, in the multinomial logit model, $V_j(x, \beta) = \exp(x'\beta_j)$ with $\beta_M = 0$.

A choice-based sample is a stratified sample with the strata defined by the observed discrete outcomes. First suppose that the outcomes are correctly observed. Let $H_j$ be the fraction of the sample with outcome $j$, and let $Q_j$ be the corresponding fraction of the population. The choice model can be consistently estimated by maximizing a likelihood function based on the modified probabilities¹

$$P_{CB}(j \mid x) = \frac{(H_j / Q_j)\, P(j \mid x)}{\sum_{k=1}^{M} (H_k / Q_k)\, P(k \mid x)}. \tag{2}$$

For the multiplicative intercept model (1), this gives

$$P_{CB}(j \mid x) = \frac{(H_j / Q_j)\, \alpha_j V_j(x, \beta)}{\sum_{k=1}^{M} (H_k / Q_k)\, \alpha_k V_k(x, \beta)}. \tag{3}$$

This has the same form as (1), but with changed intercepts. This leads to the well-known result that $\beta$ is still consistently estimated if the choice-based nature of the sample is ignored.² It also implies that the original intercepts $\alpha_j$ can be consistently estimated if the population shares $Q_j$ are known or can be consistently estimated from other data; otherwise the intercepts are not identified.

3. Estimation with misclassified responses

Now consider the problem of misclassification. Let $\pi_{ij}$ be the unknown probability of observing outcome $i$ when the true outcome is $j$. These probabilities are assumed not to depend on $x$ or $\beta$.³ In random sampling, the probability of observed outcome $i$ is then

$$P^*(i \mid x) = \sum_{j=1}^{M} \pi_{ij}\, P(j \mid x) = \frac{\sum_{j=1}^{M} \pi_{ij}\, \alpha_j V_j(x, \beta)}{\sum_{k=1}^{M} \alpha_k V_k(x, \beta)}. \tag{4}$$

This is the misclassification model considered by Hausman et al. (1998), and corresponds to equation (15) of Ramalho (2002). Evidently the probabilities $P^*(i \mid x)$ are no longer of multiplicative-intercept form. But the result given above, relating the likelihoods for choice-based sampling and random sampling, depended crucially on the multiplicative intercepts. It might therefore appear that the stratified nature of the sample has to be taken into account in order to consistently estimate $\beta$.

¹ See, for example, Manski and McFadden (1981).
² D. McFadden, as quoted in Manski and Lerman (1977).
³ If the misclassification probabilities depend on $x$, then the parameter transformations given below will not work; if there is dependence on $\beta$, then there will be a loss of efficiency relative to maximum likelihood estimation.
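The relationships in equations (1) to (4) can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses a three-outcome logit with a single scalar regressor, and every number in it (`alpha`, `b`, `w`, `pi`) as well as every function name is hypothetical, not taken from the paper. It confirms that reweighting as in (2) reproduces form (1) with intercepts proportional to $\alpha_j H_j / Q_j$, and that the ratio $[P(1 \mid x)/P(M \mid x)] / [V_1/V_M]$, which is constant in $x$ under (1), varies with $x$ once outcomes are misclassified as in (4).

```python
import math

def mi_probs(alpha, b, x):
    """Equation (1) with V_j = exp(x * b_j): alpha_j V_j / sum_k alpha_k V_k."""
    terms = [a * math.exp(x * bj) for a, bj in zip(alpha, b)]
    total = sum(terms)
    return [t / total for t in terms]

def cb_probs(probs, w):
    """Equation (2): weight each P(j|x) by w_j = H_j / Q_j, then renormalize."""
    terms = [wj * p for wj, p in zip(w, probs)]
    total = sum(terms)
    return [t / total for t in terms]

def misclassified_probs(pi, probs):
    """Equation (4): P*(i|x) = sum_j pi[i][j] * P(j|x)."""
    return [sum(p_ij * p for p_ij, p in zip(row, probs)) for row in pi]

def apparent_intercept_ratio(probs, b, x):
    """(P_1 / P_M) / (V_1 / V_M); constant in x only under form (1)."""
    return (probs[0] / probs[-1]) / math.exp(x * (b[0] - b[-1]))

# Hypothetical values, chosen only for illustration.
alpha = [0.5, 2.0, 1.0]      # intercepts, normalized so alpha_M = 1
b = [0.8, -0.4, 0.0]         # scalar slopes, normalized so b_M = 0
w = [3.0, 0.5, 1.2]          # sampling weights H_j / Q_j
pi = [[0.90, 0.10, 0.00],    # pi[i][j]: observe i when the truth is j
      [0.05, 0.80, 0.10],
      [0.05, 0.10, 0.90]]    # each column sums to 1

# Equation (3): the choice-based probabilities are again of form (1),
# with intercepts alpha_j * w_j (the overall scale cancels).
shifted = [a * wj for a, wj in zip(alpha, w)]
for x in (-1.0, 0.0, 2.0):
    lhs = cb_probs(mi_probs(alpha, b, x), w)
    rhs = mi_probs(shifted, b, x)
    assert all(abs(u - v) < 1e-12 for u, v in zip(lhs, rhs))

# Under (1) the apparent intercept ratio does not vary with x ...
r_true = [apparent_intercept_ratio(mi_probs(alpha, b, x), b, x)
          for x in (0.0, 1.0)]
# ... but under (4) it does, so P* is not of multiplicative-intercept form.
r_obs = [apparent_intercept_ratio(
             misclassified_probs(pi, mi_probs(alpha, b, x)), b, x)
         for x in (0.0, 1.0)]
print(r_true, r_obs)
```

The scalar regressor keeps the sketch short; a vector $x$ would only change how $V_j$ is computed.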

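In the choice-based sample with misclassified outcomes, let $H^*_i$ and $Q^*_i$ be the sample share and population share, respectively, of the observed outcome $i$. Substituting $P^*$, $H^*_i$ and $Q^*_i$ for $P$, $H_j$ and $Q_j$ in (2) gives the probabilities

$$P^*_{CB}(i \mid x) = \frac{(H^*_i / Q^*_i) \sum_{j=1}^{M} \pi_{ij}\, \alpha_j V_j(x, \beta)}{\sum_{m=1}^{M} (H^*_m / Q^*_m) \sum_{j=1}^{M} \pi_{mj}\, \alpha_j V_j(x, \beta)} \tag{5}$$

corresponding to equation (16) of Ramalho. As before, the choice model can be consistently estimated by maximizing a likelihood function based on these probabilities. Define the modified misclassification probabilities

$$\delta_{ij} = \frac{(H^*_i / Q^*_i)\, \pi_{ij}}{\sum_{m=1}^{M} (H^*_m / Q^*_m)\, \pi_{mj}} \tag{6}$$

as in equation (12) of Ramalho. These are the probabilities of observing outcome $i$, given a case in the choice-based sample with true outcome $j$. Define also the modified intercept terms

$$\gamma_j = \frac{\alpha_j \sum_{m=1}^{M} (H^*_m / Q^*_m)\, \pi_{mj}}{\sum_{m=1}^{M} (H^*_m / Q^*_m)\, \pi_{mM}}, \qquad j = 1, \ldots, M, \tag{7}$$

where the denominator is a scale factor to retain the conventional normalization $\gamma_M = 1$. Then equation (5) can be rewritten, after changing the order of summation in the denominator, as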

$$P^*_{CB}(i \mid x) = \frac{\sum_{j=1}^{M} \delta_{ij}\, \gamma_j V_j(x, \beta)}{\sum_{k=1}^{M} \gamma_k V_k(x, \beta)}. \tag{8}$$

This now has the same form as for random sampling, as given by equation (4). The apparent intercept terms $\gamma_j$ and misclassification probabilities $\delta_{ij}$ are different from the true values $\alpha_j$ and $\pi_{ij}$, but the structural parameters $\beta$ are the same. Therefore, if the choice-based nature of the sample is ignored and we correct only for misclassification, $\beta$ will still be consistently and efficiently estimated.

The status of the other parameters depends on whether the sampling weights for the strata are known or can be consistently estimated from some other data. If the sampling weights are unknown, then neither the weights nor the true intercepts nor the misclassification probabilities are separately identified in a multiplicative intercept model, a problem which might not be immediately apparent from the original parameterization in equation (5). In this case, the estimator $\hat{\beta}$ could be used for inference about the underlying tradeoffs implied by the marginal values, but there is not enough information for inference about the marginal effects of $x$ on the choice probabilities. On the other hand, if the sampling weights are known, then consistent estimates of the probabilities $\pi_{ij}$ and the intercept terms $\alpha_j$ can be recovered from the estimates of $\delta_{ij}$ and $\gamma_j$ by solving the sample analogs of equations (6) and (7):

$$\hat{\pi}_{ij} = \frac{(Q^*_i / H^*_i)\, \hat{\delta}_{ij}}{\sum_{m=1}^{M} (Q^*_m / H^*_m)\, \hat{\delta}_{mj}} \tag{9}$$

$$\hat{\alpha}_j = \hat{\gamma}_j\, \frac{\sum_{m=1}^{M} (Q^*_m / H^*_m)\, \hat{\delta}_{mj}}{\sum_{m=1}^{M} (Q^*_m / H^*_m)\, \hat{\delta}_{mM}}, \qquad j = 1, \ldots, M. \tag{10}$$

Knowledge of the population shares $Q^*_i$ contains additional information which was not taken into account in estimating $\beta$, so we should verify the efficiency of $\hat{\beta}$ in this case. One way of formulating the constrained maximum likelihood estimator for a choice-based sample with known $Q^*_i$ is based on the objective function⁴

$$\tilde{L} = \sum_{n=1}^{N} \log \frac{P^*(i_n \mid x_n)}{\sum_{j=1}^{M} \lambda_j\, P^*(j \mid x_n)}$$

where the choice probabilities are those in equation (4), $i_n$ is the observed response in case $n$ of the sample, and $\lambda = (\lambda_1, \ldots, \lambda_M)$ is a set of Lagrange multipliers. The objective function is minimized with respect to $\lambda$ subject to $\sum_{j} \lambda_j Q^*_j = 1$ and then maximized with respect to $(\alpha, \beta, \pi)$. The first-order conditions with respect to $(\alpha, \pi)$ and $\lambda$ then imply that $\lambda_i = H^*_i / Q^*_i$, just as in the case of a multiplicative intercept model with no misclassification. Substituting for $\lambda$ in the objective function $\tilde{L}$, changing to the new parameters defined by equations (6) and (7), and dropping some constant terms, we retrieve the random-sampling log likelihood based on the probabilities in equation (8). It follows that there is no loss of efficiency if we use the random-sampling maximum likelihood estimator followed by the corrections (9) and (10) to the misclassification probabilities and the multiplicative intercepts.

⁴ See Section 2.19 of Cosslett (1981).

References

Cosslett, S.R., 1981. Efficient estimation of discrete choice models. In: Manski, C.F., McFadden, D. (Eds.), Structural Analysis of Discrete Data with Econometric Applications. The MIT Press, Cambridge, MA.

Hausman, J.A., Abrevaya, J., Scott-Morton, F.M., 1998. Misclassification of the dependent variable in a discrete-response setting. Journal of Econometrics 87, 239-269.

Manski, C., Lerman, S., 1977. The estimation of choice probabilities from choice-based samples. Econometrica 45, 1977-1988.

Manski, C.F., McFadden, D., 1981. Alternative estimators and sample designs for discrete choice analysis. In: Manski, C.F., McFadden, D. (Eds.), Structural Analysis of Discrete Data with Econometric Applications. The MIT Press, Cambridge, MA.

Ramalho, E.A., 2002. Regression models for choice-based samples with misclassification in the response variable. Journal of Econometrics 106, 171-201.
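As a numerical check on the reparameterization in equations (5) to (10), the following Python sketch may be helpful. It is an illustration under stated assumptions, not the authors' code: it uses three outcomes, a single scalar regressor with $V_j = \exp(x b_j)$, and hypothetical values for the intercepts `alpha`, slopes `b`, misclassification matrix `pi`, and weights `w` (standing for $H^*_i / Q^*_i$). True parameter values are used in place of estimates, so the sketch checks only the algebra: the choice-based probabilities (5) coincide with the random-sampling form evaluated at the modified parameters of (6) and (7), and the corrections (9) and (10) return the original misclassification probabilities and intercepts.

```python
import math

# Hypothetical "true" parameters, chosen only for illustration.
alpha = [0.5, 2.0, 1.0]      # intercepts, normalized so alpha_M = 1
b = [0.8, -0.4, 0.0]         # scalar slopes in V_j = exp(x * b_j)
w = [3.0, 0.5, 1.2]          # w_i = H*_i / Q*_i for the observed outcomes
pi = [[0.90, 0.10, 0.00],    # pi[i][j]: observe i when the truth is j
      [0.05, 0.80, 0.10],
      [0.05, 0.10, 0.90]]    # each column sums to 1
M = len(alpha)

def mixture_probs(mis, intercepts, b, x):
    """Shared form of (4) and (8): sum_j mis[i][j] a_j V_j / sum_k a_k V_k."""
    terms = [a * math.exp(x * bj) for a, bj in zip(intercepts, b)]
    total = sum(terms)
    return [sum(m_ij * t for m_ij, t in zip(row, terms)) / total
            for row in mis]

def cb_mixture_probs(pi, alpha, b, w, x):
    """Equation (5): reweight the probabilities in (4) by w_i, renormalize."""
    p_star = mixture_probs(pi, alpha, b, x)
    terms = [wi * p for wi, p in zip(w, p_star)]
    total = sum(terms)
    return [t / total for t in terms]

# c_j = sum_m w_m pi[m][j] is the column scale factor shared by (6) and (7).
c = [sum(w[m] * pi[m][j] for m in range(M)) for j in range(M)]
delta = [[w[i] * pi[i][j] / c[j] for j in range(M)] for i in range(M)]  # (6)
gamma = [alpha[j] * c[j] / c[M - 1] for j in range(M)]                  # (7)

# Equation (8): the probabilities in (5) equal the random-sampling form
# evaluated at (delta, gamma) with the same slopes b.
max_gap = max(
    abs(u - v)
    for x in (-1.0, 0.0, 0.5, 2.0)
    for u, v in zip(cb_mixture_probs(pi, alpha, b, w, x),
                    mixture_probs(delta, gamma, b, x))
)

# Equations (9) and (10): with known weights, undo the transformation.
s = [sum(delta[m][j] / w[m] for m in range(M)) for j in range(M)]
pi_back = [[(delta[i][j] / w[i]) / s[j] for j in range(M)] for i in range(M)]
alpha_back = [gamma[j] * s[j] / s[M - 1] for j in range(M)]

print(max_gap)
```

In an application, `delta` and `gamma` would come from a random-sampling maximum likelihood fit of the misclassification model rather than from known true values.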