Estimation of Discrete-Choice Models from Choice-Based Samples with Misclassification in the Response Variable

Steven B. Caudill
Department of Economics, Auburn University

Stephen R. Cosslett*
Department of Economics, Ohio State University

November 1, 2004

Abstract

Discrete choice models with multiplicative intercepts can be estimated from choice-based samples using the random-sampling maximum likelihood estimator, even when response variables are misclassified, despite the fact that the observed response probabilities no longer have a multiplicative-intercept form.

Keywords: Misclassification; Choice-based sampling; Discrete choice models; Endogenous stratification.
JEL classification: C13; C25

* Corresponding author. Address: Department of Economics, Ohio State University, Columbus, Ohio 43210-1172, USA; tel.: 1-614-292-4106; fax: 1-614-292-3906; email: cosslett.1@osu.edu

1. Introduction

This note is about the estimation of discrete choice models from choice-based samples when the outcomes are subject to misclassification, in the special case where the discrete choice model has multiplicative-intercept form. The leading example of such a model is the multinomial logit model with a full set of choice-specific dummy variables. When outcomes are correctly observed, it is well known that a multiplicative-intercept model can be estimated without taking the choice-based nature of the sample into account: the model parameters other than the intercepts are consistently and efficiently estimated, while consistent estimates of the intercepts can be recovered if the sampling weights for the strata are known. We show that this result holds even when outcomes are subject to misclassification, assuming that there are no a priori restrictions on the misclassification probabilities. That is, the misclassification can be handled by a standard method, such as that of Hausman et al. (1998), treating the sample as if it were random.

This problem of misclassification in the estimation of discrete choice models from choice-based samples has been addressed in a recent paper by Ramalho (2002), who presents a general method of estimation that simultaneously corrects both for misclassification and for endogenous stratification. Ramalho's estimator is consistent and asymptotically efficient in general settings, including the special case considered here. The simplification presented here should nevertheless be of interest, especially as the multinomial logit model is widely used in applied research.

2. Choice-based sampling

A discrete-choice model with multiplicative intercepts has outcome probabilities of the form

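$$P(i \mid x, \theta) = \frac{\alpha_i\, g_i(x, \beta)}{\sum_{j=1}^{M} \alpha_j\, g_j(x, \beta)} \tag{1}$$

for correctly observed outcomes $i = 1, \dots, M$, where $\theta = (\alpha, \beta)$ and $\alpha = (\alpha_1, \dots, \alpha_M)$.¹ A conventional normalization is $\sum_{j=1}^{M} \alpha_j = 1$, together with some suitable restriction on $g$ that allows $\beta$ to be identified. For example, in the multinomial logit model $g_i(x, \beta) = \exp(x'\beta_i)$, with $\beta_M = 0$.

A choice-based sample is a stratified sample with the strata defined by the observed discrete outcomes. First suppose that the outcomes are correctly observed. Let $H_i$ be the fraction of the sample with outcome $i$, and let $Q_i$ be the corresponding fraction of the population. The choice model can be consistently estimated by maximizing a likelihood function based on the modified probabilities

$$P_{CB}(i \mid x, \theta) = \frac{(H_i/Q_i)\, P(i \mid x, \theta)}{\sum_{j=1}^{M} (H_j/Q_j)\, P(j \mid x, \theta)}. \tag{2}$$

For the multiplicative-intercept model (1), this gives

$$P_{CB}(i \mid x, \theta) = \frac{(H_i/Q_i)\, \alpha_i\, g_i(x, \beta)}{\sum_{j=1}^{M} (H_j/Q_j)\, \alpha_j\, g_j(x, \beta)}. \tag{3}$$

This has the same form as (1), but with changed intercepts. This leads to the well-known result that $\beta$ is still consistently estimated if the choice-based nature of the sample is ignored.² It also implies that the original intercepts $\alpha$ can be consistently estimated if the population shares $Q_i$ are known or can be consistently estimated from other data; otherwise the intercepts are not identified.

¹ See, for example, Manski and McFadden (1981).
² D. McFadden, as quoted in Manski and Lerman (1977).

3. Estimation with misclassified responses

Now consider the problem of misclassification. Let $\pi_{ki}$ be the unknown probability of observing outcome $k$ when the true outcome is $i$. These probabilities are assumed not to depend on $x$ or $\beta$.³ In random sampling, the probability of observed outcome $k$ is then

$$\tilde P(k \mid x, \theta, \pi) = \sum_{i=1}^{M} \pi_{ki}\, P(i \mid x, \theta) = \frac{\sum_{i=1}^{M} \pi_{ki}\, \alpha_i\, g_i(x, \beta)}{\sum_{j=1}^{M} \alpha_j\, g_j(x, \beta)}. \tag{4}$$

This is the misclassification model considered by Hausman et al. (1998), and corresponds to equation (15) of Ramalho (2002). Evidently the probabilities $\tilde P$ are no longer of multiplicative-intercept form. But the result given above relating the likelihoods for choice-based sampling and random sampling depended crucially on the multiplicative intercepts. It might therefore appear that the stratified nature of the sample has to be taken into account in order to consistently estimate $\beta$.

³ If the misclassification probabilities depend on $x$, then the parameter transformations given below will not work; if there is dependence on $\beta$, then there will be a loss of efficiency relative to maximum likelihood estimation.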
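The reweighting in (2) and (3), the recovery of $\alpha$ from known $Q_i$, and the observed-outcome probabilities in (4) can be checked numerically. The Python sketch below is illustrative only: the function names, the three-outcome logit with a scalar covariate, and all parameter values (alpha, beta, H, Q and the misclassification matrix pi) are hypothetical choices, not taken from this note.

```python
import math


def mi_probs(alpha, g):
    """Multiplicative-intercept probabilities, equation (1):
    P(i) = alpha_i * g_i / sum_j alpha_j * g_j."""
    w = [a * gi for a, gi in zip(alpha, g)]
    total = sum(w)
    return [wi / total for wi in w]


def choice_based_probs(p, H, Q):
    """Equation (2): reweight P(i) by H_i / Q_i and renormalize."""
    w = [(h / q) * p_i for p_i, h, q in zip(p, H, Q)]
    total = sum(w)
    return [wi / total for wi in w]


def shifted_intercepts(alpha, H, Q):
    """Intercepts appearing in equation (3): (H_i / Q_i) * alpha_i,
    rescaled so that they sum to one."""
    w = [(h / q) * a for a, h, q in zip(alpha, H, Q)]
    total = sum(w)
    return [wi / total for wi in w]


def recover_intercepts(alpha_cb, H, Q):
    """Undo the shift when Q is known: alpha_i is proportional to
    (Q_i / H_i) * alpha_cb_i."""
    w = [(q / h) * a for a, h, q in zip(alpha_cb, H, Q)]
    total = sum(w)
    return [wi / total for wi in w]


def observed_probs(pi, p):
    """Equation (4): P~(k) = sum_i pi[k][i] * P(i), where pi[k][i] is the
    probability of observing outcome k when the true outcome is i."""
    m = len(p)
    return [sum(pi[k][i] * p[i] for i in range(m)) for k in range(m)]


if __name__ == "__main__":
    # Hypothetical three-outcome logit with scalar x: g_i = exp(x * beta_i), beta_M = 0.
    alpha = [0.5, 0.3, 0.2]
    g = [math.exp(1.2 * b) for b in (0.4, -0.7, 0.0)]
    H = [1 / 3, 1 / 3, 1 / 3]  # hypothetical sample shares
    Q = [0.6, 0.3, 0.1]        # hypothetical population shares
    # Hypothetical misclassification matrix; each column sums to one.
    pi = [[0.90, 0.10, 0.05],
          [0.07, 0.85, 0.05],
          [0.03, 0.05, 0.90]]

    p = mi_probs(alpha, g)
    alpha_cb = shifted_intercepts(alpha, H, Q)
    print("eq. (2):", choice_based_probs(p, H, Q))
    print("eq. (3):", mi_probs(alpha_cb, g))
    print("recovered alpha:", recover_intercepts(alpha_cb, H, Q))
    print("eq. (4):", observed_probs(pi, p))
```

The first two printed vectors agree up to rounding, because choice-based reweighting only rescales the intercepts while leaving $g$, and hence $\beta$, untouched. The last vector is a valid probability distribution but is not, in general, of multiplicative-intercept form.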

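In the choice-based sample with misclassified outcomes, let $\tilde H_k$ and $\tilde Q_k$ be the sample share and population share, respectively, of the observed outcome $k$. Substituting $\tilde H_k$ and $\tilde Q_k$ for $H_i$ and $Q_i$ in (2) gives the probabilities

$$\tilde P_{CB}(k \mid x, \theta, \pi) = \frac{(\tilde H_k/\tilde Q_k)\, \tilde P(k \mid x, \theta, \pi)}{\sum_{l=1}^{M} (\tilde H_l/\tilde Q_l)\, \tilde P(l \mid x, \theta, \pi)}, \tag{5}$$

corresponding to equation (16) of Ramalho. As before, the choice model can be consistently estimated by maximizing a likelihood function based on these probabilities. Define the modified misclassification probabilities

$$\delta_{ki} = \frac{(\tilde H_k/\tilde Q_k)\, \pi_{ki}}{\sum_{l=1}^{M} (\tilde H_l/\tilde Q_l)\, \pi_{li}}, \tag{6}$$

as in equation (12) of Ramalho. These are the probabilities of observing outcome $k$ given a case in the choice-based sample with true outcome $i$. Define also the modified intercept terms

$$\alpha_i^{*} = \frac{\alpha_i \sum_{l=1}^{M} (\tilde H_l/\tilde Q_l)\, \pi_{li}}{\sum_{j=1}^{M} \alpha_j \sum_{l=1}^{M} (\tilde H_l/\tilde Q_l)\, \pi_{lj}}, \qquad i = 1, \dots, M, \tag{7}$$

where the denominator is a scale factor to retain the conventional normalization $\sum_{i=1}^{M} \alpha_i^{*} = 1$. Then, after cancelling the common factor $\sum_{j=1}^{M} \alpha_j g_j(x, \beta)$ and changing the order of summation in the denominator, equation (5) can be rewritten as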

$$\tilde P_{CB}(k \mid x, \theta, \pi) = \frac{\sum_{i=1}^{M} \delta_{ki}\, \alpha_i^{*}\, g_i(x, \beta)}{\sum_{j=1}^{M} \alpha_j^{*}\, g_j(x, \beta)}. \tag{8}$$

This now has the same form as for random sampling, as given by equation (4). The apparent intercept terms $\alpha^{*}$ and misclassification probabilities $\delta$ are different from the true values $\alpha$ and $\pi$, but the structural parameters $\beta$ are the same. Therefore, if the choice-based nature of the sample is ignored and we correct only for misclassification, $\beta$ will still be consistently and efficiently estimated.

The status of the other parameters depends on whether the sampling weights $\tilde H_k/\tilde Q_k$ for the strata are known or can be consistently estimated from some other data. If the sampling weights are unknown, then neither the weights, nor the true intercepts, nor the misclassification probabilities are separately identified in a multiplicative-intercept model, a problem which might not be immediately apparent from the original parameterization in equation (5). In this case the estimator $\hat\beta$ could be used for inference about the underlying tradeoffs implied by the marginal values, but there is not enough information for inference about the marginal effects of $x$ on the choice probabilities. On the other hand, if the sampling weights are known, then consistent estimates of the probabilities $\pi$ and of the intercept terms $\alpha$ can be recovered from the estimates of $\delta$ and $\alpha^{*}$ by solving the sample analogs of equations (6) and (7).
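The rewriting of (5) as (8), and the recovery of $\pi$ and $\alpha$ from $\delta$ and $\alpha^{*}$ when the weights are known, can be confirmed numerically. In the Python sketch below the helper names, the weights $w_k = \tilde H_k/\tilde Q_k$ and all parameter values are hypothetical; it builds $\delta$ and $\alpha^{*}$ from (6) and (7), compares (5) with (8), and inverts (6) and (7) using the adding-up conditions $\sum_k \pi_{ki} = 1$ and $\sum_i \alpha_i = 1$.

```python
import math


def mi_probs(alpha, g):
    """Multiplicative-intercept probabilities, equation (1)."""
    w = [a * gi for a, gi in zip(alpha, g)]
    total = sum(w)
    return [wi / total for wi in w]


def prob_eq5(alpha, g, pi, w):
    """Equation (5): choice-based probabilities of observed outcome k,
    with w[k] = H~_k / Q~_k and pi[k][i] = P(observe k | true i)."""
    m = len(alpha)
    p = mi_probs(alpha, g)
    p_obs = [sum(pi[k][i] * p[i] for i in range(m)) for k in range(m)]  # eq. (4)
    num = [w[k] * p_obs[k] for k in range(m)]
    total = sum(num)
    return [v / total for v in num]


def transform(alpha, pi, w):
    """Equations (6) and (7): modified misclassification probabilities
    delta[k][i] and modified intercepts alpha_star[i]."""
    m = len(alpha)
    c = [sum(w[l] * pi[l][i] for l in range(m)) for i in range(m)]
    delta = [[w[k] * pi[k][i] / c[i] for i in range(m)] for k in range(m)]
    scale = sum(alpha[j] * c[j] for j in range(m))
    alpha_star = [alpha[i] * c[i] / scale for i in range(m)]
    return delta, alpha_star


def prob_eq8(alpha_star, g, delta):
    """Equation (8): the random-sampling form (4) with delta and alpha_star."""
    m = len(alpha_star)
    denom = sum(alpha_star[j] * g[j] for j in range(m))
    return [sum(delta[k][i] * alpha_star[i] * g[i] for i in range(m)) / denom
            for k in range(m)]


def invert(delta, alpha_star, w):
    """Solve (6) and (7) for pi and alpha, given known weights w."""
    m = len(alpha_star)
    d = [sum(delta[l][i] / w[l] for l in range(m)) for i in range(m)]
    pi = [[(delta[k][i] / w[k]) / d[i] for i in range(m)] for k in range(m)]
    scale = sum(alpha_star[j] * d[j] for j in range(m))
    alpha = [alpha_star[i] * d[i] / scale for i in range(m)]
    return pi, alpha


if __name__ == "__main__":
    # Hypothetical parameters; columns of pi sum to one.
    alpha = [0.5, 0.3, 0.2]
    g = [math.exp(1.2 * b) for b in (0.4, -0.7, 0.0)]
    pi = [[0.90, 0.10, 0.05],
          [0.07, 0.85, 0.05],
          [0.03, 0.05, 0.90]]
    # Hypothetical weights w_k = H~_k / Q~_k.
    w = [(1 / 3) / 0.55, (1 / 3) / 0.30, (1 / 3) / 0.15]

    delta, alpha_star = transform(alpha, pi, w)
    print("eq. (5):", prob_eq5(alpha, g, pi, w))
    print("eq. (8):", prob_eq8(alpha_star, g, delta))
    pi_back, alpha_back = invert(delta, alpha_star, w)
    print("recovered alpha:", alpha_back)
```

The check exercises only the algebra at fixed parameter values; it does not estimate anything. An estimator in this setting would maximize the random-sampling log likelihood built from (8), correcting for misclassification only, and then apply the inversion.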

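Imposing the adding-up conditions $\sum_{k=1}^{M} \pi_{ki} = 1$ and $\sum_{i=1}^{M} \alpha_i = 1$, these give

$$\hat\pi_{ki} = \frac{(\tilde Q_k/\tilde H_k)\, \hat\delta_{ki}}{\sum_{l=1}^{M} (\tilde Q_l/\tilde H_l)\, \hat\delta_{li}}, \tag{9}$$

$$\hat\alpha_i = \frac{\hat\alpha_i^{*} \sum_{l=1}^{M} (\tilde Q_l/\tilde H_l)\, \hat\delta_{li}}{\sum_{j=1}^{M} \hat\alpha_j^{*} \sum_{l=1}^{M} (\tilde Q_l/\tilde H_l)\, \hat\delta_{lj}}, \qquad i = 1, \dots, M. \tag{10}$$

Knowledge of the population shares $\tilde Q_k$ contains additional information which was not taken into account in estimating $\beta$, so we should verify the efficiency of $\hat\beta$ in this case. One way of formulating the constrained maximum likelihood estimator for a choice-based sample with known $\tilde Q$ is based on the objective function⁴

$$\tilde L(\theta, \pi, \lambda) = \sum_{n=1}^{N} \log \frac{\tilde P(k_n \mid x_n, \theta, \pi)}{\sum_{k=1}^{M} \lambda_k\, \tilde P(k \mid x_n, \theta, \pi)},$$

where the choice probabilities are those in equation (4), $k_n$ is the observed response in case $n$ of the sample, and $\lambda = (\lambda_1, \dots, \lambda_M)$ is a set of Lagrange multipliers. The objective function is minimized with respect to $\lambda$ subject to $\sum_{k=1}^{M} \lambda_k \tilde Q_k = 1$, and then maximized with respect to $(\theta, \pi)$. The first-order conditions with respect to $\lambda$ and $(\alpha, \pi)$ then imply that $\lambda_k = \tilde H_k/\tilde Q_k$, just as in the case of a multiplicative-intercept model with no misclassification. Substituting for $\lambda$ in the objective function $\tilde L$, changing to the new parameters defined by equations (6) and (7), and dropping some constant terms, we retrieve the random-sampling log likelihood based on the probabilities in equation (8). It follows that there is no loss of efficiency if we use the random-sampling maximum likelihood estimator followed by the corrections (9) and (10) to the misclassification probabilities and the multiplicative intercepts.

⁴ See Section 2.19 of Cosslett (1981).

References

Cosslett, S.R., 1981. Efficient estimation of discrete-choice models. In: Manski, C.F., McFadden, D. (Eds.), Structural Analysis of Discrete Data with Econometric Applications. MIT Press, Cambridge, MA.

Hausman, J.A., Abrevaya, J., Scott-Morton, F.M., 1998. Misclassification of the dependent variable in a discrete-response setting. Journal of Econometrics 87, 239-269.

Manski, C.F., Lerman, S., 1977. The estimation of choice probabilities from choice-based samples. Econometrica 45, 1977-1988.

Manski, C.F., McFadden, D., 1981. Alternative estimators and sample designs for discrete choice analysis. In: Manski, C.F., McFadden, D. (Eds.), Structural Analysis of Discrete Data with Econometric Applications. MIT Press, Cambridge, MA.

Ramalho, E.A., 2002. Regression models for choice-based samples with misclassification in the response variable. Journal of Econometrics 106, 171-201.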