Nested Logit. Brad Jones, Department of Political Science, University of California, Davis. April 30, 2008. POL 213: Research Methods


1 Nested Logit
Brad Jones, Department of Political Science, University of California, Davis. April 30, 2008

3 Nested Logit
An interesting model that does not have the IIA property.
A possible candidate model for structured choice situations.
Conceptual example: J political parties a voter i could choose from. Say: Green, Workers, Social Dem., Moderate, CR, Extreme Right.
Models? Conditional logit or MNL? The IIA property could be an issue.


4 Nested Logit IIA says that the disturbances are independent and homoskedastic. Odds are assumed to remain the same if some alternative is removed. Problem: one left party is a close substitute (possibly) of another. If C D voters split their vote across two leftist parties, elimination of one from the choice set does not imply they will randomly distribute over remaining choices. That is, they most likely will gravitate to the remaining leftist party. If so, odds ratios will change because of nonrandom redistribution.
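The IIA point above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the utilities are hypothetical values chosen only for illustration, and `mnl_probs` is an illustrative helper name. It shows that under conditional logit or MNL, dropping Green leaves the Workers/Moderate odds unchanged and scales every remaining share by the same factor, which is exactly the "random redistribution" the slide questions.

```python
import math

def mnl_probs(utilities):
    """Logit choice probabilities from systematic utilities (softmax)."""
    m = max(utilities.values())                      # subtract max for numerical stability
    expu = {k: math.exp(v - m) for k, v in utilities.items()}
    total = sum(expu.values())
    return {k: e / total for k, e in expu.items()}

# Hypothetical utilities for one voter (illustrative values, not estimates).
v = {"Green": 1.0, "Workers": 0.9, "SocialDem": 0.5,
     "Moderate": 0.2, "CR": -0.3, "ExtremeRight": -1.0}

full = mnl_probs(v)
reduced = mnl_probs({k: u for k, u in v.items() if k != "Green"})

# Odds between two remaining parties before and after removing Green.
odds_full = full["Workers"] / full["Moderate"]
odds_reduced = reduced["Workers"] / reduced["Moderate"]

# Proportional gain of each remaining party once Green is removed.
share_gain = {k: reduced[k] / full[k] for k in reduced}

print(odds_full, odds_reduced)
print(share_gain)
```

Every entry of `share_gain` is identical, so Workers gains no more from Green's removal than Extreme Right does. That is the behavior the nested logit is meant to relax.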


5 Nested Logit
Under NL (or MNNL), the idea is to group comparable alternatives and then structure the choice setting as a tree.
Voter i decides to vote leftist, centrist, or rightist. Call this the top level choice.
Once this choice is made, the voter must decide which outcome to choose:
Left: Green, Workers; Center: SD, Moderate; Right: CR, Extreme Right.
Basic result from conditional probability:
$$\Pr_{ij} = \Pr_{j|i}\,\Pr_i$$
for J outcomes (i.e. parties) and i branches.


6 Nested Logit
Conditional probability says the probability of the bottom level choice is equal to the conditional probability of selecting j given branch i times the probability that branch i was selected.
There are two levels of probability because there are two levels of decisions.
Consider the conditional probability statement, $\Pr_{j|i}$. Suppose we specify a utility model:
$$U_{ij} = \beta x_{ij} + \alpha w_i$$
As in the CL presentation, the $x_{ij}$ are covariates that can change over the choices (bottom level) and the $w_i$ are covariates that are attributes of the choice sets (top level).


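7 Nested Logit
The conditional probabilities can only be a function of the $x_{ij}$, because the branch-level term cancels:
$$\Pr_{j|i} = \frac{\exp(\beta x_{ij})\exp(\alpha w_i)}{\exp(\alpha w_i)\sum_{k=1}^{N_i}\exp(\beta x_{ik})} = \frac{\exp(\beta x_{ij})}{\sum_{k=1}^{N_i}\exp(\beta x_{ik})}$$
The top level probability is defined by first identifying what is sometimes called the inclusive value:
$$I_i = \log\left(\sum_{k=1}^{N_i}\exp(\beta x_{ik})\right)$$
The probability of branch i is then
$$\Pr_i = \frac{\exp(\alpha w_i + \tau_i I_i)}{\sum_{m=1}^{C}\exp(\alpha w_m + \tau_m I_m)}$$
where $N_i$ is the number of alternatives in branch i and C is the number of branches.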
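The three formulas on this slide can be sketched as one small Python function. This is an illustration only, not the slides' estimation code: `nested_logit`, the party utilities, the branch utilities, and the tau values are all hypothetical.

```python
import math

def nested_logit(xb_bottom, xb_top, tau):
    """Two-level nested logit probabilities.

    xb_bottom: {branch: {alternative: beta*x_ij}}  bottom-level linear predictions
    xb_top:    {branch: alpha*w_i}                 top-level linear predictions
    tau:       {branch: tau_i}                     inclusive value parameters
    """
    cond, incl = {}, {}
    for b, alts in xb_bottom.items():
        denom = sum(math.exp(v) for v in alts.values())
        incl[b] = math.log(denom)                               # inclusive value I_i
        cond[b] = {a: math.exp(v) / denom for a, v in alts.items()}  # Pr(j | i)
    top_num = {b: math.exp(xb_top[b] + tau[b] * incl[b]) for b in xb_bottom}
    top_den = sum(top_num.values())
    branch = {b: n / top_den for b, n in top_num.items()}       # Pr(i)
    joint = {a: branch[b] * p for b in cond for a, p in cond[b].items()}  # Pr(ij)
    return cond, incl, branch, joint

# Hypothetical inputs for the party example (illustrative values only).
xb_bottom = {"left":   {"Green": 0.4, "Workers": 0.1},
             "center": {"SD": 0.3, "Moderate": 0.2},
             "right":  {"CR": -0.2, "ExtremeRight": -0.9}}
xb_top = {"left": 0.5, "center": 0.0, "right": -0.4}
tau = {"left": 0.6, "center": 0.8, "right": 0.7}

cond, incl, branch, joint = nested_logit(xb_bottom, xb_top, tau)
print(branch)
print(joint)
```

The conditional probabilities sum to one within each branch, and the joint probabilities sum to one across all six parties.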


8 Nested Logit
The inclusive value parameter, $\tau$, is the weight accorded each of the branches.
Under CL (or MNL), we assume this weight is fixed at 1.
Estimation is done via full information maximum likelihood:
$$\log L = \sum_{i=1}^{N} \log\left[\Pr_{j|i}\,\Pr_i\right]$$
The model has many parameters. It requires a lot of work to interpret. My job is to show you how...
Stata is actually quite good with this model.
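Two claims on this slide can be sketched in Python: that fixing every tau at 1 reproduces the conditional logit probabilities, and that the log-likelihood sums the log of the chosen alternative's joint probability over choosers. The function names and all numeric inputs below are hypothetical, and this is a sketch of the objective only, not of Stata's optimizer.

```python
import math

def joint_probs(xb_bottom, xb_top, tau):
    """Pr(ij) = Pr(j|i) * Pr(i) for every alternative in a two-level tree."""
    incl = {b: math.log(sum(math.exp(v) for v in alts.values()))
            for b, alts in xb_bottom.items()}
    den = sum(math.exp(xb_top[b] + tau[b] * incl[b]) for b in xb_bottom)
    out = {}
    for b, alts in xb_bottom.items():
        p_branch = math.exp(xb_top[b] + tau[b] * incl[b]) / den
        for a, v in alts.items():
            out[a] = p_branch * math.exp(v - incl[b])   # exp(v - I_i) is Pr(j|i)
    return out

def log_likelihood(choosers, tau):
    """Sum over choosers of log Pr(chosen alternative)."""
    return sum(math.log(joint_probs(c["xb_bottom"], c["xb_top"], tau)[c["chosen"]])
               for c in choosers)

# Hypothetical chooser (illustrative values only).
xb_bottom = {"fast": {"A": -0.7, "B": -0.9}, "fancy": {"C": -1.5, "D": -2.0}}
xb_top = {"fast": 0.3, "fancy": 0.0}
ones = {"fast": 1.0, "fancy": 1.0}

nested_at_one = joint_probs(xb_bottom, xb_top, ones)

# Flat conditional logit on the summed utilities beta*x_ij + alpha*w_i.
flat_u = {a: v + xb_top[b] for b, alts in xb_bottom.items() for a, v in alts.items()}
flat_den = sum(math.exp(u) for u in flat_u.values())
flat = {a: math.exp(u) / flat_den for a, u in flat_u.items()}

ll = log_likelihood([{"xb_bottom": xb_bottom, "xb_top": xb_top, "chosen": "A"}], ones)
print(nested_at_one, flat, ll)
```

With tau equal to 1 in every branch, `nested_at_one` and `flat` agree, which is why the constrained model serves as the null in the likelihood-ratio test of the inclusive value parameters.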


9 Nested Logit: Illustration
I'm going to continue with the Stata data set provided by their website. We used it with conditional logit. Let's consider the data structure.

10 . list family_id restaurant chosen kids rating distance cost income in 1/

Each family has seven rows, one per restaurant: Freebirds, MamasPizza, CafeEccell, LosNortenos, WingsNmore, Christophers, MadCows. The rows that survive from the listing:

     family~d  restaurant   chosen  kids  rating  distance  cost      income
  3. 1         CafeEccell   0       1     2       4.21293   8.182085  39
  4. 1         LosNortenos  0       1     3       4.167634  9.861741  39
  5. 1         WingsNmore   0       1     2       6.330531  9...

11 . nlogitgen type = restaurant(fast: Freebirds | MamasPizza, family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

This returns:
new variable type is generated with 3 groups
label list lb_type
lb_type:
  1 fast
  2 family
  3 fancy

. nlogittree restaurant type   <-GIVES US THE TREE STRUCTURE. Type is the branch; restaurants are the "twigs."

tree structure specified for the nested logit model
top --> bottom
type      restaurant
fast      Freebirds
          MamasPizza
family    CafeEccell
          LosNorte~s
          WingsNmore
fancy     Christop~s
          MadCows


12 . nlogit chosen (restaurant = cost rating distance) (type = incfast incfancy kidfast kidfancy), group(family_id) nolog

Nested logit estimates
Levels = 2                      Number of obs = 2100
Dependent variable = chosen     LR chi2(10)   = 199.6293
Log likelihood = -483.9584      Prob > chi2   = 0.0000

              Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
restaurant
  cost        <-These are the beta parms.
  rating
  distance
type
  incfast     <-WHY DO I HAVE THESE?
  incfancy    <-These are the alpha parms.
  kidfast
  kidfancy
(incl. value parameters)
type
  /fast       <-These are the tau parms.
  /family
  /fancy
LR test of homoskedasticity (iv = 1): chi2(3) = 9.90

13 For fun:
. nlogit chosen (restaurant = cost rating distance) (type = incfast incfancy kidfast kidfancy), group(family_id) nolog ivc(fast=1, family=1, fancy=1) notree   <---CONSTRAINING TAU TO 1

User-defined constraints:
IV constraints:
  [fast]_cons = 1
  [family]_cons = 1
  [fancy]_cons = 1

Nested logit regression
Levels = 2                      Number of obs = 2100
Dependent variable = chosen     LR chi2(7)    = 189...

              Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
restaurant
  cost
  rating
  distance
type
  incfast
  incfancy
  kidfast
  kidfancy
(incl. value parameters)
type
  /fast
  /family
  /fancy

14 Constraining tau=1 should recover conditional logit:
. clogit chosen cost rating dist incfast incfancy kidfast kidfancy, group(family_id)

Conditional (fixed-effects) logistic regression
Number of obs = 2100
Prob > chi2 = 0.0000
Log likelihood = -488.90834     Pseudo R2 = 0.1625

chosen        Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
  cost
  rating
  distance
  incfast
  incfancy
  kidfast
  kidfancy

(And it does; verify from previous slide)

15 But since we know IIA doesn't hold, we should continue with unconstrained nested logit.

Nested logit regression
Levels = 2                      Number of obs = 2100
Dependent variable = chosen     LR chi2(10)

              Coef.      Std. Err.  z      P>|z|  [95% Conf. Interval]
restaurant
  cost        -.0944352  .03402     -2.78  0.006  -.1611131  -.0277572
  rating       .1793759  .126895     1.41  0.157  -.0693338  ...
  distance
type
  incfast
  incfancy
  kidfast
  kidfancy
(incl. value parameters)
type
  /fast
  /family
  /fancy
LR test of homoskedasticity (iv = 1): chi2(3) = 9.90
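The LR test of iv = 1 can be reproduced from the two log likelihoods reported on the slides (-483.9584 for the unconstrained nested logit, -488.90834 for conditional logit). The Python sketch below is an illustration, not Stata output. `chi2_sf_df3` is a hypothetical helper that uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom, so no statistics library is assumed.

```python
import math

ll_nested = -483.9584       # unconstrained nested logit (slide 12)
ll_clogit = -488.90834      # conditional logit, i.e. all tau fixed at 1 (slide 14)

# Likelihood-ratio statistic: twice the gain in log likelihood.
lr = 2.0 * (ll_nested - ll_clogit)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

p_value = chi2_sf_df3(lr)   # 3 restrictions: tau_fast = tau_family = tau_fancy = 1
print(round(lr, 2), p_value)
```

The statistic matches the chi2(3) = 9.90 reported under the coefficient table, and the small p-value is the basis for saying the inclusive value parameters are not jointly 1.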

16 Nested Logit: Illustration
There are clearly many parameters here. Let's figure out what all of this means.
I'm going to make use of Stata's predict options to back out various quantities.
Note, any of these quantities could be retrieved by hand using the functions from above.


17 Nested Logit: Illustration
predict pb will return the probability of choosing restaurant j.
predict p1, p1 will return the probability of branch i.
predict condpb, condpb will return $\Pr_{j|i}$.
predict xbb, xbb will return the linear prediction for the bottom-level choice.
predict xb1, xb1 will return the linear prediction for the top-level choice.
predict ivb, ivb will return the inclusive value $I_i$ (not the parameter $\tau$).


18 . list family_id chosen pb p1 condpb restaurant type in 1/

Each family has seven rows: Freebirds and MamasPizza (fast), CafeEccell, LosNortenos and WingsNmore (family), Christophers and MadCows (fancy). The rows that survive from the listing:

     family~d  chosen  pb        p1        condpb    restaurant   type
  3.                                       .3802899  CafeEccell   family
  4. 1         0       .284375   .7266538  .3913486  LosNortenos  family
  5. 1         0       .1659397  .7266538  .2283615  WingsNmore   family

A second listing shows family~d, chosen, xbb, xb1, ivb, restaurant and type for the same rows.

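19 (Listing of family~d, chosen, xbb, xb1, ivb, restaurant and type, continued through LosNortenos, WingsNmore, Christophers and MadCows. The rows that survive:)

     family~d  chosen  xbb        xb1       ivb        restaurant    type
 13.                                        ...264743  Christophers  fancy
 14. 2         0       -3.138791  1.570648  -2...      MadCows       fancy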

20 Where do the numbers come from?

xbb: Linear prediction for the bottom level.
It's a function of the covariates cost, rating, and distance. For the first observation, we see this is:
. display _b[cost]*cost+_b[rating]*rating+_b[distance]*distance

condpb: Conditional probability of restaurant j given branch i (from the equation on the earlier slide):
. display exp(-.731619)/(exp(-.731619)+exp(-.8987747))
.54169189
for "Freebirds" and
. display exp(-.8987747)/(exp(-.731619)+exp(-.8987747))
.45830811
for "MamasPizza."

xb1: Linear prediction for branch i.
This is the linear prediction for the top-level model (or the branches):
. display (alpha_1)*incFast + (alpha_2)*incFancy + (alpha_3)*kidFast + (alpha_4)*kidFancy
(The parms are the alphas from the model output.)
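The two `display` results for the fast branch can be reproduced outside Stata. This short Python sketch uses only the two bottom-level linear predictions that appear in the slide's own commands (-.731619 for Freebirds, -.8987747 for MamasPizza). The dictionary layout is an illustrative choice.

```python
import math

# Bottom-level linear predictions (xbb) for family 1, fast branch, from the slide.
xbb_fast = {"Freebirds": -0.731619, "MamasPizza": -0.8987747}

# Pr(j | fast) = exp(xbb_j) / sum over the branch of exp(xbb_k)
denom = sum(math.exp(v) for v in xbb_fast.values())
condpb = {name: math.exp(v) / denom for name, v in xbb_fast.items()}

for name, p in condpb.items():
    print(name, round(p, 8))
```

Because Freebirds and MamasPizza are the only two restaurants in the fast branch, the two conditional probabilities sum to one.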

21 OK. Now what about the "inclusive value parameters"?
These parameters essentially give us the "weight" the chooser ascribes to each branch.
Under conditional logit, this weight is assumed to be uniform and therefore 1.
We see in our model that these parameters are not jointly 1 (which provides evidence in favor of the nested logit model).
Above, I refer to these parameters as the tau. The question at hand now is: where do the inclusive values, the I, come from? For the first family in the data set, note the following:
. display log(exp(xbb_Freebirds)+exp(xbb_MamasPizza))
. display log(exp(xbb_CafeEccell)+exp(xbb_LosNortenos)+exp(xbb_WingsNmore))
. display log(exp(xbb_Christophers)+exp(xbb_MadCows))
What do the numbers represent? The numbers in parentheses are our linear predictions for the "bottom level" choices, that is, the "xbb."
Note, then, what the inclusive value gives us: it gives us a summary of the weight accorded each "branch" that is available to the chooser.
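For the fast branch of the first family, the inclusive value can be computed from the two xbb values quoted in the conditional-probability step (-.731619 and -.8987747). The Python sketch below is an illustration: the variable names are assumptions, and the xbb values for the family and fancy branches are not reproduced because they are not available in the text.

```python
import math

# xbb for the two fast-food restaurants, family 1 (values quoted on the slides).
xbb_fast = {"Freebirds": -0.731619, "MamasPizza": -0.8987747}

# Inclusive value: log of the sum of exp(xbb) over the alternatives in the branch.
iv_fast = math.log(sum(math.exp(v) for v in xbb_fast.values()))

# The same quantity links back to the conditional probability:
# Pr(j | branch) = exp(xbb_j - I_branch).
cond_freebirds = math.exp(xbb_fast["Freebirds"] - iv_fast)

print(iv_fast, cond_freebirds)
```

The inclusive value is the log of the denominator of the bottom-level logit, so it rises when a branch contains more, or more attractive, alternatives.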

22 OK, almost done. Now what about the top-level probabilities (i.e. the probability of choosing fast food, family, or fancy)?
In lecture, I give the function. To compute it directly for the fast branch, we do the following:
. display exp(xb1_fast + _b[/fast]*ivb_fast) / (exp(xb1_fast + _b[/fast]*ivb_fast) + exp(0 + _b[/family]*ivb_family) + exp(xb1_fancy + _b[/fancy]*ivb_fancy))
Note where these numbers come from: they are the taus, the "ivb," and the "xb1." The family branch is the base category, so its xb1 is 0.
In doing this exercise, we reproduce p1.
Interpretation? The probability of choosing a fast food restaurant is .15 for a person with this covariate profile.

23 Finally, we can compute the "bottom-level" probability. It is the simple conditional probability result. For the first observation, it is:. display p1*condpb We could then "fill in the tree" for observation 1 (if we wanted to).
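Filling in one branch of the tree can be checked against the listing of predicted quantities, which reports p1 = .7266538 for the family branch and condpb values of .3913486 (LosNortenos) and .2283615 (WingsNmore) for the first family. In the Python sketch below, the CafeEccell value .3802899 is an assumption: its leading decimal point was lost in the listing and is restored here because the three conditional probabilities then sum to one.

```python
# Branch probability for "family" restaurants, first family in the data.
p1_family = 0.7266538

# Conditional probabilities within the family branch.
condpb = {
    "CafeEccell": 0.3802899,   # assumed: decimal point restored from the listing
    "LosNortenos": 0.3913486,
    "WingsNmore": 0.2283615,
}

# Bottom-level (unconditional) probability: Pr(ij) = Pr(i) * Pr(j | i).
pb = {name: p1_family * c for name, c in condpb.items()}

for name, p in pb.items():
    print(name, round(p, 7))
```

The products for LosNortenos and WingsNmore agree with the pb column of the listing (.284375 and .1659397), and the three pb values add back up to the branch probability.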

24 Nested Logit: Illustration So what would we get from this model if we fully interpreted it? The probability of choice j. That is, the unconditional probability. The conditional probability of choice j given the selection of branch i. The probability of choosing branch i. A direct test of the weight associated with each branch, given chooser attributes. Seems a useful empirical model for testing rational choice predictions. Data requirements are substantial, as is theory for nesting choices.
