Advantages of latent class over continuous mixture of Logit models

Stephane Hess (Institute for Transport Studies, University of Leeds, s.hess@its.leeds.ac.uk, Tel: +44 (0)113 34 36611, Fax: +44 (0)113 343 5334)
Moshe Ben-Akiva (Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, mba@mit.edu)
Dinesh Gopinath (Profit Engineering Group, dgopinath@alum.mit.edu)
Joan Walker (Institute of Transportation Studies, University of California at Berkeley, JoanWalker@Berkeley.Edu)

May 16, 2011

Abstract

This paper adds to a growing body of evidence highlighting the potential advantages of Latent Class Logit models over continuous mixture Logit models. In particular, we present formulae for correlation between coefficients and for elasticities, and show how these are a function of any sociodemographic attributes included in the class allocation model. An empirical analysis is then conducted which not only confirms these advantages in interpretation, but also shows that, even with a limited number of classes, the Latent Class Logit model achieves a very similar model fit to that of its continuous mixture counterpart.

1 Introduction

The recognition that there exist fundamental differences in preferences between individuals faced with the same choice tasks has been one of the cornerstones of work in the area of behavioural modelling. In a transportation context, the main emphasis in recent years has been on accommodating these variations through a random coefficients approach, with particular interest in continuous Logit mixture models. On the other hand, especially in the marketing literature, latent class approaches have come to dominate. Importantly, a number of past comparisons
between the two structures have highlighted the possible advantages of latent class structures, as for example in the work of Gopinath (1995), Greene and Hensher (2003) and Shen (2009). In the present paper, we build on such past comparisons with a view to encouraging more widespread use of latent class choice models in a transportation context.

Both types of models are based on the idea of using a mixture of a simple underlying model, typically Multinomial Logit, over the distribution of preferences. In the continuous Logit mixture model, this distribution is continuous, while in the latent class context, a finite number of classes is used to express the heterogeneity. The fact that latent class models rely on a limited number of support points in the distribution could arguably be seen as a shortcoming, and may be behind the slow uptake of this model in the field of transport research. Conversely, while continuous mixture models are very flexible (cf. McFadden and Train, 2000), there are also numerous pitfalls in addition to the well documented high computational costs. These relate primarily to the distributional assumptions (see e.g. Hensher and Greene, 2003; Hess et al., 2005; Fosgerau, 2006), and the implications thereof for the interpretation of results and the computation of important model outputs such as willingness-to-pay (WTP) indicators (cf. Daly et al., 2009). Finite mixture structures such as the latent class model are less affected by the computational burden and interpretation issues highlighted above for continuous mixtures.

Independently of whether continuous or finite mixtures are used to accommodate the heterogeneity across respondents, there exists the possibility of linking this heterogeneity to covariates, i.e. moving away from a purely random treatment of the variations.
In a continuous mixture context, this is done through expressing the parameters of the random distribution as a function of these covariates, while, in a latent class model, this is accommodated in the class allocation model.

This paper provides some further insights into the differences between Latent Class Logit structures and continuous Logit mixture models, and especially the potential advantages of the former. We first derive equations for inter-coefficient correlation and elasticities, linking these measures directly to socio-demographic attributes used in the class allocation model. We then describe an application comparing Latent Class Logit models to Logit and continuous mixtures of Logit models.

The remainder of this paper is organised as follows. Section 2 discusses modelling methodology, including an in-depth look at taste heterogeneity, inter-coefficient correlation and elasticities in Logit, continuous mixture of Logit and Latent Class Logit models. Section 3 presents the results of an application on Stated Choice (SC) data for departure time and mode choice. Finally, Section 4
presents the conclusions of the research.

2 Methodology

This section sets out the methodology used in the paper. We first discuss general methodology before looking at taste heterogeneity, correlation and elasticities.

2.1 Background methodology

Let $P_n(i \mid \beta)$ give the probability of respondent $n$ choosing alternative $i$, conditional on a vector of taste coefficients $\beta$. In a Logit model, we have:

$$P_n(i \mid \beta) = \frac{e^{V_{ni}}}{\sum_{j=1}^{J} e^{V_{nj}}}, \tag{1}$$

where $J$ is the total number of alternatives, and where the observed utility $V_{ni}$ is given by $f(x_{ni}, \beta)$, which is a function of the attributes of alternative $i$ as faced by respondent $n$ and the vector of taste coefficients $\beta$ (footnote 1: the inclusion of any alternative specific constants is not made explicit here).

In a continuous mixture model, the vector $\beta$ follows a random distribution with parameters $\Omega$, and the choice probabilities are given by:

$$P_n(i \mid \Omega) = \int_{\beta} P_n(i \mid \beta) \, f(\beta \mid \Omega) \, d\beta, \tag{2}$$

where $P_n(i \mid \beta)$ once again gives the Logit choice probability from Equation 1 and where $f(\beta \mid \Omega)$ gives the density function for the vector of taste coefficients $\beta$. In the case of multiple choices for each respondent, the assumption is generally made that the tastes vary across respondents but not across choices for the same respondent (cf. Revelt and Train 1998 and see Hess and Rose 2009 for a recent discussion of this issue), and the probability of the observed sequence of choices is used in the maximisation of the log-likelihood. This probability is given by:

$$L_n(j_{n1}, \ldots, j_{nT_n} \mid \Omega) = \int_{\beta} \left[ \prod_{t=1}^{T_n} P_n(j_{nt} \mid \beta) \right] f(\beta \mid \Omega) \, d\beta, \tag{3}$$

where $j_{nt}$ gives the alternative chosen by respondent $n$ in choice situation $t$, with $T_n$ giving the total number of choices for respondent $n$.

In a Latent Class Logit model, the heterogeneity in tastes across respondents is accommodated by making use of separate classes with different values for the
vector of taste coefficients $\beta$. Specifically, in a Latent Class Logit model with $S$ classes, we would have $S$ instances of the vector $\beta$, say $\beta_1$ to $\beta_S$, with a possibility of some of the elements in $\beta$ staying constant across some of the classes. A Latent Class Logit model uses a probabilistic class allocation model, where respondent $n$ belongs to class $s$ with probability $\pi_{ns}$, and where $0 \le \pi_{ns} \le 1 \; \forall s$ and $\sum_{s=1}^{S} \pi_{ns} = 1$. Latent Class models are generally specified with an underlying Logit model, but can easily be adapted for more general underlying structures such as Nested Logit or Cross-Nested Logit.

Let $P_n(i \mid \beta_s)$ give the probability of respondent $n$ choosing alternative $i$ conditional on respondent $n$ falling into class $s$. The unconditional (on $s$) choice probability for alternative $i$ and respondent $n$ is then given by:

$$P_n(i \mid \beta_1, \ldots, \beta_S) = \sum_{s=1}^{S} \pi_{ns} \, P_n(i \mid \beta_s), \tag{4}$$

i.e. the weighted sum of choice probabilities across the $S$ classes, with the class allocation probabilities being used as weights. Unlike in the continuous mixture model, no simulation is required in the estimation of Latent Class Logit models. This specification can easily be extended to a situation with multiple choices per respondent, where, when making the same assumption of intra-respondent homogeneity as in Equation 3, we obtain:

$$L_n(j_{n1}, \ldots, j_{nT_n} \mid \beta_1, \ldots, \beta_S) = \sum_{s=1}^{S} \pi_{ns} \left( \prod_{t=1}^{T_n} P_n(j_{nt} \mid \beta_s) \right). \tag{5}$$

In the most basic version of a Latent Class Logit model, the class allocation probabilities are constant across respondents such that $\pi_{ns} = \pi_s \; \forall n$. This structure is often referred to as a discrete mixture model. The real flexibility however arises when the class allocation probabilities are not constant across respondents but when a class allocation model is used to link these probabilities to characteristics of the respondents. Typically, these characteristics would take the form of socio-demographic variables, such as income, age and employment status.
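As a minimal sketch of Equations 1, 4 and 5 (not the estimation code used for this paper), the following Python/NumPy snippet evaluates the probability of one respondent's observed sequence of choices in a Latent Class Logit model. The linear-in-attributes utility, the function names, the array layout and the numerical values are all assumptions made for illustration.

```python
import numpy as np

def logit_probs(V):
    """Equation 1: Logit probabilities from utilities V of shape (..., J)."""
    V = V - V.max(axis=-1, keepdims=True)  # stabilise the exponentials
    expV = np.exp(V)
    return expV / expV.sum(axis=-1, keepdims=True)

def latent_class_sequence_prob(x, chosen, betas, pi):
    """Equation 5 for one respondent, assuming V = x @ beta (hypothetical form).

    x      : (T, J, K) attributes for T choice situations, J alternatives
    chosen : (T,) index of the chosen alternative in each situation
    betas  : (S, K) one coefficient vector per class
    pi     : (S,) class allocation probabilities summing to one
    """
    T = x.shape[0]
    seq_prob = np.empty(len(pi))
    for s, beta in enumerate(betas):
        P = logit_probs(x @ beta)                      # (T, J), class s
        seq_prob[s] = P[np.arange(T), chosen].prod()   # product over t
    return float(pi @ seq_prob)                        # weighted sum over s

# Illustrative (made-up) data: 2 choice situations, 2 alternatives,
# attributes are travel time and travel cost.
x = np.array([[[10.0, 2.0], [15.0, 1.0]],
              [[20.0, 3.0], [12.0, 4.0]]])
betas = np.array([[-0.2, -0.2], [-0.7, -2.0]])
pi = np.array([0.6, 0.4])
print(latent_class_sequence_prob(x, np.array([0, 1]), betas, pi))
```

Because the sum over classes is finite, the likelihood contribution is computed exactly, whereas Equation 3 would typically require simulation over draws of $\beta$.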
With $z_n$ giving the concerned vector of characteristics for respondent $n$, and with the class allocation model taking on a Logit form, the probability of respondent $n$ falling into class $s$ would be given by:

$$\pi_{ns} = \frac{e^{\delta_s + g(\gamma_s, z_n)}}{\sum_{l=1}^{S} e^{\delta_l + g(\gamma_l, z_n)}}, \tag{6}$$

where $\delta_s$ is a class-specific constant (footnote 2: in a discrete mixture model, only these constants would be estimated), $\gamma_s$ is a vector of parameters to be estimated and $g(\cdot)$ gives the functional form of the utility function for the class allocation
model. Here, a major difference arises between class allocation models and choice models. In a choice model, the attributes vary across alternatives while the estimated coefficients (with a few exceptions) stay constant across alternatives. In a class allocation model, the attributes normally stay constant across classes while the parameters vary across classes. This allows the model to probabilistically allocate respondents to different classes depending on their socio-demographic characteristics. For example, a situation where high income and low income respondents are allocated differently to two classes could be represented with a positive income coefficient for the first class and a negative income coefficient for the second class.

Finally, it should also be said that it is possible to combine Latent Class Logit and continuous mixture structures, leading to latent class structures with some continuous elements, as for example done by Walker and Li (2006).

2.2 Taste heterogeneity

Some major differences arise across the three model structures in terms of their treatment of taste heterogeneity. In a Logit model, any taste heterogeneity needs to be accommodated in a deterministic way by linking marginal utility coefficients to socio-demographic indicators. This can either be done by estimating separate coefficients for mutually exclusive subgroups of the sample population (e.g. trip purpose) or by continuous interaction between taste coefficients and socio-demographic attributes such as income or age.

Although some taste heterogeneity may still be explained in a deterministic manner, the main characteristic of the continuous Logit mixture comes in its random representation of taste heterogeneity, with the vector $\beta$ varying across respondents according to a pre-specified statistical distribution with estimated parameters.
Here, an interesting development comes in the form of models linking the parameters of these distributions to socio-demographic indicators of the respondents (cf. Greene et al., 2006).

In a Latent Class Logit model, the taste heterogeneity is accommodated through a mixture of a deterministic and a random approach. A probabilistic model is used to allocate respondents to the different classes that characterise different tastes in the sample. However, the class allocation is not purely random but is a function of socio-demographic characteristics of the respondents. Finally, unlike with continuous mixtures, no a priori assumptions are made about the shape of the distribution of tastes other than the number of support points, which is equal to the number of classes. In the simple discrete mixture case, the class allocation is not linked to socio-demographic information, bringing the model
closer to a continuous mixture model, but the number of classes is still fixed and no assumptions are made about the shape of the distribution of tastes across classes.

2.3 Correlation between taste coefficients

Further important differences arise between the three models when it comes to correlation between taste coefficients. In the Logit model, such correlation only arises in the case where taste coefficients interact with socio-demographic attributes and specifically where multiple taste coefficients interact with the same socio-demographic characteristics. As an example, one could imagine a situation where cost sensitivity decreases with income while time sensitivity increases with income, resulting in negative correlation between the time and cost coefficients across the sample. While the correlation can thus be linked to socio-demographics, it should be said that in the majority of Logit applications, the coefficients will be distributed independently across the sample.

In a continuous mixture model, correlation can be accommodated by specifying a joint distribution for the taste coefficients. While most estimation packages allow users to specify multivariate Normal distributions, the vast majority of continuous mixture applications make use of independently distributed taste coefficients. Correlation is rarely introduced in models not based on the Normal distribution, one exception being given in Walker (2001).

In a Latent Class Logit model, correlation between taste coefficients is an inherent characteristic of the model structure. Let us assume that our model has $S$ classes. Let us further assume that across all alternatives, our model makes use of $P$ attributes, such that each vector of taste coefficients $\beta_s$ similarly contains $P$ individual coefficients. Combining these vectors across the $S$ classes, we obtain a $(P \times S)$ matrix of taste coefficients given by:

$$\beta = \begin{pmatrix} \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,S} \\ \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,S} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{P,1} & \beta_{P,2} & \cdots & \beta_{P,S} \end{pmatrix}, \tag{7}$$

where each row corresponds to one marginal utility coefficient and each column corresponds to one class. From this, it can be seen that in a Latent Class Logit model, there is likely to be correlation between two coefficients as long as both coefficients take on more than one value across the $S$ classes.

In addition to the actual taste coefficients shown in Equation 7, a Latent Class Logit model is also characterised by an additional vector giving the class
allocation probabilities. This vector is respondent specific, with:

$$\pi_n = [\pi_{n1}, \ldots, \pi_{nS}], \tag{8}$$

and we have that:

$$P(\beta_{n1} = \beta_{1,s}, \; \beta_{n2} = \beta_{2,s}, \; \ldots, \; \beta_{nP} = \beta_{P,s}) = \pi_{ns}, \tag{9}$$

with $\beta_{np}$ giving the value for the $p$th marginal utility coefficient for respondent $n$. From this, it can be seen that the correlation between taste coefficients in a Latent Class Logit model is a function of the class allocation probabilities as well as the values of the individual taste coefficients. Indeed, we have that:

$$\begin{aligned} \mathrm{cov}(\beta_{n1}, \beta_{n2}) &= E\left[ (\beta_{n1} - E(\beta_{n1})) (\beta_{n2} - E(\beta_{n2})) \right] \\ &= E(\beta_{n1} \beta_{n2}) - E(\beta_{n1}) \, E(\beta_{n2}) \\ &= \sum_{s=1}^{S} \pi_{ns} \beta_{1,s} \beta_{2,s} - \left( \sum_{s=1}^{S} \pi_{ns} \beta_{1,s} \right) \left( \sum_{s=1}^{S} \pi_{ns} \beta_{2,s} \right) \end{aligned} \tag{10}$$

For ease of notation, let $\alpha = \beta_1$ and $\gamma = \beta_2$, in which case Equation 10 can be written as:

$$\mathrm{cov}(\alpha_n, \gamma_n) = \sum_{s=1}^{S} \pi_{ns} \alpha_s \gamma_s - \left( \sum_{s=1}^{S} \pi_{ns} \alpha_s \right) \left( \sum_{s=1}^{S} \pi_{ns} \gamma_s \right) \tag{11}$$

A special situation arises when $S = 2$, in which case the class allocation probabilities have no effect on the sign of the correlation. Indeed, with the notation from Equation 11, and using $\pi_{n1} + \pi_{n2} = 1$, we then have:

$$\begin{aligned} \mathrm{cov}(\alpha_n, \gamma_n) &= \pi_{n1} \pi_{n2} \left[ \alpha_1 (\gamma_1 - \gamma_2) + \alpha_2 (\gamma_2 - \gamma_1) \right] \\ &= \pi_{n1} \pi_{n2} \left[ (\alpha_1 - \alpha_2) (\gamma_1 - \gamma_2) \right], \end{aligned} \tag{12}$$

where the sign of $\mathrm{cov}(\alpha_n, \gamma_n)$ only depends on the changes in the two elements in $\alpha$ and $\gamma$ across the two classes.

This discussion has already shown that in a Latent Class Logit model, the correlation between coefficients is implicit in the model structure, and that, unlike with continuous mixtures, no additional adaptation of the specification (such as the form of the distribution) is required to accommodate it. However, another important distinction has to be made. If a multivariate distribution is used in a continuous mixture model, then the correlation between two coefficients is constant across respondents, unless the off-diagonal terms in the covariance matrix are themselves parameterised. However, it can be seen from Equation 10
that in a Latent Class Logit model, the covariance (and hence the correlation) between two coefficients depends on the class allocation probabilities. Except in the case of a Latent Class Logit model with purely random class allocation (i.e. a discrete mixture model), the correlation itself thus varies across respondents as a function of the socio-demographic attributes used in the class allocation probabilities.

2.4 Elasticities

As a final step, we now look at the elasticities in the different models. The Logit elasticities are well known (see e.g. Ben-Akiva and Lerman 1985), with the direct elasticity given by:

$$E_{i,x_{ni}} = \frac{\partial V_{ni}}{\partial x_{ni}} \, x_{ni} \, (1 - P_n(i \mid \beta)), \tag{13}$$

while the cross-elasticity is given by:

$$E_{i,x_{nj}} = -\frac{\partial V_{nj}}{\partial x_{nj}} \, x_{nj} \, P_n(j \mid \beta), \tag{14}$$

exhibiting the IIA characteristic. In the continuous Logit mixture, the direct elasticity (see e.g. Train 2009) is given by:

$$E_{i,x_{ni}} = \frac{\int_{\beta} \frac{\partial V_{ni}}{\partial x_{ni}} \, x_{ni} \, (1 - P_n(i \mid \beta)) \, P_n(i \mid \beta) \, f(\beta \mid \Omega) \, d\beta}{\int_{\beta} P_n(i \mid \beta) \, f(\beta \mid \Omega) \, d\beta}, \tag{15}$$

with the cross-elasticity being:

$$E_{i,x_{nj}} = -\frac{\int_{\beta} \frac{\partial V_{nj}}{\partial x_{nj}} \, x_{nj} \, P_n(j \mid \beta) \, P_n(i \mid \beta) \, f(\beta \mid \Omega) \, d\beta}{\int_{\beta} P_n(i \mid \beta) \, f(\beta \mid \Omega) \, d\beta}, \tag{16}$$

where this varies across alternatives, such that the continuous Logit mixture does not exhibit the IIA property. Here, it can be seen that the elasticities are given by an integration of Logit elasticities.

We will now derive the elasticities for the Latent Class Logit model. Starting
with the direct elasticity, and writing $P_n(i \mid \beta)$ for the unconditional probability from Equation 4, we have:

$$\begin{aligned} E_{i,x_{ni}} &= \frac{\partial P_n(i \mid \beta)}{\partial x_{ni}} \, \frac{x_{ni}}{P_n(i \mid \beta)} \\ &= \left( \sum_{s=1}^{S} \pi_{ns} \frac{\partial P_n(i \mid \beta_s)}{\partial x_{ni}} \right) \frac{x_{ni}}{P_n(i \mid \beta)} \\ &= \left( \sum_{s=1}^{S} \pi_{ns} \frac{\partial V_{nis}}{\partial x_{ni}} \, P_n(i \mid \beta_s) \, (1 - P_n(i \mid \beta_s)) \right) \frac{x_{ni}}{P_n(i \mid \beta)} \\ &= \sum_{s=1}^{S} \frac{\pi_{ns} \, P_n(i \mid \beta_s)}{P_n(i \mid \beta)} \left[ \frac{\partial V_{nis}}{\partial x_{ni}} \, x_{ni} \, (1 - P_n(i \mid \beta_s)) \right]. \end{aligned} \tag{17}$$

It can be seen that the term in square brackets corresponds to a Logit direct elasticity for a specific class in our Latent Class Logit model. This means that the direct elasticities in a Latent Class Logit model are a weighted sum of Logit elasticities, with the weights being given by multiplying the class membership probability by the class-specific conditional probability and dividing this product by the marginal probability. It can similarly be seen that the Latent Class Logit cross-elasticities are given by a weighted sum of Logit cross-elasticities. Specifically, we have:

$$\begin{aligned} E_{i,x_{nj}} &= \frac{\partial P_n(i \mid \beta)}{\partial x_{nj}} \, \frac{x_{nj}}{P_n(i \mid \beta)} \\ &= \left( \sum_{s=1}^{S} \pi_{ns} \left( -\frac{\partial V_{njs}}{\partial x_{nj}} \, P_n(i \mid \beta_s) \, P_n(j \mid \beta_s) \right) \right) \frac{x_{nj}}{P_n(i \mid \beta)} \\ &= \sum_{s=1}^{S} \frac{\pi_{ns} \, P_n(i \mid \beta_s)}{P_n(i \mid \beta)} \left[ -\frac{\partial V_{njs}}{\partial x_{nj}} \, x_{nj} \, P_n(j \mid \beta_s) \right]. \end{aligned} \tag{18}$$

Two main observations can be made. Firstly, the similarity with the continuous Logit mixture elasticities is clearly visible. However, weighted summation replaces integration, such that no simulation is required. As in the continuous mixture model, the cross-elasticities vary across alternatives, such that the model does not exhibit the IIA property.

The second observation relates to variations across individuals. The Logit and continuous Logit mixture elasticities can be seen to vary across respondents due to differences in the attribute levels of the alternatives and hence also probabilities. Additionally, any socio-demographic interactions will lead to further variations. However, the relationship between the socio-demographic attributes and the elasticities cannot always be easily determined, especially in the continuous mixture model. In the Latent Class Logit
model on the other hand, the elasticities depend directly on the class allocation probabilities and as such are also a function of any socio-demographic attributes that enter into the class allocation model.

3 Empirical application

This section presents an empirical application that illustrates the theoretical points discussed in Section 2. We first look at data and model specification before discussing the main estimation results. Finally, we look in turn at taste heterogeneity and inter-coefficient correlation. Given the use of unlabelled SC data, no discussion of elasticities is included.

3.1 Data

Our analysis makes use of Stated Choice (SC) data collected for the DATIV study carried out in Denmark in 2004 (cf. Burge and Rohr, 2004). For this survey, a binomial unlabelled route choice experiment was used, with two attributes, namely travel time (TT) and travel cost (TC), describing the alternatives. The final sample used in our analysis contains 1,919 observations collected from 241 commuters, with up to 8 choice situations per respondent.

In the analysis presented in this paper, a number of socio-demographic variables were used as covariates in the specification of taste heterogeneity, namely age, gender, personal income, and the ability to regularly work from home. Attempts to make use of other socio-demographic attributes, notably working time flexibility, were not successful.

3.2 Model specification

In the specification of the underlying utility function, an alternative specific constant (ASC) was included for the first alternative, with a view to capturing left to right reading effects. The two main marginal utility coefficients are $\beta_{TT}$ and $\beta_{TC}$, representing the marginal utility of changes in travel time and travel cost respectively. A number of additional offset parameters were included to estimate deviations from these sensitivities in certain socio-demographic sectors. Here, $\beta_{TC,\text{low inc.}}$ and $\beta_{TC,\text{high inc.}}$
represent deviations in the cost sensitivity for low income (less than DKK 300,000) and high income (more than DKK 700,000) respondents, and $\beta_{TC,\text{homeworking}}$ captures changes in travel cost sensitivity for respondents who regularly work from home. Finally, attempts were made to capture age effects, where the only significant change was observed using a piecewise linear interaction between age and travel time sensitivity, with constant
sensitivity for respondents below 40 years of age, gradually changing sensitivity for respondents aged between 40 and 60, and once again constant sensitivity for respondents over 60 years of age. The interaction in this middle age group is represented by $\beta_{TT,\text{age pl. group 2}}$, where the estimate represents the change in sensitivity between a respondent of 40 years of age and a respondent of 60 years of age. No effects were observed for gender in this specification, nor were the time sensitivities different for respondents who regularly work from home.

Moving away from the simple Logit model with deterministic taste heterogeneity only, a continuous Logit mixture was estimated, allowing for random heterogeneity in the two main coefficients, $\beta_{TT}$ and $\beta_{TC}$, and recognising the repeated choice nature of the data through using a Revelt and Train (1998) style specification of the log-likelihood function, i.e. carrying out the integration/simulation at the level of a respondent rather than individual choice observation. Here, the best performance was obtained by making use of a multivariate Lognormal distribution, where additional offset parameters were estimated to allow for a bound different from zero. Specifically, the following specification was used:

$$-\beta_{TC} = a_{TC} + e^{\mu_{TC} + \sigma_{11} \xi_1} \tag{19}$$

$$-\beta_{TT} = a_{TT} + e^{\mu_{TT} + \sigma_{21} \xi_1 + \sigma_{22} \xi_2}, \tag{20}$$

where $a_{TC}$ and $a_{TT}$ represent the offset parameters, $\mu_{TC}$ and $\mu_{TT}$ represent the means of the underlying Normal distribution, $\sigma_{11}$, $\sigma_{21}$ and $\sigma_{22}$ represent the terms of the Cholesky matrix, and $\xi_1$ and $\xi_2$ are two standard Normal variates. The sign change on $\beta_{TC}$ and $\beta_{TT}$ is required due to the positive domain of the Lognormal distribution.

The third structure estimated on the data was a Latent Class Logit model. In this model, $\delta_1$, $\beta_{TT,\text{age pl. group 2}}$, $\beta_{TC,\text{low inc.}}$, $\beta_{TC,\text{high inc.}}$, and $\beta_{TC,\text{homeworking}}$ were kept as class independent, and only $\beta_{TT}$ and $\beta_{TC}$ were allowed to vary across classes, much as in the continuous Logit mixture.
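The shifted Lognormal specification in Equations 19 and 20 can be illustrated with a short simulation. This is only a sketch under stated assumptions: the sign change is taken to apply to the whole right-hand side, the default parameter values are the continuous Logit mixture estimates reported in Table 1, and the function name and the use of NumPy are choices made here, not part of the original work.

```python
import numpy as np

def draw_coefficients(n_draws, a_tc=0.158, mu_tc=-2.17, a_tt=0.0356,
                      mu_tt=-1.39, s11=2.63, s21=1.70, s22=0.66, seed=0):
    """Draw (beta_TT, beta_TC) from the offset, negative Lognormal mixture.

    Defaults are the Table 1 estimates; the placement of the sign change
    is an assumption (the negative of the full right-hand side).
    """
    rng = np.random.default_rng(seed)
    xi1 = rng.standard_normal(n_draws)   # shared variate: induces correlation
    xi2 = rng.standard_normal(n_draws)   # variate specific to travel time
    beta_tc = -(a_tc + np.exp(mu_tc + s11 * xi1))
    beta_tt = -(a_tt + np.exp(mu_tt + s21 * xi1 + s22 * xi2))
    return beta_tt, beta_tc

beta_tt, beta_tc = draw_coefficients(100_000)
# With positive offsets, every draw is strictly negative.
print(beta_tt.max() < 0, beta_tc.max() < 0)
```

Under these assumptions the offsets act as bounds: no draw of $\beta_{TC}$ can exceed $-a_{TC}$ and no draw of $\beta_{TT}$ can exceed $-a_{TT}$, while the shared variate $\xi_1$ produces the correlation between the two coefficients.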
In the class allocation model, a total of seven parameters were included for each class. Along with an intercept ($\delta$), coefficients were included for female respondents ($\beta_{\text{female}}$) and respondents who regularly work from home ($\beta_{\text{homeworking}}$). Age effects were once again captured through a piecewise linear specification, with three segments, namely respondents aged between 23 and 40 ($\beta_{\text{age pl. group 1}}$), respondents aged between 40 and 60 ($\beta_{\text{age pl. group 2}}$), and respondents aged between 60 and 73 ($\beta_{\text{age pl. group 3}}$). Finally, income was used as an explanator, where a linear specification was found to offer the best performance, with the coefficient $\beta_{\text{income}}$ interacting with the income in DKK 100,000s. A separate analysis showed that the optimal number of classes for the Latent Class Logit model in this context is 3.
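To make the class allocation specification concrete, the sketch below evaluates Equation 6 for a single hypothetical respondent, using the class allocation estimates reported in Table 1. Several elements are assumptions rather than details given in the text: the linear form of $g(\cdot)$, the scaling of each piecewise linear age segment to the range 0 to 1 (which is consistent with the expected ages reported in Table 2), and all function and variable names.

```python
import numpy as np

# Class allocation estimates for classes 1 to 3, copied from Table 1.
COEF = {
    "const":       np.array([ 0.5862, -5.8349,  5.2487]),
    "female":      np.array([ 0.4356,  1.1206, -1.5563]),
    "homeworking": np.array([-0.4396, -1.0674,  1.5071]),
    "age_pl1":     np.array([-0.8665,  3.0940, -2.2274]),
    "age_pl2":     np.array([ 0.4282,  1.7720, -2.2003]),
    "age_pl3":     np.array([-0.3655, -9.1448,  9.5103]),
    "income":      np.array([ 0.3923,  0.5807, -0.9730]),
}

def age_segments(age):
    """Piecewise linear age terms; the 0 to 1 scaling is an assumption."""
    g1 = (np.clip(age, 23, 40) - 23) / 17   # ages 23 to 40
    g2 = (np.clip(age, 40, 60) - 40) / 20   # ages 40 to 60
    g3 = (np.clip(age, 60, 73) - 60) / 13   # ages 60 to 73
    return g1, g2, g3

def class_probs(age, female, homeworking, income_100k):
    """Equation 6 with an assumed linear class allocation utility."""
    g1, g2, g3 = age_segments(age)
    u = (COEF["const"]
         + COEF["female"] * female
         + COEF["homeworking"] * homeworking
         + COEF["age_pl1"] * g1
         + COEF["age_pl2"] * g2
         + COEF["age_pl3"] * g3
         + COEF["income"] * income_100k)
    u = u - u.max()                          # stabilise the exponentials
    expu = np.exp(u)
    return expu / expu.sum()

# Hypothetical respondent: male, aged 45, works from home, DKK 450,000.
probs = class_probs(age=45, female=0, homeworking=1, income_100k=4.5)
print(probs.round(3), probs.sum())
```

The same attribute vector enters the utility of every class, and only the coefficients differ by class, which is the contrast with choice models drawn in Section 2.1.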
3.3 Estimation results

This section presents the estimation results for the different models calibrated on our data.

3.3.1 Main estimation results

The detailed estimation results are reported in Table 1. To account for the repeated choice nature of the data in the computation of the standard errors, the panel specification of the sandwich matrix was used across all models (cf. Daly and Hess, 2010).

Looking first at model fit, we can see that the continuous Logit mixture and Latent Class Logit models both easily outperform the Logit model, with highly significant increases in log-likelihood (LL) of 74.54 and 83.14 units respectively, coming at the cost of 5 and 18 additional parameters respectively. While the Latent Class Logit model produces the best LL of the three models, the higher number of parameters when compared to the continuous Logit mixture gives it a marginally lower performance according to the adjusted $\rho^2$ measure. Overall, the differences in fit between the two models are very small.

Turning next to the detailed estimation results, the Logit estimates show the expected negative marginal utilities for travel time and travel cost increases, where the cost sensitivity is higher in the low income group (coefficient significant at the 83% level) while it is lower in the high income group. The sensitivity to travel cost is also lower for respondents who regularly work from home (coefficient significant at the 76% level), while between the age of 40 and the age of 60, the marginal disutility of travel time drops by around 0.0038 for each additional year of age. Finally, the estimate of the alternative specific constant is positive and significant, possibly suggesting the presence of left to right reading effects.

In the estimates for the continuous Logit mixture, we can observe a drop in the significance of the interaction terms with the exception of $\beta_{TC,\text{homeworking}}$.
This arguably signals that accounting for random taste heterogeneity reduces the scope for deterministic heterogeneity in this model. The positive estimates for $a_{TT}$ and $a_{TC}$ do, in conjunction with the sign change, imply universally negative values for the travel time and travel cost coefficients, even when factoring in the interactions with the positive $\beta_{TT,\text{age pl. group 2}}$, $\beta_{TC,\text{high inc.}}$, and $\beta_{TC,\text{homeworking}}$ parameters.

In the Latent Class Logit model, we also observe reductions in the significance levels for interaction terms as witnessed in the continuous Logit mixture. Additionally, while the estimates for $\beta_{TT}$ and $\beta_{TC}$ are negative in all three classes (and remain so even when interacted with the positive $\beta_{TT,\text{age pl. group 2}}$, $\beta_{TC,\text{high inc.}}$, and $\beta_{TC,\text{homeworking}}$ terms), problems with significance are observed in the third
Table 1: Detailed estimation results

Model fit
                          Logit model    Continuous Logit mixture    Latent Class Logit model
Observations              1,919          1,919                       1,919
Respondents               241            241                         241
Final LL                  -1,153.12      -1,078.57                   -1,069.97
par.                      7              12                          25
adj. ρ²                   0.1278         0.1801                      0.1768

Choice model parameters ("-" indicates a parameter not included in the model)
                          Logit model          Continuous Logit mixture    Latent Class Logit (class indep.)
                          est.      t-rat.     est.      t-rat.            est.      t-rat.
δ_1                       0.158     3.27       0.22      3.46              0.1905    3.25
β_TT, age pl. group 2     0.0755    2.93       0.0697    1.81              0.0732    2.27
β_TC, low inc.            -0.0382   -1.38      -0.02     -0.73             0.0036    0.14
β_TC, high inc.           0.091     3.36       0.11      3.14              0.0961    3.89
β_TC, homeworking         0.0281    1.17       0.0437    1.46              0.031     1.35
a_TT                      -         -          0.0356    0.77              -         -
β_TT                      -0.19     -10.22     -         -                 -         -
μ_TT                      -         -          -1.39     -7.17             -         -
a_TC                      -         -          0.158     5.95              -         -
β_TC                      -0.221    -9.65      -         -                 -         -
μ_TC                      -         -          -2.17     -9.88             -         -
σ_11                      -         -          2.63      7.96              -         -
σ_21                      -         -          1.70      5.31              -         -
σ_22                      -         -          0.66      11.61             -         -

Latent Class Logit model, class-specific choice model parameters
                          Class 1              Class 2              Class 3
                          est.      t-rat.     est.      t-rat.     est.      t-rat.     Wald(=)    p-value
β_TT                      -0.2414   -7.19      -0.6964   -4.30      -0.1723   -0.65      8.6923     0.01
β_TC                      -0.2285   -8.14      -2.0474   -3.70      -0.6143   -0.94      13.0029    0.00

Latent Class Logit model, class allocation model
                          Class 1              Class 2              Class 3
                          est.      t-rat.     est.      t-rat.     est.      t-rat.     Wald(=)    p-value
δ                         0.5862    0.45       -5.8349   -3.88      5.2487    3.89       18.96      0.00
β_female                  0.4356    1.16       1.1206    2.42       -1.5563   -2.22      5.92       0.05
β_homeworking             -0.4396   -1.21      -1.0674   -1.95      1.5071    2.18       4.77       0.09
β_age pl. group 1         -0.8665   -0.98      3.094     1.77       -2.2274   -1.44      3.18       0.20
β_age pl. group 2         0.4282    0.60       1.772     2.22       -2.2003   -1.82      4.94       0.08
β_age pl. group 3         -0.3655   -0.09      -9.1448   -1.25      9.5103    2.18       6.73       0.04
β_income                  0.3923    1.87       0.5807    3.15       -0.973    -2.60      11.14      0.00
class. Indeed, the estimate for $\beta_{TT}$ is only significant around the 50% level, with a 65% level applying for the estimate of $\beta_{TC}$. This result could in part be explained by earlier observations of some respondents who are largely indifferent between the two alternatives in this dataset (cf. Hess et al., 2010). In terms of differences across the three classes, the Wald test shows significant variations in both coefficients.

The estimates for the various parameters used in the class allocation model show high degrees of variation in coefficient values across the three classes. While in the first of the three classes, only the income parameter attains a high level of statistical significance, the majority of parameters are significant for the remaining two classes.

3.3.2 Heterogeneity in the Latent Class Logit model

Three different classes were identified in the Latent Class Logit model. Disregarding for the moment the presence of the additional interaction terms, the first class shows a valuation of travel time savings (VTTS) of 63.39 DKK/hr, where this drops to 20.41 DKK/hr in the second class, and 16.83 DKK/hr in the third class, where problems with parameter significance were also observed. Alongside the differences in relative valuations across the three classes, we can also observe differences in absolute coefficient values, with visibly higher scale in the second class. These differences in scale across the classes are consistent with earlier observations of very substantial scale differences in this dataset by Hess et al. (2009).

On the basis of the estimated class allocation probabilities for each respondent, and the values of the relevant explanatory variables, it is possible to work out an expected value for each of the six variables in each of the three classes.
As an example, the expected value for the female dummy in class 1 would be obtained as:

$$\overline{\text{female}}_{\text{class 1}} = \frac{\sum_{n=1}^{N} \pi_{n1} \, \text{female}_n}{\sum_{n=1}^{N} \pi_{n1}}, \tag{21}$$

where $N$ is the number of respondents, $\text{female}_n$ is 1 if respondent $n$ is female and 0 otherwise, and $\pi_{n1}$ is the probability of respondent $n$ falling into class 1, computed on the basis of the class allocation model. The results of this process are shown in Table 2, where in addition, the expected values for the piecewise linear terms were used to compute an expected age, and where the table also shows the VTTS in the different classes.

From the results in Table 2, we can see that the low VTTS in the third class can most easily be linked to lower expected income. The expected income in the
remaining two classes is higher, and although it is highest in the second class, the VTTS is much higher in the first class, where this can possibly be linked to a lower average age, a higher rate of regularly working from home (often linked to higher time sensitivity) and a lower representation of female respondents. It is also worth remembering that class 2, which captures more female respondents, as well as highly paid and slightly older respondents, shows the highest scale in Table 1.

Table 2: Expected values for explanatory variables in latent classes

                      class 1    class 2    class 3
female                0.399      0.506      0.241
homeworking           0.279      0.211      0.342
age pl. group 1       0.867      0.980      0.657
age pl. group 2       0.352      0.539      0.163
age pl. group 3       0.017      0.008      0.054
income                4.179      4.712      3.199
age                   45.018     50.540     38.125
VTTS (DKK/hr)         63.39      20.41      16.83

3.3.3 Comparison of heterogeneity across models

As a next step, we compare the retrieved heterogeneity patterns across the three models. We first look at how the three models represent the variation in the VTTS across different socio-economic subgroups. Table 3 takes 18 individuals, differentiated by age (three different ages), income (three different groups), and whether they regularly work from home (reg h-w). With gender only being used in the Latent Class Logit models, we use a purely male sample of respondents in this initial comparison.

For the Logit models, the point value is computed on the basis of the main coefficients and the socio-demographic interaction coefficients. For the continuous Logit mixture, we have the additional continuous random component, while, in the Latent Class Logit model, we have a distribution across the three classes for each respondent. Hence, for the continuous Logit mixture and Latent Class Logit models, the standard deviation (for the given type of respondent) is presented alongside the mean.
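For the Latent Class Logit model, the mean and standard deviation for a given type of respondent follow from a three-point discrete distribution. The sketch below is a simplified illustration only: it uses the class-level VTTS values from Table 2, which disregard the socio-demographic interaction terms, and the class allocation probabilities passed in are hypothetical. The function name is likewise an assumption.

```python
import numpy as np

# Class-level VTTS in DKK/hr from Table 2 (interaction terms disregarded).
VTTS_BY_CLASS = np.array([63.39, 20.41, 16.83])

def vtts_mean_sd(pi):
    """Mean and standard deviation of the VTTS over the three classes.

    pi : class allocation probabilities for one respondent (sums to one).
    """
    pi = np.asarray(pi, dtype=float)
    mean = pi @ VTTS_BY_CLASS
    var = pi @ (VTTS_BY_CLASS - mean) ** 2   # probability-weighted variance
    return float(mean), float(np.sqrt(var))

# Hypothetical respondent split evenly between classes 1 and 2.
print(vtts_mean_sd([0.5, 0.5, 0.0]))
```

A respondent allocated to a single class with certainty has zero standard deviation, so any respondent-level heterogeneity in this model comes entirely from the spread of the class allocation probabilities.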
The mean values in the continuous Logit mixture model are higher across all 18 respondents when compared to the Logit point values. With only four exceptions (namely the oldest respondents in the high income group, and the medium and high income young respondents who do not regularly work from home), the Latent Class Logit mean values are lower than the continuous Logit
mixture values, and are evenly distributed around the Logit values (in terms of increases and decreases). Finally, without a single exception, the degree of heterogeneity for specific types of individual is lower with the Latent Class Logit model than with the continuous Logit mixture model. This is attributable in part to the finite mixture approach in the Latent Class Logit model, and in part to the use of the Lognormal distribution in the continuous Logit mixture model.

In the Logit and continuous Logit mixture models, the VTTS universally decreases with age, while this is not necessarily the case in the Latent Class Logit models, a result of the fact that greater heterogeneity in relation to age is accommodated through the inclusion of age as an explanator in the class allocation model. As expected, the VTTS increases with income across all models. Finally, while the VTTS is universally higher for respondents who regularly work from home in the Logit and continuous Logit mixture models, this is only the case for high income respondents in the Latent Class Logit models (and medium income respondents in the middle age group), where this distinction is once again a result of also incorporating this attribute in the class allocation model.

So far, we have focussed solely on certain representative individuals, while it is clearly also of great interest to look at the sample level VTTS distribution. For the Logit model, this equates to working out the point value for each individual and looking at the distribution of these values across the sample. For the continuous Logit mixture and Latent Class Logit models, it is however important to additionally incorporate the respondent-level uncertainty in the calculation of the VTTS. In practical terms, for the Latent Class Logit model, a population level distribution is obtained by taking the three respondent-specific VTTS measures for each individual (i.e.
for the three classes), and combining these into a set of values across the sample, with the weight for each value given by dividing the individual-specific class allocation weight by the number of respondents. For the continuous Logit mixture case, a continuous analogue to this approach was employed, making use of 100,000 random draws for each of the 241 respondents. The results of this process are summarised in Table 4, showing that while the mean VTTS is very similar in the Logit and Latent Class Logit models, the additional random component of the latter model leads to a greater degree of heterogeneity. For the continuous Logit mixture model, both the mean and especially the standard deviation are higher than in the Logit and Latent Class Logit models, where this is at least in part due to the long tail of the Lognormal distribution.
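The weighting scheme described above for the Latent Class Logit model can be sketched as follows. This is an illustrative sketch rather than the code used to produce Table 4: the function name `sample_level_moments` and the two-respondent inputs are hypothetical, and it assumes that each respondent's class allocation probabilities sum to one.

```python
import math


def sample_level_moments(class_probs, class_vtts):
    """Sample-level mean and standard deviation of the VTTS.

    class_probs[n][k] -- probability of respondent n belonging to class k
    class_vtts[n][k]  -- VTTS of respondent n if in class k
    Each (n, k) pair contributes one value with weight pi_nk / N.
    """
    n_resp = len(class_probs)
    values, weights = [], []
    for probs_n, vtts_n in zip(class_probs, class_vtts):
        for p, v in zip(probs_n, vtts_n):
            values.append(v)
            weights.append(p / n_resp)
    mean = sum(w * v for w, v in zip(weights, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, math.sqrt(var)


# Two hypothetical respondents and three classes (illustrative numbers only).
probs = [[0.5, 0.25, 0.25],
         [0.0, 1.0, 0.0]]
vtts = [[60.0, 20.0, 20.0],
        [50.0, 30.0, 10.0]]

sample_mean, sample_sd = sample_level_moments(probs, vtts)
print(sample_mean, sample_sd)
```

For the continuous Logit mixture, the same calculation would apply with the class-specific values replaced by simulated draws, each carrying an equal weight.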
Table 3: Comparison of heterogeneity in VTTS across models and sociodemographic groups

                                                   continuous Logit mixture   Latent Class Logit
gender  age            income  reg h-w    Logit        mean     s.dev.          mean     s.dev.
male    23-40 (31.5)   low     no         43.98       51.69      41.85         27.46      19.71
male    23-40 (31.5)   low     yes        49.33       58.58      46.08         20.06      11.04
male    23-40 (31.5)   medium  no         51.58       54.83      43.78         59.71      12.39
male    23-40 (31.5)   medium  yes        59.10       63.14      48.88         56.82      25.38
male    23-40 (31.5)   high    no         87.69       94.52      69.03        104.07      20.99
male    23-40 (31.5)   high    yes       111.87      185.56     152.33        138.63      22.18
male    40-60 (50)     low     no         35.24       46.05      42.71         38.21      19.45
male    40-60 (50)     low     yes        39.53       51.73      46.98         25.27      20.50
male    40-60 (50)     medium  no         41.33       48.65      44.67         43.27      15.90
male    40-60 (50)     medium  yes        47.36       55.43      49.78         52.55      18.17
male    40-60 (50)     high    no         70.27       79.78      68.69         58.23      36.22
male    40-60 (50)     high    yes        89.65      140.30     125.93         88.27      47.19
male    60-73 (66.5)   low     no         26.50       38.54      44.27         11.61       7.88
male    60-73 (66.5)   low     yes        29.73       42.59      48.83         10.59       3.71
male    60-73 (66.5)   medium  no         31.09       40.42      46.36         36.13      14.52
male    60-73 (66.5)   medium  yes        35.61       45.15      51.82         23.48      19.14
male    60-73 (66.5)   high    no         52.85       60.12      71.87         74.91       8.59
male    60-73 (66.5)   high    yes        67.42       79.96     132.08         97.84      11.72

3.3.4 Correlation between travel time and travel cost coefficients

In the Logit model, we have point estimates for the travel time and travel cost coefficients, and while these point estimates themselves may be correlated, there is no distribution of coefficients. In the continuous Logit mixture model, the two coefficients follow a multivariate random distribution, and while the mean in these distributions varies across respondents (through the incorporation of the interaction terms), the standard deviation stays constant, as does the correlation, which derives solely from the Cholesky transformation of the underlying multivariate Normal distribution.
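To see why the correlation in the continuous Logit mixture model does not vary across respondents, consider the following sketch. It assumes (the notation is introduced here for illustration and is not taken from the model specification) that the underlying Normal terms for respondent $n$ are generated as $b_n = \mu_n + L\xi$, where $\xi$ is a vector of two independent standard Normal draws and $L$ is a lower-triangular Cholesky matrix with elements $l_{11} > 0$, $l_{21}$ and $l_{22}$ that are common to all respondents.

```latex
\begin{align*}
\operatorname{Cov}(b_n) &= L L^{\top}
  = \begin{pmatrix} l_{11}^{2} & l_{11} l_{21} \\ l_{11} l_{21} & l_{21}^{2} + l_{22}^{2} \end{pmatrix}
  = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}, \\
\operatorname{corr}\left(b_{n,1}, b_{n,2}\right) &= \frac{\sigma_{12}}{\sqrt{\sigma_{11}\,\sigma_{22}}}
  = \frac{l_{21}}{\sqrt{l_{21}^{2} + l_{22}^{2}}}, \\
\operatorname{corr}\left(e^{b_{n,1}}, e^{b_{n,2}}\right) &=
  \frac{e^{\sigma_{12}} - 1}{\sqrt{\left(e^{\sigma_{11}} - 1\right)\left(e^{\sigma_{22}} - 1\right)}}.
\end{align*}
```

Neither correlation involves $\mu_n$, so shifting the means through socio-demographic interactions leaves the correlation unchanged. The last line gives the corresponding result for Lognormally distributed coefficients, and holds irrespective of the sign attached to the exponential as long as that sign is the same for both coefficients.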
In the Latent Class Logit model, the actual distribution of the random component varies across individuals as the weights for the different classes are not constant across respondents. As a result, and using the formulae from Section 2.3, we can work out individual specific correlations, where this process is illustrated in Table 5 for ten representative individuals with different socio-demographic characteristics. Here, we note five respondents with negative correlation between
the two coefficients, with positive correlation for the remaining five respondents. The expectation would clearly be for negative correlation between time and cost sensitivities, but the presence of significant levels of scale heterogeneity can lead to positive correlation between the distributions of the individual coefficients. A study of the results in Table 5, along with a detailed analysis of the sample population results (see footnote 3), reveals a number of relationships. Firstly, correlation is higher for female respondents, while it is lower for respondents who regularly work from home. Correlation increases with age up to 60 years, when it starts decreasing. Finally, correlation rises with income.

Table 4: Sample population level heterogeneity in VTTS measures

                            mean   std.dev.
Logit                      46.52      14.40
continuous Logit mixture   58.78      44.04
Latent Class Logit         48.51      25.82

Table 5: Correlation between travel time and travel cost coefficients in Latent Class Logit model for ten representative individuals

                                Class 1                   Class 2                   Class 3
age  gender  reg h-w  inc.gr.   π     β_TT     β_TC       π     β_TT     β_TC       π     β_TT     β_TC      corr
30   male    no        1        0.06  -0.2414  -0.2249    0.00  -0.6964  -2.0438    0.94  -0.1723  -0.6107   -0.51
30   male    no        5        0.92  -0.2414  -0.2285    0.02  -0.6964  -2.0474    0.06  -0.1723  -0.6143    0.82
30   male    no       11        0.94  -0.2414  -0.1324    0.06  -0.6964  -1.9513    0.00  -0.1723  -0.5182    1.00
30   male    yes       3        0.12  -0.2414  -0.1939    0.00  -0.6964  -2.0128    0.87  -0.1723  -0.5797   -0.57
30   female  no        5        0.95  -0.2414  -0.2285    0.04  -0.6964  -2.0474    0.01  -0.1723  -0.6143    0.99
30   female  yes       3        0.51  -0.2414  -0.1939    0.01  -0.6964  -2.0128    0.49  -0.1723  -0.5797   -0.03
50   male    no        5        0.70  -0.2048  -0.2285    0.30  -0.6598  -2.0474    0.01  -0.1357  -0.6143    1.00
50   male    yes       5        0.78  -0.2048  -0.1975    0.18  -0.6598  -2.0164    0.04  -0.1357  -0.5833    0.98
70   male    no        5        0.19  -0.1682  -0.2285    0.00  -0.6232  -2.0474    0.81  -0.0991  -0.6143   -0.93
70   male    yes       1        0.00  -0.1682  -0.1939    0.00  -0.6232  -2.0128    1.00  -0.0991  -0.5797   -0.98
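The individual-specific correlations in Table 5 can be understood by treating the class-specific coefficients as the support points of a discrete distribution, with the class allocation probabilities as weights. The sketch below uses the standard correlation formula for such a distribution, which we assume corresponds to the formulae of Section 2.3. The function name `class_correlation` and the class probabilities are hypothetical, while the coefficient values are those of the first row of Table 5. Because the probabilities in Table 5 are rounded to two decimals, feeding them into this sketch should not be expected to reproduce the reported correlations exactly.

```python
import math


def class_correlation(class_probs, beta_tt, beta_tc):
    """Correlation between two coefficients over a discrete set of classes."""
    total = sum(class_probs)
    weights = [p / total for p in class_probs]  # guard against rounded inputs
    mean_tt = sum(w * b for w, b in zip(weights, beta_tt))
    mean_tc = sum(w * b for w, b in zip(weights, beta_tc))
    cov = sum(w * (a - mean_tt) * (b - mean_tc)
              for w, a, b in zip(weights, beta_tt, beta_tc))
    var_tt = sum(w * (a - mean_tt) ** 2 for w, a in zip(weights, beta_tt))
    var_tc = sum(w * (b - mean_tc) ** 2 for w, b in zip(weights, beta_tc))
    if var_tt == 0.0 or var_tc == 0.0:
        return float("nan")  # undefined when all mass sits in a single class
    return cov / math.sqrt(var_tt * var_tc)


# Class-specific coefficients for a 30 year old male, no regular home working,
# income group 1 (first row of Table 5).
beta_tt = [-0.2414, -0.6964, -0.1723]
beta_tc = [-0.2249, -2.0438, -0.6107]

# Hypothetical allocation: mass split between classes 1 and 3 only.
corr_13 = class_correlation([0.5, 0.0, 0.5], beta_tt, beta_tc)
# Hypothetical allocation: mass split between classes 1 and 2 only.
corr_12 = class_correlation([0.5, 0.5, 0.0], beta_tt, beta_tc)
print(round(corr_13, 2), round(corr_12, 2))
```

With mass on only two classes the correlation is necessarily -1 or +1. In the second case both coefficients are larger in absolute value in class 2, so the correlation is positive, which is in line with the role of scale heterogeneity discussed above.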
As a final step in our analysis of the correlation patterns, we look at the distribution at the sample population level, with results reported in Table 6. These results were obtained in the same way as the sample population level distribution statistics in Table 4, and show positive correlation between the travel time and travel cost coefficients in all three models, consistent with the earlier comment about high degrees of scale heterogeneity in the DATIV data. The correlation is highest in the Latent Class Logit model, most likely as a result of the major scale differences between class 2 and the remaining classes.

Footnote 3: Detailed results available on request.

Table 6: Correlation between travel time and travel cost coefficient at sample population level

                           corr(β_TT, β_TC)
Logit                      0.22
continuous Logit mixture   0.60
Latent Class Logit         0.95

4 Summary and conclusions

This paper has presented a comparison between the commonly used continuous Logit mixture model and the more rarely used Latent Class approach in terms of how they deal with heterogeneity across respondents. The paper has first presented formulae for the inter-coefficient correlation and elasticities in Latent Class Logit models and has shown how these measures are a function of the socio-demographic attributes used in the class allocation model. Our empirical example has then further illustrated the differences between the two types of models, making use of stated choice data taken from a value of time study.

The results from the empirical application show that both the continuous Logit mixture and Latent Class Logit model produce significant gains in performance when compared to the Logit model by allowing for random taste heterogeneity on top of the already incorporated deterministic variations. The actual fit of the two advanced models is comparable, but significant differences arise between the models in terms of substantive results.

Firstly, it is clear that the Latent Class Logit models are able to retrieve richer patterns of heterogeneity by linking the class allocation to socio-demographic indicators. This allows this model to move away from some of the monotonic interactions seen in the Logit and continuous Logit mixture models, such as a strictly decreasing relationship between age and VTTS. Incorporating such patterns in the continuous Logit mixture models would require a parameterisation of the variance of the random distributions (cf. Greene et al., 2006), which however leads to additional difficulties in estimation.
Secondly, and crucially in the context of the present paper, the analysis has shown how the heterogeneity in VTTS measures and the correlation between taste coefficients can be easily linked to socio-demographic characteristics in the Latent Class Logit model. Also, while both the continuous Logit mixture and Latent Class Logit models allow for uncertainty in the distribution of tastes for individual respondents, only the Latent Class Logit model allows for additional
variation in the correlation across respondents.

The results in this paper provide an illustration of the potential benefits of Latent Class Logit models for applied research in the area of travel behaviour. It remains up to the analyst to make an informed choice between continuous Logit mixture and Latent Class Logit models on a case-by-case basis, but the Latent Class Logit model should at the very least be regarded as a viable alternative to the continuous Logit mixture model. Further work should be conducted using other datasets, especially with a view to the computation of elasticities.

Finally, it is worth adding a brief note on the specification of latent class models. The form of the model we use here is particularly accessible as there are well-established software packages for estimating such models. However, its major disadvantage is that it does not permit one to impose different a priori restrictions on the specifications of the class membership models and on the class specific choice probabilities. Such restrictions would be needed if the latent classes were based on a priori behavioural hypotheses. For this reason, the model specification process we followed was exploratory in that we allowed the number of classes and the structure of the classes to be inferred from the data. A stronger case for the latent class model could be made by using a confirmatory approach in which the classes and their socio-economic covariates were based on behavioural theory. For an example of such a confirmatory approach see Gopinath (1995); the formulae presented in this paper would also apply to such a model. Further, the causal factors for the latent classes could include other latent factors (such as attitudes) that should be explicitly captured in the model specification. In order to develop such a complex model, additional measurement indicators would likely be needed, resulting in a more complex model with multiple equations.
For an example, see Walker and Ben-Akiva (2002). Nonetheless, the model as presented here does provide evidence for improved statistical fit, easier interpretation, and greater policy relevance.

Acknowledgements

This paper is partly based on work conducted during a stay as a visiting research scholar by the first author in the Department of Civil & Environmental Engineering at the Massachusetts Institute of Technology. The first author also acknowledges the support by the Leverhulme Trust, in the form of a Leverhulme Early Career Fellowship.
References

Ben-Akiva, M., Lerman, S. R., 1985. Discrete Choice Analysis: Theory and Application to Travel Demand. MIT Press, Cambridge, MA.

Burge, P., Rohr, C., 2004. DATIV: SP Design: Proposed approach for pilot survey. Tetra-Plan in cooperation with RAND Europe and Gallup A/S.

Daly, A., Hess, S., 2010. Simple methods for panel data analysis. Paper presented at the 12th World Conference on Transport Research, Lisbon, Portugal.

Daly, A., Hess, S., Train, K., 2009. Assuring finite moments for willingness to pay in random coefficient models. Paper presented at the European Transport Conference, Noordwijkerhout.

Fosgerau, M., 2006. Investigating the distribution of the value of travel time savings. Transportation Research Part B 40 (8), 688-707.

Gopinath, D., 1995. Modeling heterogeneity in discrete choice processes: Application to travel demand. Ph.D. thesis, MIT, Cambridge, MA.

Greene, W. H., Hensher, D. A., 2003. A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B 37 (8), 681-698.

Greene, W. H., Hensher, D. A., Rose, J. M., 2006. Accounting for heterogeneity in the variance of unobserved effects in mixed logit models. Transportation Research Part B 40 (1), 75-92.

Hensher, D. A., Greene, W. H., 2003. The Mixed Logit Model: The State of Practice. Transportation 30 (2), 133-176.

Hess, S., Bierlaire, M., Polak, J. W., 2005. Estimation of value of travel-time savings using mixed logit models. Transportation Research Part A 39 (2-3), 221-236.

Hess, S., Rose, J. M., 2009. Allowing for intra-respondent variations in coefficients estimated on repeated choice data. Transportation Research Part B 43 (6), 708-719.

Hess, S., Rose, J. M., Bain, S., 2009. Random scale heterogeneity in discrete choice models. ITS working paper. Institute for Transport Studies, University of Leeds.
Hess, S., Rose, J. M., Polak, J. W., 2010. Non-trading, lexicographic and inconsistent behaviour in SP choice data. Transportation Research Part D 15 (7), 405-417.

McFadden, D., Train, K., 2000. Mixed MNL Models for discrete response. Journal of Applied Econometrics 15 (5), 447-470.

Revelt, D., Train, K., 1998. Mixed Logit with repeated choices: households' choices of appliance efficiency level. Review of Economics and Statistics 80 (4), 647-657.

Shen, J., 2009. Latent class model or mixed logit model? A comparison by transport mode choice data. Applied Economics 41 (22), 2915-2924.

Train, K., 2009. Discrete Choice Methods with Simulation, second edition. Cambridge University Press, Cambridge, MA.

Walker, J., 2001. Extended discrete choice models: Integrated framework, flexible error structures, and latent variables. Ph.D. thesis, MIT, Cambridge, MA.

Walker, J., Ben-Akiva, M., 2002. Generalized random utility model. Mathematical Social Sciences 43 (3), 303-343.

Walker, J., Li, J., 2006. Latent lifestyle preferences and household location decisions. Journal of Geographical Systems 9 (1), 77-101.