Comparing the Latent Class Model with the Random Parameters Logit - A Choice Experiment analysis of highly heterogeneous electricity consumers in Hyderabad, India

Julian Sagebiel
Department for Agricultural Economics, Humboldt-Universität zu Berlin
julian.sagebiel@hu-berlin.de

Abstract

The increased application of stated choice methods has led researchers to develop several econometric models that relax the strict assumptions of the frequently applied Conditional Logit model. In particular, the question of how to incorporate preference heterogeneity into the analysis is the subject of current research. This paper contributes to the discussion by comparing two of the most commonly used models, the Latent Class Logit model and the Random Parameters Logit model. Both models introduce heterogeneity in the systematic part of utility but differ in their assumptions about the distribution of preferences. For the comparison, data from a choice experiment on electricity quality in India are analyzed. Measures of fit, willingness to pay values and choice probabilities of both models are contrasted. Apart from the statistical comparison, I discuss further issues that contribute to an adequate choice of model.
Contents

1 Introduction
2 Choice Models
2.1 The Random Parameters Logit Model
2.2 The Latent Class Model
3 Data
4 Results
4.1 Random Parameters Logit Estimation
4.2 Latent Class Estimation
5 Model Comparison
5.1 Measures of Fit
5.2 Conditional WTP Values
5.3 Choice Probabilities
6 Discussion
7 Conclusion
1 Introduction

The steady increase in applications of discrete choice models has led to a variety of econometric estimation techniques. The conditional logit model (CL), made popular by [McFadden, 1974], provides an easy-to-handle estimation process but rests on several restrictive assumptions. While the CL can model heterogeneity by incorporating interaction terms of case-specific variables with alternative-specific constants or alternative-specific variables (attributes), other models do the same in a more sophisticated way. Worth mentioning are the random parameters logit model (RPL) and the latent class model (LCM). The RPL accommodates heterogeneity through a continuous distribution of the parameters, i.e. the parameters are random and follow a distribution specified ex ante. In contrast, the LCM can be interpreted as a semi-parametric version of the RPL, which derives heterogeneity from different classes, each having its own parameters. As [Greene and Hensher, 2003] point out, the RPL is more flexible as it can accommodate nearly any behavioral assumption in terms of preference distribution, while the LCM benefits from its semi-parametric structure, which does not require any assumption on the distribution of the parameters.

The literature provides rather ambiguous results from comparisons of the models. Rarely does one model clearly outperform the other in statistical performance. In this paper, I will briefly discuss the two models and compare results from choice experiment (CE) data generated to elicit willingness to
pay (WTP) measures for several attributes of electricity quality. The data are derived from a household survey conducted in February 2010 in Hyderabad, India. It covers 800 private households, which are stratified into slum, middle class and high class. While income, electricity consumption and lifestyles vary significantly across the classes, household characteristics appear homogeneous within the classes, as is often observed in India. From this observation my interest in the LCM arises. Regardless of the statistical fit, the LCM appears to describe the context better than the RPL. That is, the LCM could model three classes which are hypothesized to differ significantly in preferences but to have homogeneous within-class preferences. If this assumption is true, a three-class model is most suitable. The question remains whether this context-driven approach is supported by the statistical performance.

The statistical comparative analysis follows previous studies in which the RPL was contrasted with the LCM. The model selection criteria include kernel density plots and ordinary least squares regressions of individual WTP values and choice probabilities of the RPL against the LCM, simple comparisons of measures of fit and, finally, statistical testing with methods for non-nested models. In case both models perform rather similarly, I will argue that model selection should depend on the context and the aims of the researcher. My impression so far is that the statistical performance of models is frequently overstated in the literature, while the theoretical implications of the different models and their consequences for behavioral assumptions play a minor role when arguing in
favor of or against a model.

The paper is structured as follows. Section 2 summarizes the LCM and RPL. Section 3 details the survey and section 4 presents the estimation results. Section 5 is concerned with the statistical comparison of the two models, while section 6 discusses issues that go beyond the statistics. Section 7 concludes.

2 Choice Models

This section describes the two models at stake, the RPL and the LCM. The models are similar in that both incorporate heterogeneity in respondents' preferences for attributes. While the RPL assumes a continuous distribution of the parameters to introduce heterogeneity, the LCM uses discrete classes to achieve the same. In a sense, the LCM is a special case of the RPL with discretely distributed parameters and hence can be referred to as a semi-parametric sister of the RPL.

The analysis throughout the paper is based on some conventions and definitions, which are explained in the following. We assume a randomly selected individual $i$ who chooses repeatedly in $t$ situations between several alternatives $n$. Each alternative accommodates attributes $k$ with levels $A_{kn}$ which vary over alternatives. For simplicity we assume the indirect utility function $U_{in}$ for each alternative $n$ and individual $i$ to be linear with respect to attribute levels $A_{kn}$ and price $p$. For each alternative there are utility-relevant elements $e_{int}$ that cannot be
observed by the researcher but are known to the individual. This very simple formulation can be written as

$$U_{in} = V_{in} + e_{in} = \beta_1 A_{i1n} + \beta_2 A_{i2n} + \dots + \beta_k A_{ikn} + e_{in} \tag{1}$$

where $A_{ikn}$ is the level of attribute $k$ for alternative $n$ and $\beta_k$ the corresponding utility coefficient. In a CE, the CL probability for individual $i$ to choose alternative $m$ is given as

$$\mathrm{Pr}_{im} = \frac{\exp(\beta_1 A_{i1m} + \beta_2 A_{i2m} + \dots + \beta_k A_{ikm})}{\sum_{n=1}^{N} \exp(\beta_1 A_{i1n} + \beta_2 A_{i2n} + \dots + \beta_k A_{ikn})} \tag{2}$$

In the following I will expand the CL specification to derive the RPL and the LCM.

2.1 The Random Parameters Logit Model

The RPL is characterized by randomness in parameters. In the CL the parameters are fixed and take the same value for all respondents. The RPL specification introduces a random component in the parameters such that

$$\beta_{ik} = \beta_k + \eta_{ki} \tag{3}$$

where $\eta_{ki}$ is an error term with distribution $f(\eta_{ki})$, mean 0 and variance $\varphi^2$. Hence $\beta_{ik}$ is a random variable with distribution $f(\beta_{ik})$ and mean
$\beta_k$. The distribution function can be chosen by the researcher without further limitations (see footnote 1). The unconditional RPL choice probability is then given as a weighted average over all possible values of $\beta_{ik}$ for the attribute parameters that are considered random:

$$\mathrm{Pr}_{im} = \int \dots \int \left(\mathrm{Pr}_{im} \mid \beta_{i1}, \beta_{i2}, \dots, \beta_{ik}\right) f(\beta_{i1}) f(\beta_{i2}) \dots f(\beta_{ik}) \, d\beta_{i1} \, d\beta_{i2} \dots d\beta_{ik} \tag{4}$$

with

$$\mathrm{Pr}_{im} \mid \beta_{i1}, \beta_{i2}, \dots, \beta_{ik} = \frac{\exp(\beta_{i1} A_{i1m} + \beta_{i2} A_{i2m} + \dots + \beta_{ik} A_{ikm})}{\sum_{n=1}^{N} \exp(\beta_{i1} A_{i1n} + \beta_{i2} A_{i2n} + \dots + \beta_{ik} A_{ikn})} \tag{5}$$

The multidimensional integral does not have a closed form, so the probability can only be obtained by simulation. Most commonly, the maximum simulated likelihood method is applied. In this specification, each decision maker has his own parameters, i.e. each decision maker has different preferences.

Footnote 1: The most common distribution functions are the normal, log-normal and triangular. If $f(\beta_{ik}) = 1$ for $\beta_{ik} = b$ and $f(\beta_{ik}) = 0$ for $\beta_{ik} \neq b$, the model reduces to the CL.

2.2 The Latent Class Model

The LCM can be regarded as a special case of the RPL with $\beta_k$ taking a finite number $S$ of values $\langle \beta_{k|1}, \beta_{k|2}, \dots, \beta_{k|S} \rangle$ with corresponding probabilities
$\langle h_1, h_2, \dots, h_S \rangle$. The unconditional probability to choose alternative $m$ is the weighted average of the class-specific choice probabilities over the $S$ classes:

$$\mathrm{Pr}_{im} = \sum_{s=1}^{S} h_s \, \mathrm{Pr}_{im|s} \tag{6}$$

with $\mathrm{Pr}_{im|s}$ being the CL probability to choose alternative $m$ when belonging to class $s$:

$$\mathrm{Pr}_{im|s} = \frac{\exp(\beta_{1|s} A_{i1m} + \beta_{2|s} A_{i2m} + \dots + \beta_{k|s} A_{ikm})}{\sum_{n=1}^{N} \exp(\beta_{1|s} A_{i1n} + \beta_{2|s} A_{i2n} + \dots + \beta_{k|s} A_{ikn})} \tag{7}$$

The class probabilities $h_s$ are unknown but can be estimated with a multinomial logit model:

$$h_s = \frac{\exp(\zeta_s X_i)}{\sum_{s'=1}^{S} \exp(\zeta_{s'} X_i)} \tag{8}$$

where $X_i$ is a vector of case-specific variables like income, age, or attitudes that have an effect on the class probability and $\zeta_s$ is the corresponding parameter vector for class $s$. The vector $X_i$ can comprise only a constant if case-specific variables are not available or do not explain the class probability. In this case, the heterogeneity is unobserved.

The number of classes can be chosen by the researcher, but one has to keep in mind that the class probabilities result from a statistical procedure rather than from behavioral assumptions. To identify the optimal number of classes statistically, measures of fit like CAIC or BIC are commonly used. However, one should also make sure that the parameters of the classes are valid in a behavioral sense.
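The three probability formulas of this section can be illustrated with a minimal Python sketch. It is not the estimation code used in the paper: the attribute levels and coefficients in the example are hypothetical, the function names are invented for illustration, and the RPL integral is approximated with plain pseudo-random normal draws under the simplifying assumption of independent normal parameters.

```python
import math
import random

def cl_prob(betas, alternatives, chosen):
    """Conditional logit probability (eq. 2) of picking alternative `chosen`."""
    utilities = [sum(b * a for b, a in zip(betas, alt)) for alt in alternatives]
    peak = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - peak) for u in utilities]
    return weights[chosen] / sum(weights)

def lcm_prob(class_betas, class_shares, alternatives, chosen):
    """Latent class probability (eq. 6): share-weighted class CL probabilities."""
    return sum(h * cl_prob(b, alternatives, chosen)
               for h, b in zip(class_shares, class_betas))

def rpl_prob(means, sds, alternatives, chosen, draws=2000, seed=0):
    """Simulated RPL probability (eq. 4), assuming independent normal parameters."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        betas = [rng.gauss(m, s) for m, s in zip(means, sds)]
        total += cl_prob(betas, alternatives, chosen)
    return total / draws

# Hypothetical choice set with two attributes (power cut minutes, cost code).
alternatives = [[30.0, 0.0], [0.0, 2.0]]
print(cl_prob([-0.02, -0.5], alternatives, 0))
print(lcm_prob([[-0.02, -0.5], [-0.10, -0.1]], [0.6, 0.4], alternatives, 0))
print(rpl_prob([-0.02, -0.5], [0.01, 0.3], alternatives, 0))
```

With identical class parameters, or with all standard deviations set to zero, both richer models collapse to the CL probability, which mirrors the nesting argument made in the text.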
3 Data

The data used in the following analysis were generated within the research project Sustainable Hyderabad in January 2010 in Hyderabad, India (see footnote 2). The sample covers 798 private households from the Greater Hyderabad area, stratified by class (slum, middle class, high class). The purpose of the study was to investigate consumer behavior and consumption patterns of electricity use. The CE consisted of 27 choice sets, and each choice set included five attributes with three levels each, as summarized in table 1. The selection of attributes and levels was based on expert interviews, focus group discussions, a preliminary WTP study [Hanisch et al., 2010] and pretests. From the $3^5$ possible alternatives an orthogonal array with 54 alternatives was constructed. The study was blocked into three surveys, so that each respondent had to answer nine choice sets, each consisting of two alternatives.

Footnote 2: More details on the survey are found in [Rommel et al., 2010].

The attributes scheduled power cuts and unscheduled power cuts refer to the duration of daily power cuts, which are either pre-announced in newspapers and on television or occur suddenly without warning. The latter are expected to be more severe, as people are not able to adjust to them beforehand. The worst case scenario means a total duration of one hour of power cuts per day, which is not uncommon during summer. The share of renewable energy in the electricity mix, the third attribute, is currently two percent in Hyderabad; however, the local electricity regulator has set the minimum percentage at five percent.
According to the expert interviews, the ten percent scenario is the maximum one can expect in the near future. The last attribute refers to the organizational form of the supply company. The current supplier of electricity in Hyderabad is the state-owned Andhra Pradesh Central Power Distribution Company Limited, which has a monopoly on distribution. A possible reform of the power sector could lead to more competition and to private companies entering the market. Another concept, which already exists in four districts in Andhra Pradesh, is the cooperatively organized distribution company, where the customer is a member of the company. Some examples showed better service quality and more satisfied end users. These three options, Government (status quo), private distribution company and cooperative society, are included as dummy variables in the choice sets. Finally, the cost attribute was chosen so that it would not overburden the financial capacity of the respondents. As 50 percent of the respondents are slum dwellers, an increase of more than 20 percent in the current electricity bill could be infeasible and hence could decrease the accuracy of the estimation.

Table 1: Attribute Description

Variable  Description                                               Levels               Code
SCHED     Scheduled power cuts per day in minutes                   30 minutes           30
                                                                    15 minutes           15
                                                                    0 minutes            0
UNSCH     Unscheduled power cuts per day in minutes                 30 minutes           30
                                                                    15 minutes           15
                                                                    0 minutes            0
REN       Percentage of renewable energy in the electricity mix     2%                   2
                                                                    5%                   5
                                                                    10%                  10
PRIV      Whether supply is carried out by private company          Private company      1
                                                                    COOP or Government   0
COOP      Whether supply is carried out by cooperative society      Cooperative          1
                                                                    PRIV or Government   0
COST      Additional cost to monthly electricity bill in per cent   no additional costs  0
                                                                    10%                  1
                                                                    20%                  2

4 Results

In the following estimations, case-specific variables were not included in order to facilitate the understanding and interpretation of the models. The estimation was carried out with NLOGIT 4.0 and STATA. All models have the same utility specification and variables, while they differ by definition in the distribution of parameters. Fifty-one respondents were dropped because they were considered irrational: they did not pick the dominant alternative in a test choice set (see footnote 3). The exclusion improved the estimation significantly. In a first step I estimated the simple CL specification, the results of which were used as starting values for the other models. The results are given in table 2.

Footnote 3: One choice set in which one alternative dominated the other was included. Respondents who picked the dominated alternative are likely not to have understood the CE and hence were dropped.

Table 2: Estimation results: Conditional Logit

Variable               Coefficient  (Std. Err.)
SCHED                               (0.002)
UNSCH                               (0.002)
REN                                 (0.006)
PRIV                                (0.053)
COOP                                (0.048)
COST                                (0.249)
N                      6723
Log-likelihood
Log-likelihood (NULL)
Pseudo R^2
Significance levels are reported at 10%, 5% and 1%.

All parameters are significant at the one percent significance level, and the two dummy variables are jointly significant at the one percent significance level (Wald test, likelihood ratio test). All signs are as expected and the overall model is highly significant. An increase in scheduled or unscheduled power cuts or in costs reduces the probability that the alternative is chosen. An increase in renewable energy increases the probability. There is
little difference in preferences between a governmental supplier and a private company, but the probability of choosing an alternative decreases when the organizational form is switched to a cooperative.

4.1 Random Parameters Logit Estimation

The RPL model was estimated with all attribute parameters being random and normally distributed. Although a normal distribution allows for positive and negative values, which may be misleading for costs and power cuts, there are several reasons to choose it. First, the normal distribution has been widely used and comprises some convenience features. Second, if the parameter values are large in absolute terms, the probability that a value lies on the wrong side of zero is very low. Hence, the normal distribution can still be a good approximation [Meijer and Rouwendal, 2006, Sillano and de Dios Ortúzar, 2005]. Third, as the data were collected in a developing country and illiterate respondents made up some percentage of the observations, it is likely that, due to limited understanding, some choices were made in an irrational way. Although it is not possible to identify these wrong choices, it is likely that some respondents actually have positive parameters for cost and power cuts. At least the data suggest so. Hence, a wrong sign is a problem of data collection rather than of the statistical and behavioral assumptions. Fourth, as this paper aims to compare the RPL with the LCM, theoretical assumptions on the sign of the parameters do not play a major role. The LCM is also not restricted to one-sided parameters, so why should the RPL
be? Fifth, after estimating several models with different parameter distributions, the model with all parameters being normally distributed gives the best fit. Sixth, using distributions that force the parameter to take only one sign leads to further difficulties with interpretation and estimation. For example, the log-normal distribution has a long thick tail, and the corresponding log-likelihood function tends to be extremely flat at its maximum [Sillano and de Dios Ortúzar, 2005].

Another unusual construction applied here is to allow the cost parameter to be random. In most studies, the cost parameter is nonrandom (e.g. [Morrison and Nalder, 2009, Carlsson and Martinsson, 2008, Revelt and Train, 1998] for applications in the electricity sector) for several reasons (cf. section 6). However, as a randomly distributed cost parameter increases the model fit significantly (likelihood ratio test), it seems inappropriate to follow this convenience assumption. Further, there is no theoretical argument why the cost parameter should be nonrandom. [Meijer and Rouwendal, 2006] (p.242) argue: "Treating the coefficient of the monetary variable as a fixed constant [...] gives markedly different distributions of the [WTP] and cannot be recommended."

Lastly, we allow for correlation among the random parameters. This is also justified by a large improvement of the log likelihood at convergence. Further, this assumption can be very reasonable. For example, someone who has problems with scheduled power cuts might also have problems with unscheduled power cuts. We used 1000 Halton draws for the simulation of the random parameters and
maximized the simulated log likelihood function using the BHHH estimator. Our estimation yields results similar to the CL (see footnote 4), but significant standard deviations of the parameters suggest heterogeneity in preferences. A likelihood ratio test rejected the null hypothesis that the CL and the RPL are the same. Table 3 gives the results of the RPL.

Table 3: Estimation results: RPL

Variable               Coefficient  (Std. Err.)
SCHED                               (0.004)
UNSCH                               (0.003)
REN                                 (0.012)
PRIV                                (0.109)
COOP                                (0.100)
COST                                (0.807)
Standard deviations of the random parameters
SCHED SD                            (0.005)
UNSCH SD                            (0.005)
REN SD                              (0.020)
PRIV SD                             (0.182)
COOP SD                             (0.331)
COST SD                             (1.347)
N                      6723
Log-likelihood
Log-likelihood (NULL)
Pseudo R^2
Significance levels are reported at 10%, 5% and 1%.

Footnote 4: It is not by chance that the parameters in the RPL are larger than in the CL. [Sillano and de Dios Ortúzar, 2005] explain the confounding of the scale parameter, which is smaller in the RPL as parts of the random variation are incorporated in the systematic part of the utility function $V_{in}$.
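The Halton draws mentioned in section 4.1 can be illustrated with a short Python sketch. It shows the general idea only (a low-discrepancy sequence mapped to normal draws through the inverse normal distribution function) and makes no claim about how NLOGIT or STATA generate their draws internally; the function names, the choice of base 2 and the number of skipped initial elements are assumptions made for the illustration.

```python
from statistics import NormalDist

def halton(index, base):
    """Return element `index` (1-based) of the Halton sequence for a prime base."""
    value, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * fraction
        fraction /= base
    return value

def normal_halton_draws(n_draws, mean, sd, base=2, skip=10):
    """Map Halton points in (0, 1) to draws from a normal(mean, sd) distribution."""
    std_normal = NormalDist()
    return [mean + sd * std_normal.inv_cdf(halton(i, base))
            for i in range(skip + 1, skip + n_draws + 1)]

# 1000 draws for one hypothetical random parameter with mean -0.5 and SD 0.3.
draws = normal_halton_draws(1000, mean=-0.5, sd=0.3)
print(min(draws), max(draws))
```

For several random parameters, one would typically use a different prime base per parameter so that the dimensions are not perfectly correlated; correlated parameters additionally require a transformation of the independent draws.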
4.2 Latent Class Estimation

The LC model has a reasonable fit with three classes. Of these classes, two turned out to be dominant, and the third can be considered a small outlier group with a class probability of less than five percent. The results of the LC model are given in table 4. The overall model is highly significant. Only REN is not significant in Class 3, which might reflect ignorance of renewable energy in the population. All other parameters have the expected sign and are significant at least at the 10 percent significance level.

The results illustrate the virtue of the model. While in Class 1 and 2 the parameters for PRIV and COOP are negative, they are positive in Class 3, suggesting antipodal preferences. Some respondents are highly in favor of the status quo (governmental distribution company) and others prefer a reform of the power sector with private or cooperative distributors. These antipodal differences can lead to insignificant parameters in simpler models like the CL. Averaging would not make sense in this case. An RPL also struggles with this kind of preferences. A normally distributed parameter, for example, has its highest density at the mean, but with antipodal preferences the density at the mean would be very low.

Table 4: Estimation results: LCM (standard errors in parentheses)

Variable               Class 1    Class 2    Class 3
SCHED                  (0.075)    (0.004)    (0.002)
UNSCH                  (0.068)    (0.004)    (0.002)
REN                    (0.322)    (0.146)    (0.006)
PRIV                   (1.540)    (0.141)    (0.048)
COOP                   (2.217)    (0.136)    (0.043)
COST                   (12.654)   (1.265)    (0.202)
PROB                   (0.026)    (0.022)    (0.021)
N                      6723
Log-likelihood
Log-likelihood (NULL)
Pseudo R^2
Significance levels are reported at 10%, 5% and 1%.

5 Model Comparison

Comparing two statistical models is usually a task to be performed at the very initial stage of statistical analysis. The researcher aims to find the best statistical fit for his data, hence tries out different models and specifications, chooses one based on certain criteria and only then begins interpreting the estimation results. In discrete choice analysis, several competing models have been developed. The more sophisticated models, like the ones described above, introduce behavioral assumptions that go beyond the distribution of the error term. These assumptions are meaningful for describing human behavior, but mostly, as e.g. [Hensher and Greene, 2003] mention, there is no theoretical foundation for choosing any of the available distributions. As discussed in subsection 4.1, there are some behavioral criteria for choosing a certain distribution, e.g. a log-normal instead of a normal distribution for a monetary attribute implies that subjects would rather buy the same product at a lower price. However, why one should choose a normal instead of any other similar distribution remains unclear, although the choice might have an impact on model performance. The same argument holds when comparing the LCM with the RPL. In the LCM we assume discrete distributions of the parameters, but do we have any reason based on economic theory for doing so? Basically no. Still, there might be reasons to choose a model with heterogeneity apart from statistical fit and basic assumptions from economic theory.

Several authors have addressed this concern before. They propose different ways to compare the RPL with the LCM, and their results are ambiguous. [Greene and Hensher, 2003] analyzed WTP values and choice probabilities in detail and found small support for the LCM. The same conclusion is drawn by [Birol et al., 2006], who argue that apart from a better performance, the LCM is superior for welfare measures
and interpretation. [Colombo et al., 2009] contrast three models, the RPL, the LCM and the Covariance-Heterogeneity model. Their contribution focuses on the sources of heterogeneity. Relying on statistical tests and welfare analysis, they find a small dominance of the LCM. [Provencher and Bishop, 2004] find no dominating model but a surprisingly high share of correct predictions of the CL. [Hynes et al., 2008] report similar results for the LCM and RPL in terms of welfare estimates but finally promote the LCM as the more informative one. [Torres et al., 2011] compare the models using Monte Carlo simulations with incorporated heterogeneity. They simulate preference heterogeneity based on an RPL and fit an LCM to the data, and vice versa. Their findings imply that if the RPL is the true model, the errors from using an LCM are rather small. In the opposite case the errors become larger. Overall, they find the performance of the RPL best. In summary, most studies find a small dominance of the LCM, but no author argues strongly in favor of the LCM compared to the RPL.

This section compares the models from a statistical perspective. First, measures of fit of both models are contrasted. Then graphical comparisons of individual WTP values and choice probabilities are presented. In the next section, based on the results of the statistical analysis, I will discuss the rationale for deciding on a model beyond the scope of statistical analysis.
5.1 Measures of Fit

Table 5 contrasts measures of fit of the three models. Likelihood ratio tests show that the CL is outperformed by the other two models. Testing the LCM against the RPL requires a test for non-nested models. I used a test proposed by [Ben-Akiva and Swait, 1986]. The result suggests that the LCM is significantly better than the RPL in terms of the log likelihood, judging by the p-value for the hypothesis that the likelihoods of both models are the same. The AIC, BIC, and R^2 values indicate a small dominance of the LCM. All values are slightly better in the LCM. The share of correct predictions is also highest in the LCM.

Table 5: Measures of Fit

Measure               CL    RPL    LCM
Log Likelihood
McFadden R^2
Adj. McFadden R^2
AIC
BIC
Chi Squared
Correct Predictions
Parameters

5.2 Conditional WTP Values

In this section I compare the conditional WTP values of both models. These are calculated as described in [Greene, 2007] (p. N17-36). The conditional mean incorporates all information on one individual gathered in the
CE, including his choices. The inclusion of case-specific variables is still neglected for simplicity. To give an overview of the results, table 6 reports the means and standard deviations of the conditional WTP values in percent for the LCM and RPL and the unconditional WTP values of the CL. The latter are simply calculated as the ratio of the parameter of the attribute at stake to the cost parameter, $\beta_i / \beta_{cost}$. The WTP is the marginal rate of substitution between an attribute and the cost attribute, i.e. the WTP value gives the monetary compensation necessary for a one-unit deterioration of an attribute in order to remain at the same level of utility. In this case the WTP is expressed in percent of the electricity bill. For example, an increase in scheduled power cuts (SCHED) by one minute has to be compensated by a decrease of the monthly electricity bill; the size of this decrease in percent, according to the CL and according to the LCM, is given in table 6.

The conditional WTP values are calculated in the same manner but are based on the individual parameter estimates. In the RPL, these estimates can lead to huge WTP values if the individual cost parameter is small while the attribute parameter is large. This is the reason why the standard deviations of the RPL WTP values are considerably higher than in the LCM. Comparing the values of the CL with those of the RPL or LCM is not meaningful, as the values given for the RPL and LCM are conditional values based on individual estimates. These are not necessarily unbiased estimators of the WTP, as explained by [Greene, 2007]. Still, it is useful to compare the values of the LCM and the RPL.
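The WTP calculations described above can be sketched in a few lines of Python. The sketch follows the plain coefficient ratio stated in the text (some authors add a minus sign). For the LCM it assumes that the conditional value is the average of the class WTP values weighted by posterior class membership probabilities, which is a common approach but not necessarily the exact NLOGIT routine. All function names and numbers are hypothetical.

```python
import math

def wtp(beta_attr, beta_cost):
    """Marginal WTP as the ratio of attribute coefficient to cost coefficient."""
    return beta_attr / beta_cost

def sequence_likelihood(chosen_probs):
    """Likelihood of one respondent's choices: product over choice situations."""
    return math.prod(chosen_probs)

def posterior_class_probs(prior_shares, class_likelihoods):
    """Posterior class membership: prior share times likelihood, normalized."""
    joint = [h * l for h, l in zip(prior_shares, class_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def conditional_wtp_lcm(prior_shares, class_likelihoods, class_wtps):
    """Individual WTP as the posterior-weighted average of the class WTP values."""
    posterior = posterior_class_probs(prior_shares, class_likelihoods)
    return sum(p * w for p, w in zip(posterior, class_wtps))

# A small individual cost coefficient inflates the ratio, as noted for the RPL.
print(wtp(-0.05, -1.0), wtp(-0.05, -0.01))

# Two hypothetical classes; each likelihood is the product of the class-specific
# probabilities of the alternatives the respondent actually chose.
likelihoods = [sequence_likelihood([0.6, 0.7, 0.5]),
               sequence_likelihood([0.3, 0.2, 0.4])]
print(conditional_wtp_lcm([0.5, 0.5], likelihoods, [wtp(-0.05, -1.0), wtp(-0.02, -0.4)]))
```

Because the LCM value is a convex combination of a few class-level ratios, it is bounded by the smallest and largest class WTP, whereas the RPL ratio is unbounded when the individual cost coefficient approaches zero.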
Table 6: Conditional WTP Values

Variable   CL    RPL Mean   RPL SD   LCM Mean   LCM SD
SCHED
UNSCH
REN
PRIV
COOP

As expected, the RPL shows a higher standard deviation than the LCM. Further, the mean values of the RPL are smaller than those of the LCM except for PRIV and COOP. The latter are surprisingly high, which can be due to the antipodal preferences explained in subsection 4.2. To gain more insight into the distributions of the WTP values, kernel density estimates of the individual WTP values of the RPL and LCM are plotted in figure 1. The distributions seem to peak at similar values, but the RPL values are more spread out. This reflects the high standard deviation. A closer look at the LCM distributions reveals several distinct peaks. The RPL values, however, follow a rather normally shaped distribution. Both shapes are implied by the respective model and highlight the differences between them.

Figure 1: Kernel Density Estimates for WTP. Panels: (a) Scheduled Power Cuts, (b) Unscheduled Power Cuts, (c) Renewable Energy, (d) Private Distributor, (e) Cooperative Distributor. Each panel shows the density of the individual WTP values of the RPL and the LCM (Epanechnikov kernel).

5.3 Choice Probabilities

Another way of comparing the two models is to contrast their choice probabilities, i.e. the predicted values of the choice variable y for each individual based on the utility parameters of each model. As we have generic alternatives, choice probabilities do not have much meaning per se (see footnote 5). Table 7 gives the results of a linear regression of the choice probabilities of the RPL on those of the LCM, and figure 2 plots the choice probabilities of the two models.

Footnote 5: Only the choice probabilities for alternative 2 are presented. The choice probabilities of alternative 1 are simply $\mathrm{Pr}_{1nt} = 1 - \mathrm{Pr}_{2nt}$.

Table 7: OLS Regression Choice Probabilities

Variable     Coefficient  (Std. Err.)
LCMCP                     (0.004)
Intercept                 (0.002)
N            6723
R^2
F (1,6721)
Significance levels are reported at 10%, 5% and 1%.

The regression suggests that the choice probabilities of the LCM and of the RPL are highly correlated. In fact, the variation of the choice probabilities of the LCM explains 80.2 per cent of the variation of the choice probabilities of the RPL. [Greene and Hensher, 2003] perform the same analysis and find lower R^2 values for their data. While they conclude that "each model is representing the choice responses quite differently for the majority of the sample" (p.695), the results presented here indicate rather similar choice probabilities. This is a sign of the similarity of the models and contrasts with our previous analysis.

To analyze the choice probabilities a bit further, figure 3 displays kernel density functions of the choice probabilities. For each choice probability, the graph shows the corresponding density. For example, in the RPL most
choice probabilities are around 0.8, while in the LCM the peak lies closer to one. The same pattern holds for the choice probabilities below 0.5. This observation indicates that the choice probabilities of the LCM are more at the extremes, i.e. closer to zero or closer to one, than those of the RPL. This result is consistent with the findings of [Greene and Hensher, 2003].

Figure 2: Comparison of Choice Probabilities (choice probabilities of the RPL, RPLCP, and of the LCM, LCMCP)

6 Discussion

With the data used here, the LCM seems to have slight statistical advantages over the RPL; however, there might be other reasons for choosing the LCM. The choice of the model should depend on the purpose of the research and on behavioral assumptions. If, for example, it is assumed that each
individual has different preferences, which could, for example, be the case with cultural goods, an RPL might be the appropriate model. If the researcher assumes antipodal preferences of different groups that are homogeneous within, the structure of the LCM fits the context. For instance, the purpose of a study could be to elicit preferences for a policy scheme for wage rates. One could expect that members of a labor union have rather similar preferences as a group, while employers might have opposite preferences which are also homogeneous within the group.

Figure 3: Kernel Choice Probabilities (kernel density estimates of the choice probabilities of the LCM, LCMCP, and the RPL, RPLCP; Epanechnikov kernel)

However, in many cases these assumptions are not very robust and the researcher cannot draw any conclusions about heterogeneity a priori. This survey exemplifies the point. The study was performed in a developing country, and lower class people show very similar living standards, habits, behavior, and struggles.
Though they may be affected differently by power cuts, the purpose of electricity is largely the same for them: cooling and lighting. Thus, it makes sense to draw general conclusions for this group. I call this observation within-class homogeneity. In the same country, middle class and high class people are expected to behave very differently. Electricity is used for a large number of different appliances. It is used for watching TV, listening to music, cleaning, and a variety of hobbies. Hence, these people are expected to show different preferences for better quality within their class. I call this preference distribution within-class heterogeneity.

If the researcher expects homogeneity over the whole sample (e.g. he samples only slum inhabitants), he can stick with the CL. When the sample is expected to contain different groups which show within-class homogeneity, the LCM should be the model of choice. Expecting overall heterogeneity leads to the RPL, and if we have within-class heterogeneity the decision is not clear. Here, the crucial point is whether the differences in preferences between the classes are assumed to be antipodal. If not, one can run the RPL, but if so, something like an RPL-LCM combination is needed. The latter means that one has to account for the fact that some random parameter values are not present, or at least that there is no overall continuity. For example, the LCM shows bipolar preferences for the attribute representing the distribution company. While certain types of people prefer privatization, others are strongly against it. Only a few people are indifferent. These gaps in the distribution are well represented in figure 1 (d) and (e).
Apart from the assumptions on heterogeneity, both models have other advantages and disadvantages. The RPL is more complex, and its estimation demands computational time and a deep understanding of the model. Further, the flexibility of the RPL is not only a virtue but can also become a burden for the researcher. Deciding which parameters should be random and which distributions are appropriate is a demanding task, and hardly any model specification will clearly dominate. Choosing the number of classes in the LCM is also challenging. In most cases, the researcher has no prior information on the appropriate number of classes and relies on measures of fit for the selection. Still, the effort needed to identify the optimal model specification is far smaller than in the RPL.

Model interpretation might also be easier with the LCM. [Hensher and Greene, 2003] warn that in the RPL, mean parameters are not to be interpreted as in the CL, especially in more complicated specifications with correlation among attributes and alternatives. Calculating marginal effects and WTP values often requires a thorough investigation of all correlations and a laborious decomposition of the Cholesky matrix. To make the RPL more operational, researchers often base their behavioral assumptions on technical feasibility. A frequently observed example is a constant cost parameter. [Revelt and Train, 1998] (p. 650) provide technical reasons: "We specify the price coefficient to be fixed while allowing the other coefficients to vary. The willingness to pay for each attribute is thereby distributed in the same way as the attribute's coefficient, which is convenient for interpretation of the model." Apparently, several authors follow this recommendation, either citing the paper or giving no reason at all. Yet the question remains on what behavioral or theoretical basis this assumption is formulated. [Meijer and Rouwendal, 2000] (p. 12) put it as follows: "[Keeping the coefficient of the monetary variable nonrandom] is, however, not very satisfactory. A priori it appears at least as likely that the coefficients in the utility function that refer to the monetary variable are random variables, as that those referring to any other variable are." None of these considerations is necessary with the LCM, which is clearly an advantage.

Apart from these technical challenges, the LCM is more straightforward to interpret. Arguing with classes instead of population distributions gives more scope for policy recommendations. CEs are often conducted to inform policy makers. A clustering approach as in the LCM is easier to understand and leads to more straightforward results. It also allows the researcher to name the classes and segment the population into multidimensional interest groups (see e.g. [Meyerhoff et al., 2010]).

In conclusion, I suggest that, leaving aside any assumptions on heterogeneity, the LCM is the model to use for demonstrative purposes. In case a broad audience with limited statistical and economic background or policy
makers are the target group, the LCM is clearly more accessible and coherent. If a deeper analysis is required and the research also addresses methodological issues, the RPL is a fruitful challenge. Statistics enthusiasts have more leeway and are free to experiment with new specifications. In fact, the RPL's flexibility is astonishing, and further research is necessary. The most recent advances tend towards semi-parametric distributions of the random parameters. [Fosgerau and Hess, 2008] and [Rouwendal et al., 2010] propose methods in which no a priori assumptions on the distribution of the random parameters are necessary.

7 Conclusion

In this paper, I compared two discrete choice models that incorporate heterogeneity in preferences, the RPL and the LCM. I discussed both models and pointed out their characteristics. The data used for the analysis came from a CE survey in Hyderabad, India, that investigated consumer preferences for better electricity quality. The data are appropriate because they cover highly heterogeneous respondents, ranging from very poor and traditional to rich and modern. A statistical comparison of the models was carried out using measures of fit, a test for non-linear and non-nested models, conditional WTP values, and choice probabilities. The LCM shows some advantages in terms of statistical fit and a smaller variance in conditional WTP values, yet it is not possible to derive a clear statement on which model performs
better. Finally, I discussed the two models with regard to criteria beyond statistical performance and provided some suggestions on when to use which model. The choice of model depends, first, on the a priori assumptions about the kind of heterogeneity and, second, on the purpose of the study. Ultimately, however, the decision on a model is left to the researcher's intuition.
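As a reminder of how likelihood-based measures of fit are typically computed, the following Python sketch derives AIC, BIC, and (adjusted) pseudo R-squared values from a model's log-likelihood. It is a minimal illustration, not the procedure used in this paper: the function name `fit_measures`, the model labels, and all numerical inputs are hypothetical, and treating `n_obs` as the number of observed choices is an assumption.

```python
import math


def fit_measures(ll_model, ll_null, n_params, n_obs):
    """Common likelihood-based measures of fit for a discrete choice model."""
    return {
        "AIC": 2 * n_params - 2 * ll_model,
        "BIC": n_params * math.log(n_obs) - 2 * ll_model,
        "pseudo_R2": 1 - ll_model / ll_null,
        "adj_pseudo_R2": 1 - (ll_model - n_params) / ll_null,
    }


# Hypothetical log-likelihoods and parameter counts, for illustration only.
candidates = {
    "model_A": fit_measures(ll_model=-1000.0, ll_null=-1500.0, n_params=10, n_obs=2000),
    "model_B": fit_measures(ll_model=-990.0, ll_null=-1500.0, n_params=25, n_obs=2000),
}

# Lower information criteria indicate a better trade-off of fit and parsimony.
best_by_bic = min(candidates, key=lambda name: candidates[name]["BIC"])
print(best_by_bic, candidates[best_by_bic])
```

Because the LCM and the RPL are not nested, information criteria of this kind are a natural first comparison. They penalize additional parameters differently, so AIC and BIC need not agree on a ranking.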
References

[Ben-Akiva and Swait, 1986] Ben-Akiva, M. and Swait, J. (1986). The Akaike likelihood ratio index. Transportation Science, 20(2).
[Birol et al., 2006] Birol, E., Karousakis, K., and Koundouri, P. (2006). Using a choice experiment to account for preference heterogeneity in wetland attributes: The case of Cheimaditida wetland in Greece. Ecological Economics, 60(1).
[Carlsson and Martinsson, 2008] Carlsson, F. and Martinsson, P. (2008). Does it matter when a power outage occurs? A choice experiment study on the willingness to pay to avoid power outages. Energy Economics, 30(3).
[Colombo et al., 2009] Colombo, S., Hanley, N., and Louviere, J. J. (2009). Modeling preference heterogeneity in stated choice data: an analysis for public goods generated by agriculture. Agricultural Economics, 40(3).
[Fosgerau and Hess, 2008] Fosgerau, M. and Hess, S. (2008). Competing methods for representing random taste heterogeneity in discrete choice models. MPRA Paper, University Library of Munich, Germany.
[Greene, 2007] Greene, W. H. (2007). NLogit Version 4.0: Reference Guide. Econometric Software Inc., Plainview, 1st edition.
[Greene and Hensher, 2003] Greene, W. H. and Hensher, D. A. (2003). A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B: Methodological, 37(8).
[Hanisch et al., 2010] Hanisch, M., Kimmich, C., Rommel, J., and Sagebiel, J. (2010). Coping with power scarcity in an emerging megacity: A consumers' perspective from Hyderabad. International Journal of Global Energy Issues, 33(3&4).
[Hensher and Greene, 2003] Hensher, D. and Greene, W. (2003). The mixed logit model: The state of practice. Transportation, 30(2).
[Hynes et al., 2008] Hynes, S., Hanley, N., and Scarpa, R. (2008). Effects on welfare measures of alternative means of accounting for preference heterogeneity in recreational demand models. American Journal of Agricultural Economics, 90(4).
[McFadden, 1974] McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In Zarembka, P., editor, Frontiers in Econometrics. Academic Press.
[Meijer and Rouwendal, 2000] Meijer, E. and Rouwendal, J. (2000). Measuring welfare effects in models with random coefficients.
[Meijer and Rouwendal, 2006] Meijer, E. and Rouwendal, J. (2006). Measuring welfare effects in models with random coefficients. Journal of Applied Econometrics, 21(2).
[Meyerhoff et al., 2010] Meyerhoff, J., Ohl, C., and Hartje, V. (2010). Landscape externalities from onshore wind power. Energy Policy, 38(1).
[Morrison and Nalder, 2009] Morrison, M. and Nalder, C. (2009). Willingness to pay for improved quality of electricity supply across business type and location. The Energy Journal, 30(2).
[Provencher and Bishop, 2004] Provencher, B. and Bishop, R. C. (2004). Does accounting for preference heterogeneity improve the forecasting of a random utility model? A case study. Journal of Environmental Economics and Management, 48(1).
[Revelt and Train, 1998] Revelt, D. and Train, K. (1998). Mixed logit with repeated choices: Households' choices of appliance efficiency level. Review of Economics and Statistics, 80(4).
[Rommel et al., 2010] Rommel, K., Hanisch, M., Deb, K., and Sagebiel, J. (2010). Consumer preferences for improvements of power supply quality: Results from a choice experiment in Hyderabad, India and implications for energy policy. Paper presented at the European conference of the IAEE, August 25-28, 2010, Vilnius, Lithuania.
[Rouwendal et al., 2010] Rouwendal, J., de Blaeij, A., Rietveld, P., and Verhoef, E. (2010). The information content of a stated choice experiment: A new method and its application to the value of a statistical life. Transportation Research Part B: Methodological, 44(1).
[Sillano and de Dios Ortand, 2005] Sillano, M. and de Dios Ortúzar, J. (2005). Willingness-to-pay estimation with mixed logit models: some new evidence. Environment and Planning A, 37(3).
[Torres et al., 2011] Torres, C., Colombo, S., and Hanley, N. (2011). Incorrectly accounting for taste heterogeneity in choice experiments: Does it really matter for welfare measurement? Stirling Economics Discussion Papers, University of Stirling, Division of Economics.
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationJava Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
More informationProblem of the Month Through the Grapevine
The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards: Make sense of problems
More informationPersonal Savings in the United States
Western Michigan University ScholarWorks at WMU Honors Theses Lee Honors College 4-27-2012 Personal Savings in the United States Samanatha A. Marsh Western Michigan University Follow this and additional
More informationINDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics
ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions
More informationHypothesis Testing for Beginners
Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes
More informationBootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
More informationInfluence of the Premium Subsidy on Farmers Crop Insurance Coverage Decisions
Influence of the Premium Subsidy on Farmers Crop Insurance Coverage Decisions Bruce A. Babcock and Chad E. Hart Working Paper 05-WP 393 April 2005 Center for Agricultural and Rural Development Iowa State
More informationSample Size and Power in Clinical Trials
Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More information