Comparing the Latent Class Model with the Random Parameters. Logit - A Choice Experiment analysis of highly heterogeneous


Comparing the Latent Class Model with the Random Parameters Logit - A Choice Experiment analysis of highly heterogeneous electricity consumers in Hyderabad, India

Julian Sagebiel
Department for Agricultural Economics, Humboldt-Universität zu Berlin
julian.sagebiel@hu-berlin.de

Abstract

The increasing application of stated choice methods has led researchers to develop several econometric models that relax the strict assumptions of the frequently applied Conditional Logit model. In particular, the question of how to incorporate preference heterogeneity into the analysis is the subject of current research. This paper contributes to the discussion by comparing two of the most commonly used models, the Latent Class Logit model and the Random Parameters Logit model. Both models introduce heterogeneity in the systematic part of utility but differ in their assumptions about the distribution of preferences. For the comparison, data from a choice experiment on electricity quality in India are analyzed. Measures of fit, willingness to pay values and choice probabilities of both models are contrasted. Apart from the statistical comparison, I discuss further issues that contribute to an adequate choice of the model.

Contents

1 Introduction
2 Choice Models
2.1 The Random Parameters Logit Model
2.2 The Latent Class Model
3 Data
4 Results
4.1 Random Parameters Logit Estimation
4.2 Latent Class Estimation
5 Model Comparison
5.1 Measures of Fit
5.2 Conditional WTP Values
5.3 Choice Probabilities
6 Discussion
7 Conclusion

1 Introduction

The steady increase in applications of discrete choice models has led to a variety of econometric estimation techniques. The conditional logit model (CL), made popular by [McFadden, 1974], provides an easy-to-handle estimation process but rests on several restrictive assumptions. While the CL can model heterogeneity by incorporating interaction terms of case-specific variables with alternative-specific constants or alternative-specific variables (attributes), other models do the same in a more sophisticated way. Worth mentioning are the random parameters logit model (RPL) and the latent class model (LCM). The RPL accommodates heterogeneity through a continuous distribution of the parameters, i.e. the parameters are random and follow some distribution specified ex ante. In contrast, the LCM can be interpreted as a semi-parametric version of the RPL, which derives heterogeneity from different classes, each having its own parameters. As [Greene and Hensher, 2003] point out, the RPL is more flexible as it can accommodate nearly any behavioral assumption in terms of preference distribution, while the LCM benefits from its semi-parametric structure, which does not require any assumption on the distribution of the parameters. The literature provides rather ambiguous results from comparisons of the models. Rarely does one model clearly outperform the other in statistical performance. In this paper, I will briefly discuss the two models and compare results from choice experiment (CE) data generated to elicit willingness to

pay (WTP) measures for several attributes of electricity quality. The data are derived from a household survey conducted in February 2010 in Hyderabad, India. It covers 800 private households which are stratified into slum, middle class and high class. While income and electricity consumption as well as lifestyles vary significantly across the classes, household characteristics appear homogeneous within the classes, as often witnessed in India. From this observation, my interest in the LCM arises. Regardless of the statistical fit, the LCM appears to describe the context better than the RPL. That is, the LCM could model three classes which are hypothesized to differ significantly in preferences but have homogeneous within-class preferences. If this assumption is true, a three class model is most suitable. The question remains whether this context driven approach is supported by the statistical performance. The statistical comparative analysis follows previous studies in which the RPL was contrasted with the LCM. The model selection criteria include kernel density plots and ordinary least squares regressions of individual WTP values and choice probabilities of the RPL against the LCM, simple comparisons of measures of fit and finally statistical testing with methods for non-nested models. In case both models perform rather similarly, I will argue that model selection should depend on the context and the aims of the researcher. My impression up to now is that the statistical performance of models is frequently overstated in the literature, while the theoretical implications of the different models and their consequences for behavioral assumptions play a minor role when arguing in

favor of or against a model. The paper is structured as follows. Section 2 summarizes the LCM and RPL. Section 3 details the survey and section 4 presents the estimation results. Section 5 is concerned with the statistical comparison of the two models while section 6 discusses issues that go beyond the statistics. Section 7 concludes.

2 Choice Models

This section describes the two models at stake, the RPL and the LCM. The models are similar in that they both incorporate heterogeneity in respondents' preferences for attributes. While the RPL assumes a continuous distribution of the parameters to introduce heterogeneity, the LCM uses discrete classes to achieve the same. In a sense, the LCM is a special case of the RPL with parameters being distributed discretely and hence can be referred to as a semi-parametric sister of the RPL. The analysis throughout the paper is based on some conventions and definitions which are explained in the following. We assume a randomly selected individual $i$ who chooses repeatedly in $t$ situations between several alternatives $n$. Each alternative accommodates attributes $k$ with levels $A_{kn}$ which vary over alternatives. For simplicity we assume the indirect utility function $U_{in}$ for each alternative $n$ and individual $i$ to be linear with respect to attribute levels $A_{kn}$ and price $p$. For each alternative there are utility sensitive elements $e_{int}$ that cannot be

observed by the researcher but are known to the individual. This very simple formulation can be written as

$$U_{in} = V_{in} + e_{in} = \beta_1 A_{i1n} + \beta_2 A_{i2n} + \dots + \beta_k A_{ikn} + e_{in} \qquad (1)$$

where $A_{ikn}$ is the level of attribute $k$ for alternative $n$ and $\beta_k$ the corresponding utility coefficient. In a CE, the CL probability for individual $i$ to choose alternative $m$ is given as

$$Pr_{im} = \frac{\exp(\beta_1 A_{i1m} + \beta_2 A_{i2m} + \dots + \beta_k A_{ikm})}{\sum_{n=1}^{N} \exp(\beta_1 A_{i1n} + \beta_2 A_{i2n} + \dots + \beta_k A_{ikn})} \qquad (2)$$

In the following I will expand the CL specification to derive the RPL and the LCM.

2.1 The Random Parameters Logit Model

The RPL is characterized by randomness in parameters. In the CL the parameters are fixed and take the same value for all respondents. The RPL specification introduces a random component in the parameters such as

$$\beta_{ik} = \beta_k + \eta_{ki} \qquad (3)$$

where $\eta_{ki}$ is an error term with distribution $f(\eta_{ki})$, mean 0 and variance $\varphi^2$. Hence $\beta_{ik}$ is a random variable with distribution $f(\beta_{ik})$ and mean

$\beta_k$. The distribution function can be chosen by the researcher without further limitations (see footnote 1). The unconditional RPL choice probability is then given as a weighted average over all possible values of $\beta_{ik}$ for the attribute parameters that are considered random:

$$Pr_{im} = \int \dots \int \left( Pr_{im} \mid \beta_{i1}, \beta_{i2}, \dots, \beta_{ik} \right) f(\beta_{i1}) f(\beta_{i2}) \dots f(\beta_{ik}) \, d\beta_{i1} \, d\beta_{i2} \dots d\beta_{ik} \qquad (4)$$

with

$$Pr_{im} \mid \beta_{i1}, \beta_{i2}, \dots, \beta_{ik} = \frac{\exp(\beta_{i1} A_{i1m} + \beta_{i2} A_{i2m} + \dots + \beta_{ik} A_{ikm})}{\sum_{n=1}^{N} \exp(\beta_{i1} A_{i1n} + \beta_{i2} A_{i2n} + \dots + \beta_{ik} A_{ikn})} \qquad (5)$$

The multidimensional integral does not have a closed form, so the probability can only be obtained by simulation. Most commonly, the maximum simulated likelihood method is applied. In this specification, each decision maker has his own parameters, i.e. each decision maker has different preferences.

Footnote 1: The most common distribution functions are the normal, log-normal and triangular. In case $f(\beta_{ik}) = 1$ for $\beta_{ik} = b$ and $f(\beta_{ik}) = 0$ for $\beta_{ik} \neq b$, the model reduces to the CL.

2.2 The Latent Class Model

The LCM can be regarded as a special case of the RPL with $\beta_k$ taking a finite number $S$ of values $\langle \beta_{k|1}, \beta_{k|2}, \dots, \beta_{k|S} \rangle$ with corresponding probabilities

$\langle h_1, h_2, \dots, h_S \rangle$. The unconditional probability to choose alternative $m$ is the weighted average over the $S$ sets of parameters $\beta_{k|s}$:

$$Pr_{im} = \sum_{s=1}^{S} h_s \, Pr_{im|s} \qquad (6)$$

with $Pr_{im|s}$ being the CL probability to choose alternative $m$ when belonging to class $s$:

$$Pr_{im|s} = \frac{\exp(\beta_{1|s} A_{i1m} + \beta_{2|s} A_{i2m} + \dots + \beta_{k|s} A_{ikm})}{\sum_{n=1}^{N} \exp(\beta_{1|s} A_{i1n} + \beta_{2|s} A_{i2n} + \dots + \beta_{k|s} A_{ikn})} \qquad (7)$$

The $h_s$ are unknown but can be estimated with a multinomial logit model:

$$h_s = \frac{\exp(\zeta_s X_i)}{\sum_{s=1}^{S} \exp(\zeta_s X_i)} \qquad (8)$$

where $X_i$ is a vector of case-specific variables like income, age, or attitudes that have an effect on the class probability and $\zeta_s$ is the corresponding parameter vector for class $s$. The vector $X_i$ can comprise only a constant if case-specific variables are not available or do not explain the class probability. In this case, the heterogeneity is unobserved. The number of classes can be chosen by the researcher, but one has to keep in mind that the class probabilities are subject to a statistical procedure rather than behavioral assumptions. To identify the optimal number of classes statistically, measures of fit like CAIC or BIC are commonly used. However, one should also make sure that the parameters of the classes are valid in a behavioral sense.
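To make the relationship between the three specifications concrete, the following is a minimal sketch of the choice probabilities in equations (2), (4) to (5) and (6) to (8). It is not the estimation code used in the paper. The function names, the two-attribute example and all coefficient values are hypothetical, and the RPL part assumes independent normal parameters with plain pseudo-random draws.

```python
import math
import random

def cl_probs(betas, alternatives):
    """Conditional logit probabilities over alternatives, as in eq. (2)."""
    utilities = [sum(b * a for b, a in zip(betas, alt)) for alt in alternatives]
    u_max = max(utilities)  # shift by the maximum for numerical stability
    exp_u = [math.exp(u - u_max) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

def rpl_probs(means, sds, alternatives, n_draws=1000, seed=0):
    """Simulated RPL probabilities, eq. (4): average eq. (5) over parameter draws.
    Assumption: independent normal parameters, pseudo-random draws."""
    rng = random.Random(seed)
    acc = [0.0] * len(alternatives)
    for _ in range(n_draws):
        beta_i = [rng.gauss(m, s) for m, s in zip(means, sds)]
        for n, p in enumerate(cl_probs(beta_i, alternatives)):
            acc[n] += p
    return [a / n_draws for a in acc]

def class_shares(zetas, x):
    """Class membership probabilities h_s, as in eq. (8)."""
    scores = [sum(z * v for z, v in zip(zeta, x)) for zeta in zetas]
    s_max = max(scores)
    exp_s = [math.exp(s - s_max) for s in scores]
    total = sum(exp_s)
    return [e / total for e in exp_s]

def lcm_probs(class_betas, shares, alternatives):
    """LCM probabilities, eq. (6): share-weighted class-specific CL probabilities."""
    probs = [0.0] * len(alternatives)
    for betas, h in zip(class_betas, shares):
        for n, p in enumerate(cl_probs(betas, alternatives)):
            probs[n] += h * p
    return probs

# Hypothetical example: two alternatives described by (SCHED, COST).
alts = [[30, 0], [0, 2]]
shares = class_shares([[0.0], [0.5]], [1.0])  # membership model with a constant only
p_cl = cl_probs([-0.02, -0.5], alts)
p_rpl = rpl_probs([-0.02, -0.5], [0.01, 0.2], alts)
p_lcm = lcm_probs([[-0.05, -0.1], [-0.01, -1.0]], shares, alts)
```

With all standard deviations set to zero the RPL sketch collapses to the CL, and an LCM with a single class does the same, which mirrors the nesting described in the text.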

3 Data

The data used in the following analysis were generated within the research project Sustainable Hyderabad in January 2010 in Hyderabad, India (see footnote 2). The sample covers 798 private households from the Greater Hyderabad area, stratified by class (slum, middle class, high class). The purpose of the study was to investigate consumer behavior and consumption patterns of electricity use.

Footnote 2: More details on the survey are found in [Rommel et al., 2010].

The CE consisted of 27 choice sets, and each choice set includes five attributes with three levels each, as summarized in table 1. The selection of attributes and levels was based on expert interviews, focus group discussions, a preliminary WTP study [Hanisch et al., 2010] and pretests. From the $3^5 = 243$ possible alternatives, an orthogonal array with 54 alternatives was constructed. The study was blocked into three surveys, so that each respondent had to answer nine choice sets, each consisting of two alternatives.

The attributes scheduled power cuts and unscheduled power cuts refer to the duration of daily power cuts, which are either pre-announced in newspapers and on television or occur suddenly without warning. The latter are expected to be more severe, as people are not able to adjust to them beforehand. The worst case scenario means a total duration of one hour of power cuts per day, which is not uncommon during summer. The share of renewable energy in the electricity mix, the third attribute, is currently two percent in Hyderabad; however, the local electricity regulator has set the minimum share at five percent.
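The design arithmetic described in this section can be checked with a few lines of code. The sketch below only reproduces the counts stated in the text (five attributes with three levels, 54 alternatives, pairs of alternatives, three blocks). It does not construct the orthogonal array itself, and the variable names are illustrative. The number of excluded respondents is taken from the results section.

```python
from itertools import product

LEVELS = 3
ATTRIBUTES = 5

# Full factorial: every combination of attribute levels.
full_factorial = list(product(range(LEVELS), repeat=ATTRIBUTES))
n_full = len(full_factorial)

alternatives_used = 54      # size of the orthogonal array reported in the text
alternatives_per_set = 2
choice_sets = alternatives_used // alternatives_per_set

blocks = 3
sets_per_respondent = choice_sets // blocks

# Estimation sample: 798 households, of which 51 are excluded in section 4.
respondents = 798 - 51
observations = respondents * sets_per_respondent

print(n_full, choice_sets, sets_per_respondent, observations)
```

The resulting number of observations matches the N reported in the estimation tables.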

According to the expert interviews, the ten percent scenario is the maximum one can expect in the near future. The last attribute refers to the organizational form of the supply company. The current supplier of electricity in Hyderabad is the state owned Andhra Pradesh Central Power Distribution Company Limited, which has a monopoly on distribution. A possible reform of the power sector can lead to more competition and private companies entering the market. Another concept, which already exists in four districts in Andhra Pradesh, is the cooperatively organized distribution company, where the customer is a member of the company. Some examples showed better service quality and more satisfied end users. These three options, Government (status quo), private distribution company and cooperative society, are included as dummy variables in the choice sets. Finally, the cost attribute was chosen in such a way that it does not overburden the financial capacity of the respondents. As 50 percent of the respondents are slum dwellers, an increase of more than 20 percent on the current electricity bill could be infeasible and hence could decrease the accuracy of the estimation.

4 Results

In the following estimations, case-specific variables were not included in order to facilitate the understanding and interpretation of the model. The estimation was carried out with NLOGIT 4.0 and STATA. All models have the same utility specification and variables, while they differ by definition in

Table 1: Attribute Description

Variable | Description | Levels | Code
SCHED | Scheduled power cuts per day in minutes | 30 minutes | 30
 | | 15 minutes | 15
 | | 0 minutes | 0
UNSCH | Unscheduled power cuts per day in minutes | 30 minutes | 30
 | | 15 minutes | 15
 | | 0 minutes | 0
REN | Percentage of renewable energy in the electricity mix | 2% | 2
 | | 5% | 5
 | | 10% | 10
PRIV | Whether supply is carried out by private company | Private company | 1
 | | COOP or Government | 0
COOP | Whether supply is carried out by cooperative society | Cooperative | 1
 | | PRIV or Government | 0
COST | Additional cost to monthly electricity bill in per cent | no additional costs | 0
 | | 10% | 1
 | | 20% | 2

the distribution of parameters. 51 respondents were dropped as they were considered to be irrational because they did not pick the dominant alternative (see footnote 3). The exclusion improved the estimation significantly. In a first step I estimated the simple CL specification, whose results were used as starting values for the other models. The results are given in table 2.

Footnote 3: One choice set in which one alternative dominated the other was included. Respondents who picked the dominated alternative are likely not to have understood the CE and hence were dropped.

Table 2: Estimation results: Conditional Logit

Variable | Coefficient (Std. Err.)
SCHED | (0.002)
UNSCH | (0.002)
REN | (0.006)
PRIV | (0.053)
COOP | (0.048)
COST | (0.249)
N | 6723
Log-likelihood |
Log-likelihood (NULL) |
Pseudo R² |
Significance levels: 10%, 5%, 1%

All parameters are significant at the one percent significance level and the two dummy variables are jointly significant at the one percent significance level (Wald test, likelihood ratio test). All signs are as expected and the overall model is highly significant. An increase in scheduled and unscheduled power cuts and in costs reduces the probability of the alternative being chosen. An increase in renewable energy increases the probability. There is

little difference in preferences between a governmental supplier and a private company, but the probability of choosing an alternative decreases when the organizational form is switched to a cooperative.

4.1 Random Parameters Logit Estimation

The RPL model was estimated with all attribute parameters being random and normally distributed. Although a normal distribution allows for positive and negative values, which may be misleading for costs and power cuts, there are several reasons to choose it. First, the normal distribution has been widely used and offers some convenient features. Second, if the parameter values are large in absolute terms, the probability that a value is on the wrong side is very low. Hence, the normal distribution can still be a good approximation [Meijer and Rouwendal, 2006, Sillano and de Dios Ortand, 2005]. Third, as the data were collected in a developing country and illiterate respondents made up some percentage of the observations, it is likely that, due to limited understanding, some choices were made in an irrational way. Although it is not possible to identify these wrong choices, it is likely that some respondents actually have positive parameters for cost and power cuts. At least the data say so. Hence, a wrong sign is a problem of data collection rather than of the statistical and behavioral assumptions. Fourth, as this paper aims to compare the RPL with the LCM, theoretical assumptions on the sign of the parameter do not play a major role. The LCM is also not restricted to one sided parameters, so why should the RPL

be? Fifth, after estimating several models with different parameter distributions, the model with all parameters being normally distributed gives the best fit. Sixth, using different distributions that force the parameter to have one sign only leads to further difficulties with interpretation and estimation. For example, the log-normal distribution has a long thick tail, and the corresponding log-likelihood function tends to be extremely flat at its maximum [Sillano and de Dios Ortand, 2005]. Another unusual construction applied here is to allow the cost parameter to be random. In most studies, the cost parameter is nonrandom (e.g. [Morrison and Nalder, 2009, Carlsson and Martinsson, 2008, Revelt and Train, 1998] for applications in the electricity sector) for several reasons (cf. section 6). However, as a randomly distributed cost parameter increases the model fit significantly (likelihood ratio test), it seems inappropriate to follow this convenience assumption. Further, there is no theoretical argument why the cost parameter should be nonrandom. [Meijer and Rouwendal, 2006] (p.242) argue: "Treating the coefficient of the monetary variable as a fixed constant [...] gives markedly different distributions of the [WTP] and cannot be recommended." Lastly, we allow for correlation among the random parameters. This is also justified by a large improvement of the log likelihood at convergence. Further, this assumption can be very reasonable. For example, someone who has problems with scheduled power cuts might also have problems with unscheduled power cuts. We used 1000 Halton draws for the simulation of the random parameters and

maximized the simulated log likelihood function using the BHHH estimator. Our estimation yields results similar to the CL (see footnote 4), but significant standard deviations of the parameters suggest heterogeneity in preferences. A likelihood ratio test rejected the null hypothesis that the CL and the RPL are the same. Table 3 gives the results of the RPL.

Footnote 4: It is not by chance that the parameters in the RPL are larger than in the CL. [Sillano and de Dios Ortand, 2005] explain the confounding of the scale parameter, which is smaller in the RPL as parts of the random variation are incorporated in the systematic part of the utility function $V_{in}$.

Table 3: Estimation results: RPL

Variable | Coefficient (Std. Err.)
SCHED | (0.004)
UNSCH | (0.003)
REN | (0.012)
PRIV | (0.109)
COOP | (0.100)
COST | (0.807)
Standard deviations of the random parameters
SCHED SD | (0.005)
UNSCH SD | (0.005)
REN SD | (0.020)
PRIV SD | (0.182)
COOP SD | (0.331)
COST SD | (1.347)
N | 6723
Log-likelihood |
Log-likelihood (NULL) |
Pseudo R² |
Significance levels: 10%, 5%, 1%
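The Halton draws mentioned above can be illustrated with a short sketch. It is not the routine implemented in NLOGIT or STATA. It assumes one prime base per random parameter, a small number of skipped initial points, and a transformation to standard normal deviates through the inverse normal distribution function. Correlation among the parameters is introduced by a lower-triangular Cholesky factor, whose values here are made up.

```python
import math
from statistics import NormalDist

def halton(index, base):
    """Radical inverse of a positive integer index in the given base."""
    result, fraction, i = 0.0, 1.0, index
    while i > 0:
        fraction /= base
        result += fraction * (i % base)
        i //= base
    return result

PRIMES = [2, 3, 5, 7, 11, 13]  # one base per random parameter (six attributes)

def halton_normal_draws(n_draws, n_params, skip=10):
    """Standard normal deviates built from Halton points.
    Assumption: the first `skip` points are discarded."""
    std_normal = NormalDist()
    return [
        [std_normal.inv_cdf(halton(r, PRIMES[k])) for k in range(n_params)]
        for r in range(skip + 1, skip + n_draws + 1)
    ]

def correlate(draw, chol):
    """Apply a lower-triangular Cholesky factor to one vector of deviates."""
    return [sum(chol[i][j] * draw[j] for j in range(i + 1)) for i in range(len(draw))]

draws = halton_normal_draws(1000, 6)

# Hypothetical 2x2 example: correlation of 0.5 between two parameters.
chol = [[1.0, 0.0], [0.5, math.sqrt(0.75)]]
correlated = correlate([1.0, 0.0], chol)
```

Each correlated deviate vector would then be scaled and added to the parameter means, as in equation (3), before the choice probabilities are averaged over the draws.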

4.2 Latent Class Estimation

The LCM has a reasonable fit with three classes. Of these classes, two turned out to be dominating and the other one can be considered a small outlier group with a class probability of less than five percent. The results of the LCM are given in table 4. The overall model is highly significant. Only REN is not significant in Class 3, which might reflect ignorance towards renewable energy in the population. All other parameters have the expected sign and are significant at least at the 10 percent significance level. The results illustrate the virtue of the model. While in Class 1 and 2 the parameters for PRIV and COOP are negative, they are positive in Class 3, suggesting antipodal preferences. Some respondents are highly in favor of the status quo (Governmental distribution company) and others prefer a reform of the power sector with private or cooperative distributors. These antipodal differences can lead to insignificant parameters in simpler models like the CL. Averaging would not make sense in this case. An RPL also struggles with this kind of preferences. A normally distributed parameter, for example, has the highest density at its mean, but with antipodal preferences the density at the mean would be very low.

5 Model Comparison

Comparing two statistical models is usually a task to be performed in the very initial stage of statistical analysis. The researcher aims to find the best

Table 4: Estimation results: LCM

Variable | Class 1 | Class 2 | Class 3
SCHED | (0.075) | (0.004) | (0.002)
UNSCH | (0.068) | (0.004) | (0.002)
REN | (0.322) | (0.146) | (0.006)
PRIV | (1.540) | (0.141) | (0.048)
COOP | (2.217) | (0.136) | (0.043)
COST | (12.654) | (1.265) | (0.202)
PROB | (0.026) | (0.022) | (0.021)
N | 6723
Log-likelihood |
Log-likelihood (NULL) |
Pseudo R² |
Significance levels: 10%, 5%, 1%
Standard errors in parentheses

statistical fit for his data, hence tries out different models and specifications, chooses one based on certain criteria and only then begins interpreting the estimation results. In discrete choice analysis, several models have been developed that compete with one another. The more sophisticated models like the ones described above introduce behavioral assumptions that go beyond the distribution of the error term. These assumptions are meaningful for describing human behavior but mostly, as e.g. [Hensher and Greene, 2003] mention, there is no theoretical foundation for choosing any of the available distributions. As discussed in subsection 4.1, there are some behavioral criteria for choosing a certain distribution, e.g. a log-normal instead of a normal distribution for a monetary attribute implies that all subjects prefer to buy the same product at a lower price. However, why a normal should be chosen instead of any other similar distribution remains unclear, although the choice might have an impact on model performance. The same argument holds when comparing the LCM with the RPL. In the LCM we assume discrete distributions of the parameters, but do we have any reason based on economic theory for it? Basically no. Still, there might be different reasons to choose a model with heterogeneity apart from statistical fit and basic assumptions from economic theory. Several authors have addressed this concern before. They propose different ways to compare the RPL with the LCM, and the results are ambiguous. [Greene and Hensher, 2003] analyzed WTP values and choice probabilities in detail and find small support for the LCM. The same conclusion is drawn by [Birol et al., 2006], who argue that apart from a better performance, the LCM is superior for welfare measures

and interpretation. [Colombo et al., 2009] contrast three models, the RPL, the LCM and the Covariance-Heterogeneity model. Their contribution focuses on the sources of heterogeneity. Relying on statistical tests and welfare analysis, they find a small dominance of the LCM. [Provencher and Bishop, 2004] find no model dominating but surprisingly high shares of correct predictions for the CL. [Hynes et al., 2008] report similar results for the LCM and RPL in terms of welfare estimates but finally promote the LCM as the more informative one. [Torres et al., 2011] compare the models using Monte Carlo simulations with incorporated heterogeneity. They simulate preference heterogeneity based on an RPL and apply the data to an LCM and vice versa. Their findings imply that in case the RPL is the true model, the errors from using an LCM are rather small. In the opposite case the errors become larger. Overall, they find the performance of the RPL best. In summary, most studies find a small dominance of the LCM, but no author argues strongly in favor of the LCM compared to the RPL. This chapter compares the models from a statistical perspective. First, measures of fit of both models are contrasted. Then graphical comparisons of individual WTP values and choice probabilities are presented. In the next chapter, based on the results of the statistical analysis, I will discuss the rationale for deciding on a model beyond the scope of statistical analysis.
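The measures of fit used in such comparisons can be sketched as follows. The definitions are the standard textbook ones (McFadden R², AIC, BIC), and the bound for non-nested models is written in the form commonly attributed to [Ben-Akiva and Swait, 1986]. It should be read as an assumption of this sketch rather than a transcription of the test as implemented for the paper. All log-likelihood values and parameter counts below are hypothetical, since the estimates themselves are not reproduced here.

```python
import math
from statistics import NormalDist

def fit_measures(ll, ll_null, n_params, n_obs):
    """Standard measures of fit for a discrete choice model."""
    return {
        "mcfadden_r2": 1.0 - ll / ll_null,
        "adj_mcfadden_r2": 1.0 - (ll - n_params) / ll_null,
        "aic": -2.0 * ll + 2.0 * n_params,
        "bic": -2.0 * ll + n_params * math.log(n_obs),
    }

def ben_akiva_swait_bound(ll_1, k_1, ll_2, k_2, ll_null):
    """Upper bound on the probability that the model with the lower adjusted
    rho-squared is the true one (assumed form of the non-nested test)."""
    adj_1 = 1.0 - (ll_1 - k_1) / ll_null
    adj_2 = 1.0 - (ll_2 - k_2) / ll_null
    if adj_2 >= adj_1:
        z, k_diff = adj_2 - adj_1, k_2 - k_1
    else:
        z, k_diff = adj_1 - adj_2, k_1 - k_2
    # ll_null is negative, so -2 * z * ll_null is non-negative.
    arg = max(0.0, -2.0 * z * ll_null + k_diff)
    return NormalDist().cdf(-math.sqrt(arg))

# Hypothetical inputs: 6723 binary choices, made-up log likelihoods and parameter counts.
n_obs = 6723
ll_null = n_obs * math.log(0.5)
rpl = fit_measures(ll=-3700.0, ll_null=ll_null, n_params=27, n_obs=n_obs)
lcm = fit_measures(ll=-3690.0, ll_null=ll_null, n_params=20, n_obs=n_obs)
bound = ben_akiva_swait_bound(-3700.0, 27, -3690.0, 20, ll_null)
```

A small bound indicates that the model with the lower adjusted fit is unlikely to be the true one. Whether observations or respondents should enter the BIC as `n_obs` is a choice the sketch leaves to the user.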

5.1 Measures of Fit

Table 5 contrasts measures of fit of the three models. Likelihood ratio tests show that the CL is outperformed by the other two models. Testing the LCM against the RPL requires a test for non-nested models. I used a test proposed by [Ben-Akiva and Swait, 1986]. The result suggests that the LCM is significantly better than the RPL in terms of the log likelihood, judged by the p-value for the hypothesis that the likelihoods of both models are the same. The AIC, BIC, and R² values indicate a small dominance of the LCM. All values are slightly better in the LCM. The share of correct predictions is also highest in the LCM.

Table 5: Measures of Fit

Measure | CL | RPL | LCM
Log Likelihood | | |
McFadden R² | | |
Adj. McFadden R² | | |
AIC | | |
BIC | | |
Chi Squared | | |
Correct Predictions | | |
Parameters | | |

5.2 Conditional WTP Values

In this section I will compare the conditional WTP values of both models. These are calculated as described in [Greene, 2007] (p.n17-36). The conditional mean incorporates all information on one individual gathered in the

CE including his choices. Case-specific variables are still neglected for simplicity. To give an overview of the results, table 6 reports the means and standard deviations of the conditional WTP values in percent for the LCM and RPL and the unconditional WTP values of the CL. The latter are simply calculated as the ratio of the coefficient of the attribute at stake to the cost coefficient, $\beta_i / \beta_{cost}$. The WTP is the marginal rate of substitution between an attribute and the cost attribute, i.e. the WTP value gives the compensation in monetary terms that is necessary to remain at the same level of utility after a one unit deterioration of an attribute. In this case the WTP is expressed in percent of the electricity bill. For example, an increase in scheduled power cuts (SCHED) by one minute has to be compensated by a decrease of the monthly electricity bill by the percentages given in table 6 for the CL and for the LCM. The conditional WTP values are calculated in the same manner but based on the individual parameter estimates. In the RPL, these estimates can lead to huge WTP values in case the individual cost parameter is small while the attribute parameter is large. This is the reason why the standard deviations of the RPL WTP values are considerably higher than in the LCM. Comparing the values of the CL with those of the RPL or LCM is not meaningful, as the values given for the RPL and LCM are conditional values based on individual estimates. These are not necessarily unbiased estimators of the WTP, as explained by [Greene, 2007]. Still, it is useful to compare the values of the LCM and the RPL.
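The WTP calculations described above can be sketched in a few lines. The ratio follows the definition in the text. The conditional LCM value is computed here as a posterior-weighted average of class-specific WTP values, which is one common way to obtain individual-level estimates and is an assumption of this sketch, not a reproduction of the NLOGIT routine. All numbers are hypothetical.

```python
def wtp(beta_attr, beta_cost):
    """Marginal WTP as the ratio of attribute and cost coefficients."""
    return beta_attr / beta_cost

def posterior_class_probs(prior_shares, class_likelihoods):
    """Bayes rule: P(class s | observed choices) is proportional to
    h_s times the likelihood of the individual's choices in class s."""
    joint = [h * l for h, l in zip(prior_shares, class_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def conditional_wtp_lcm(prior_shares, class_likelihoods, class_wtps):
    """Individual-specific WTP as posterior-weighted class WTP (assumed approach)."""
    posterior = posterior_class_probs(prior_shares, class_likelihoods)
    return sum(p * w for p, w in zip(posterior, class_wtps))

# Hypothetical CL-type coefficients for SCHED and COST.
wtp_sched = wtp(-0.02, -0.5)

# Hypothetical two-class example for one respondent.
class_wtps = [wtp(-0.05, -1.0), wtp(-0.01, -0.4)]
individual_wtp = conditional_wtp_lcm([0.6, 0.4], [0.002, 0.010], class_wtps)

# A cost coefficient close to zero inflates the ratio, as noted for the RPL.
inflated = wtp(-0.05, -0.001)
```

Because the LCM value is a weighted average of a few class-specific ratios, it is bounded by the smallest and largest class WTP, whereas an individual RPL ratio is unbounded when the cost draw approaches zero.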

Table 6: Conditional WTP Values

Variable | CL | RPL Mean | RPL SD | LCM Mean | LCM SD
SCHED | | | | |
UNSCH | | | | |
REN | | | | |
PRIV | | | | |
COOP | | | | |

As expected, the RPL shows a higher standard deviation than the LCM. Further, the mean values of the RPL are smaller than in the LCM except for PRIV and COOP. The latter are surprisingly high, which can be due to the antipodal preferences as explained in subsection 4.2. To gain more insight into the distributions of the WTP values, kernel density estimates of the individual WTP values of the RPL and LCM are plotted in figure 1. It seems that the distributions peak at quite similar values, but the RPL values are more spread out. This reflects the high standard deviation. A closer look at the LCM distributions reveals several peaks. The RPL values, however, follow a rather normally shaped distribution. Both shapes are implied by the respective model and highlight the differences between them.

5.3 Choice Probabilities

Another way of comparison is contrasting the choice probabilities of the two models, i.e. the estimates of y for each individual based on the utility parameters of each model. As we have generic alternatives, choice probabilities

Figure 1: Kernel Density Estimates for WTP, RPL versus LCM (Epanechnikov kernel; horizontal axis: WTP, vertical axis: density). Panels: (a) Scheduled Power Cuts, (b) Unscheduled Power Cuts, (c) Renewable Energy, (d) Private Distributor, (e) Cooperative Distributor.

do not have much meaning per se (see footnote 5). Table 7 gives the results of a linear regression of the choice probabilities of the RPL on those of the LCM, and figure 2 plots the choice probabilities of the two models.

Footnote 5: Only the choice probabilities for alternative 2 are presented. The choice probabilities of alternative 1 are simply $Pr_{1nt} = 1 - Pr_{2nt}$.

Table 7: OLS Regression Choice Probabilities

Variable | Coefficient (Std. Err.)
LCMCP | (0.004)
Intercept | (0.002)
N | 6723
R² |
F(1,6721) |
Significance levels: 10%, 5%, 1%

The regression suggests that the choice probabilities of the LCM and of the RPL are highly correlated. In fact, the variation of the choice probabilities of the LCM explains 80.2 per cent of the variation of the choice probabilities of the RPL. [Greene and Hensher, 2003] perform the same analysis and find lower R² values for their data. While they conclude that "each model is representing the choice responses quite differently for the majority of the sample" (p.695), the results presented here indicate rather similar choice probabilities. This is a sign of the similarity of the models. This observation contrasts with our previous analysis. To analyze the choice probabilities a bit further, figure 3 displays kernel density functions of the choice probabilities. For each choice probability, the graph shows the corresponding density. For example, in the RPL most

choice probabilities are around 0.8, while in the LCM the peak lies closer to one. The same pattern holds for the choice probabilities below 0.5. This observation indicates that the choice probabilities of the LCM are more at the extremes, i.e. closer to zero or closer to one, than in the RPL. This result is consistent with the findings of [Greene and Hensher, 2003].

Figure 2: Comparison of Choice Probabilities (axes: RPLCP, LCMCP)

6 Discussion

With the data applied here the LCM seems to have slight statistical advantages over the RPL; however, there might be other reasons for choosing the LCM. In a way the choice of the model should depend on the purpose of the research and on behavioral assumptions. If, for example, it is assumed that each

individual has different preferences, which could be the case with cultural goods, for example, an RPL might be the appropriate model. In case the researcher assumes antipodal preferences of different groups that are homogeneous within, the structure of the LCM fits the context. For instance, the purpose of a study could be eliciting preferences for a policy scheme for wage rates. One could expect that members of a labor union will have rather similar preferences as a group, and employers might have opposite preferences which are also homogeneous within the group. However, in many cases these assumptions are not very robust and the researcher cannot draw any conclusions on heterogeneity a priori. This survey exemplifies the point. The study was performed in a developing country, and lower class people show very similar living standards, habits, behavior, and struggles.

Figure 3: Kernel Choice Probabilities (kernel density of the choice probabilities LCMCP and RPLCP, Epanechnikov kernel)

Though they may be affected differently by power cuts, the purpose of electricity is mainly similar: cooling and lighting. Thus, it makes sense to draw general conclusions for this group. I call this observation within-class homogeneity. In the same country, middle class and high class people are expected to behave very differently. Electricity is used for a large number of different appliances. It is used for watching TV, listening to music, cleaning, and a variety of hobbies. Hence, these people are expected to show different preferences for better quality within their class. I call this preference distribution within-class heterogeneity. In case the researcher expects homogeneity over the whole sample (e.g. he samples only slum inhabitants), he can stick with the CL. When different groups that show within-class homogeneity are expected within the sample, the LCM should be the model of choice. Expecting overall heterogeneity leads to the RPL, and if we have within-class heterogeneity the decision is not clear. Here, the crucial point is whether the differences in preferences between the classes are assumed to be antipodal. If not, one can run the RPL, but if so, something like an RPL-LCM combination is needed. The latter means that one has to account for the fact that some random parameter values are not present, or at least that there is no overall continuity. For example, the LCM shows bipolar preferences for the attribute representing the distribution company. While certain types of people prefer privatization, others are strongly against it. Only a few people are indifferent. These gaps in the distribution are well represented in figure 1 d and e.

Apart from the assumptions on heterogeneity, both models have other advantages and disadvantages. The RPL is more complex, and its estimation demands computational time and a deep understanding of the model. Further, the flexibility of the RPL is not only a virtue but can also become a burden for the researcher. Deciding which parameters should be random and which distributions are appropriate is a demanding task, and hardly any model specification will show a clear dominance. Choosing the number of classes in the LCM is also challenging. In most cases, the researcher has no prior information on the appropriate number of classes and relies on measures of fit for the selection. Still, the effort needed to identify the optimal model specification is far smaller than in the RPL.

Model interpretation might be easier with the LCM. [Hensher and Greene, 2003] warn that in the RPL, mean parameters are not to be interpreted as in the CL, especially in more confounded specifications with correlation among attributes and alternatives. Calculating marginal effects and WTP values often requires a thorough investigation of all correlations and a laborious dismantling of the Cholesky matrix. To make the RPL more operational, researchers often base their behavioral assumptions on technical feasibility. A frequently observed example is a constant cost parameter. [Revelt and Train, 1998] (p.650) provide technical reasons: "We specify the price coefficient to be fixed while allowing the other coefficients to vary. The willingness to pay for each attribute is thereby distributed in the same way as the attribute's coefficient, which is convenient for interpretation of the model." Apparently, several authors follow their recommendation, either citing their paper or giving no reason at all. Yet the question remains on what behavioral or theoretical basis this assumption is formulated. [Meijer and Rouwendal, 2000] (p.12) put it as follows: "[Keeping the coefficient of the monetary variable nonrandom] is, however, not very satisfactory. A priori it appears at least as likely that the coefficients in the utility function that refer to the monetary variable are random variables, as that those referring to any other variable are." None of these considerations is necessary with the LCM, which is clearly an advantage.

Apart from these technical challenges, the LCM is more straightforward in interpretation. Arguing with classes instead of distributions over the population gives more scope for policy recommendations. CEs are often conducted to inform policy makers. A clustering approach as in the LCM is easier to understand and leads to more straightforward results. It also allows the researcher to name the classes and to segment the population into multidimensional interest groups (see e.g. [Meyerhoff et al., 2010]).

In conclusion, I suggest, setting aside any assumptions on heterogeneity, that the LCM is the model to use for demonstrative purposes. If the target group is a broad audience with limited statistical and economic background, or policy makers, the LCM is clearly more accessible and coherent. If a deeper analysis is required and the research also addresses methodological issues, the RPL is a fruitful challenge. Statistics enthusiasts have more leeway and are free to experiment with new specifications. In fact, the RPL's flexibility is remarkable, and further research is necessary. The most recent advances tend towards semi-parametric distributions of the random parameters. [Fosgerau and Hess, 2008] and [Rouwendal et al., 2010] propose methods in which no a priori assumptions on the distribution of the random parameters are necessary.

7 Conclusion

In this paper, I compared two discrete choice models that incorporate heterogeneity in preferences, the RPL and the LCM. I discussed both models and pointed out their characteristics. The data used for the analysis came from a CE survey in Hyderabad, India, which investigated consumer preferences for better electricity quality. The data are appropriate as they cover highly heterogeneous respondents, ranging from very poor and traditional to rich and modern. A statistical comparison of the models was carried out using measures of fit, a test for non-linear and non-nested models, conditional WTP values, and choice probabilities. The LCM shows some advantages in terms of statistical fit and a smaller variance in conditional WTP values, yet it is not possible to derive a clear statement on which model performs better. Finally, I discussed the two models with regard to criteria beyond statistical performance and provided some suggestions on when to use which model. The choice of model depends, first, on the a priori assumptions about the kind of heterogeneity and, second, on the purpose of the study. Ultimately, however, the decision on a model is left to the researcher's intuition.
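As a supplementary illustration of the fixed price coefficient quoted from [Revelt and Train, 1998] in the discussion, the following Python sketch simulates WTP as the negative ratio of a normally distributed attribute coefficient to a fixed price coefficient. All numerical values are hypothetical and are not estimates from this study; the function name `simulate_wtp` is likewise an assumption made for this example.

```python
import random
import statistics


def simulate_wtp(mean_attr: float, sd_attr: float, price_coef: float,
                 n_draws: int = 100_000, seed: int = 0) -> list[float]:
    """Draw WTP = -beta_attr / beta_price with beta_attr ~ N(mean, sd^2).

    The price coefficient is held fixed, so WTP is just a rescaled
    attribute coefficient and therefore also normally distributed.
    """
    rng = random.Random(seed)
    return [-rng.gauss(mean_attr, sd_attr) / price_coef for _ in range(n_draws)]


# Hypothetical parameter values, chosen only for illustration.
MEAN_ATTR, SD_ATTR, PRICE_COEF = 0.8, 0.4, -0.02

wtp_draws = simulate_wtp(MEAN_ATTR, SD_ATTR, PRICE_COEF)

# Analytic moments implied by the fixed price coefficient.
analytic_mean = -MEAN_ATTR / PRICE_COEF
analytic_sd = SD_ATTR / abs(PRICE_COEF)

if __name__ == "__main__":
    print("simulated mean:", statistics.fmean(wtp_draws))
    print("analytic mean: ", analytic_mean)
    print("simulated sd:  ", statistics.stdev(wtp_draws))
    print("analytic sd:   ", analytic_sd)
```

With a random price coefficient, the ratio would no longer inherit the distribution of the attribute coefficient. This is the technical convenience behind the assumption, and it is the behavioral restriction that [Meijer and Rouwendal, 2000] criticize.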

References

[Ben-Akiva and Swait, 1986] Ben-Akiva, M. and Swait, J. (1986). The Akaike likelihood ratio index. Transportation Science, 20(2):

[Birol et al., 2006] Birol, E., Karousakis, K., and Koundouri, P. (2006). Using a choice experiment to account for preference heterogeneity in wetland attributes: The case of Cheimaditida wetland in Greece. Ecological Economics, 60(1):

[Carlsson and Martinsson, 2008] Carlsson, F. and Martinsson, P. (2008). Does it matter when a power outage occurs? A choice experiment study on the willingness to pay to avoid power outages. Energy Economics, 30(3):

[Colombo et al., 2009] Colombo, S., Hanley, N., and Louviere, J. J. (2009). Modeling preference heterogeneity in stated choice data: an analysis for public goods generated by agriculture. Agricultural Economics, 40(3):

[Fosgerau and Hess, 2008] Fosgerau, M. and Hess, S. (2008). Competing methods for representing random taste heterogeneity in discrete choice models. MPRA Paper, University Library of Munich, Germany.

[Greene, 2007] Greene, W. H. (2007). NLogit Version 4.0: Reference Guide. Econometric Software Inc., Plainview, 1 edition.

[Greene and Hensher, 2003] Greene, W. H. and Hensher, D. A. (2003). A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B: Methodological, 37(8):

[Hanisch et al., 2010] Hanisch, M., Kimmich, C., Rommel, J., and Sagebiel, J. (2010). Coping with power scarcity in an emerging megacity: A consumers' perspective from Hyderabad. International Journal of Global Energy Issues, 33(3&4):

[Hensher and Greene, 2003] Hensher, D. and Greene, W. (2003). The mixed logit model: The state of practice. Transportation, 30(2):

[Hynes et al., 2008] Hynes, S., Hanley, N., and Scarpa, R. (2008). Effects on welfare measures of alternative means of accounting for preference heterogeneity in recreational demand models. American Journal of Agricultural Economics, 90(4):

[McFadden, 1974] McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In Zarembka, P., editor, Frontiers in econometrics, pages Academic Press.

[Meijer and Rouwendal, 2000] Meijer, E. and Rouwendal, J. (2000). Measuring welfare effects in models with random coefficients.

[Meijer and Rouwendal, 2006] Meijer, E. and Rouwendal, J. (2006). Measuring welfare effects in models with random coefficients. Journal of Applied Econometrics, 21(2):

[Meyerhoff et al., 2010] Meyerhoff, J., Ohl, C., and Hartje, V. (2010). Landscape externalities from onshore wind power. Energy Policy, 38(1):

[Morrison and Nalder, 2009] Morrison, M. and Nalder, C. (2009). Willingness to pay for improved quality of electricity supply across business type and location. The Energy Journal, 30(2):

[Provencher and Bishop, 2004] Provencher, B. and Bishop, R. C. (2004). Does accounting for preference heterogeneity improve the forecasting of a random utility model? A case study. Journal of Environmental Economics and Management, 48(1):

[Revelt and Train, 1998] Revelt, D. and Train, K. (1998). Mixed logit with repeated choices: Households' choices of appliance efficiency level. Review of Economics and Statistics, 80(4):

[Rommel et al., 2010] Rommel, K., Hanisch, M., Deb, K., and Sagebiel, J. (2010). Consumer preferences for improvements of power supply quality: Results from a choice experiment in Hyderabad, India and implications for energy policy. Paper presented at the European conference of the IAEE, August 25-28, 2010 in Vilnius, Lithuania.

[Rouwendal et al., 2010] Rouwendal, J., de Blaeij, A., Rietveld, P., and Verhoef, E. (2010). The information content of a stated choice experiment: A new method and its application to the value of a statistical life. Transportation Research Part B: Methodological, 44(1):

[Sillano and de Dios Ortand, 2005] Sillano, M. and de Dios Ortand, J. (2005). Willingness-to-pay estimation with mixed logit models: some new evidence. Environment and Planning A, 37(3):

[Torres et al., 2011] Torres, C., Colombo, S., and Hanley, N. (2011). Incorrectly accounting for taste heterogeneity in choice experiments: Does it really matter for welfare measurement? Stirling Economics Discussion Papers, University of Stirling, Division of Economics.


More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Mgmt 469. Model Specification: Choosing the Right Variables for the Right Hand Side

Mgmt 469. Model Specification: Choosing the Right Variables for the Right Hand Side Mgmt 469 Model Specification: Choosing the Right Variables for the Right Hand Side Even if you have only a handful of predictor variables to choose from, there are infinitely many ways to specify the right

More information

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

More information

Online Survey Data Quality and its Implication for Willingness-to-Pay: A Cross- Country Comparison

Online Survey Data Quality and its Implication for Willingness-to-Pay: A Cross- Country Comparison Online Survey Data Quality and its Implication for Willingness-to-Pay: A Cross- Country Comparison Zhifeng Gao, Assistant Professor ( zfgao@ufl.edu ) Lisa House, Professor Jing, Xie, PhD student Food and

More information

The frequency of visiting a doctor: is the decision to go independent of the frequency?

The frequency of visiting a doctor: is the decision to go independent of the frequency? Discussion Paper: 2009/04 The frequency of visiting a doctor: is the decision to go independent of the frequency? Hans van Ophem www.feb.uva.nl/ke/uva-econometrics Amsterdam School of Economics Department

More information

Forecasting of Paddy Production in Sri Lanka: A Time Series Analysis using ARIMA Model

Forecasting of Paddy Production in Sri Lanka: A Time Series Analysis using ARIMA Model Tropical Agricultural Research Vol. 24 (): 2-3 (22) Forecasting of Paddy Production in Sri Lanka: A Time Series Analysis using ARIMA Model V. Sivapathasundaram * and C. Bogahawatte Postgraduate Institute

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

More information

Multiple Choice Models II

Multiple Choice Models II Multiple Choice Models II Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Social Security Eligibility and the Labor Supply of Elderly Immigrants. George J. Borjas Harvard University and National Bureau of Economic Research

Social Security Eligibility and the Labor Supply of Elderly Immigrants. George J. Borjas Harvard University and National Bureau of Economic Research Social Security Eligibility and the Labor Supply of Elderly Immigrants George J. Borjas Harvard University and National Bureau of Economic Research Updated for the 9th Annual Joint Conference of the Retirement

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

More information

CREDIT SCORING MODEL APPLICATIONS:

CREDIT SCORING MODEL APPLICATIONS: Örebro University Örebro University School of Business Master in Applied Statistics Thomas Laitila Sune Karlsson May, 2014 CREDIT SCORING MODEL APPLICATIONS: TESTING MULTINOMIAL TARGETS Gabriela De Rossi

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Problem of the Month Through the Grapevine

Problem of the Month Through the Grapevine The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards: Make sense of problems

More information

Personal Savings in the United States

Personal Savings in the United States Western Michigan University ScholarWorks at WMU Honors Theses Lee Honors College 4-27-2012 Personal Savings in the United States Samanatha A. Marsh Western Michigan University Follow this and additional

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information

Influence of the Premium Subsidy on Farmers Crop Insurance Coverage Decisions

Influence of the Premium Subsidy on Farmers Crop Insurance Coverage Decisions Influence of the Premium Subsidy on Farmers Crop Insurance Coverage Decisions Bruce A. Babcock and Chad E. Hart Working Paper 05-WP 393 April 2005 Center for Agricultural and Rural Development Iowa State

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information