Estimating health state utility values from discrete choice experiments: a QALY space model approach

Yuanyuan Gu, Richard Norman, Rosalie Viney
Centre for Health Economics Research and Evaluation, University of Technology, Sydney, Australia

Corresponding author: Yuanyuan Gu, Centre for Health Economics Research and Evaluation, University of Technology, Sydney, PO Box 123, Broadway, NSW, 2007, Australia. E-mail: yuanyuan.gu@gmail.com. Phone: +61 2 9514 9886. Fax: +61 2 9514 473.

Keywords: Average valuation, Bayesian, DCE, EQ-5D, Johnson's SB, QALY space

Abstract

Using discrete choice experiments (DCEs) to estimate health state utility values has become an important alternative to standard methods such as the Time Trade-Off (TTO). Studies using DCEs have typically used the conditional logit to estimate the underlying utility function. We show that this approach leads to the valuation of each health state by an average person in the population. By contrast, the standard approach that has been developed for the TTO method is based on estimating the average valuation of a health state within the population. These are fundamentally different approaches conceptually and have different interpretations in policy evaluation. In this paper we point out that it is also possible to estimate the average valuation of a health state when using DCEs. The estimation approach is based on the mixed logit (MIXL). In particular, we propose two types of models, one using preference space and the other using QALY space, a concept adapted from the willingness-to-pay literature. These methods are applied to a data set collected using the EQ-5D. The results demonstrate that the preferred QALY space model provides lower estimates of the utility values than the conditional logit, with the divergence increasing with worsening health states.

1 Introduction

For the evaluation of new health technologies, it is conventional to model their effect using the quality-adjusted life year (QALY). QALYs combine quality of life and life expectancy into a summary measure that reflects preferences for these two dimensions of health gain (Pliskin et al., 1980). The use of cost-utility analysis, with outcomes measured in terms of QALYs, is now recommended by most health technology agencies internationally. A number of standard generic quality of life instruments have been developed for the purpose of measuring and valuing quality of life, to facilitate estimation of QALYs directly from patient reported outcomes (Brazier, 2007). These instruments, known as multi-attribute utility instruments, describe the health state space in terms of several dimensions of quality of life, and include a preference based scoring algorithm that can be interpreted on a cardinal scale. Typically, standard preference based valuation techniques such as the Standard Gamble (SG) and Time Trade-Off (TTO) have been used to derive the scoring algorithms that assign the scores (known as utility values or QALY weights) to the universe of health states described by the instrument. In the past decade, several authors have considered the use of discrete choice experiments (DCEs) to estimate health state utility values, as an alternative to TTO and SG based techniques (Bansback et al., 2012; Coast et al., 2008; Flynn, 2010; Hakim and Pathak, 1999; Lancsar et al., 2011; Ratcliffe et al., 2009; Ryan et al., 2006; Viney et al., 2013). In the approach developed by Bansback et al. (2012), and used by others, the health state utility values are estimated based on the conditional logit model. Broadly, in this approach, the conditional logit is used to estimate coefficients of the attributes that describe a health profile. Utility decrements associated with any move away from full health can be estimated for each dimension and level by computing the ratios between the estimated coefficients of the non-time attributes and that of the time attribute.

Utility values assigned to specific health states are then calculated by summing the relevant utility decrements and subtracting them from one. This approach has important conceptual differences from the approach that has been developed for the TTO and the SG. The standard approach that has been used in the QALY literature and in economic evaluation is based on finding the average valuation of a health state for the relevant population. Effectively this involves estimating the health state utility values for each individual in the population and then averaging these individual utility values over the whole population. In contrast, the approach using the conditional logit is to find the valuation of a health state by an average person in the population. These are conceptually different approaches and therefore have potentially different interpretations in policy evaluation. In this paper we demonstrate that it is possible to estimate the average valuation for a health state when using DCEs. The estimation approach is based on the mixed logit (MIXL), which allows us to derive the population distributions of utility decrements and then the means of these distributions. In particular, we propose two types of models, one using preference space and the other using QALY space, a concept adapted from the willingness-to-pay (WTP) literature. The QALY space model has several advantages over the preference space model; the most significant is that it allows us to directly estimate and compare different distributional assumptions for the utility decrements. A specific contribution is made to the estimation of a QALY space model with utility decrements assumed to follow a multivariate Johnson's SB distribution. In the choice modelling literature this type of model has been very difficult to estimate due to an identification problem (Rigby and Burton, 2006; Train and Sonnier, 2005). In this paper we show that using informative priors on the bounds may improve identification, and that estimating the bounds simultaneously with the other parameters is possible.

1 The time attribute (also called the survival duration attribute) describes the life expectancy in a health state.

In this study, we develop methods to estimate utility values for EQ-5D health states, although these methods could be applied to other instruments that are based on a linear additive model, such as the SF-6D. These methods are applied to a data set which has previously been used to estimate health state utility values. The utility values estimated from the selected MIXL model and the conditional logit are compared.

2 Valuing EQ-5D health states using DCEs

The EQ-5D, developed by the EuroQol Group, is the most widely used multi-attribute utility instrument (Richardson et al., 2011; Szende et al., 2007). It has five dimensions, intended to represent the major areas in which health changes can manifest: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. For the most commonly used version of the EQ-5D, each dimension contains three levels, loosely classified as No Problems, Some Problems, and Extreme Problems. Details are shown in Table 1. There are 3^5 = 243 potential states in the descriptive system.

[Insert Table 1 around here]

The traditional approach to valuing these 243 states has been to administer a TTO preference based task for a sample of health states in a population based sample, and then use regression based modelling to impute the values of the remaining health states (Dolan, 1997; Szende et al., 2007; Viney et al., 2011).

There is an extensive literature on this broad approach, including a series of examinations of its limitations, which might have led to the current trend of investigating alternative methods (Bosch et al., 1998; Craig et al., 2009; Norman et al., 2010). For example, there have been explorations of alternative specifications of the TTO, including Lead-Time and Lag-Time TTOs (Devlin et al., 2011). A review of the development of using DCEs to value health states can be found in Bansback et al. (2012).

2.1 The DCE data

Viney et al. (2013) have developed a DCE based algorithm for the Australian population, and the data from that study are used in the current analysis. This section briefly describes the experiment. The DCE was developed and administered to a sample of the Australian general population. Respondents were asked to choose between health profiles described in terms of EQ-5D profiles and survival attributes. Each choice set included three options: two health profile options and an immediate death option. Each health profile option in a choice set was defined by five attributes covering the dimensions of the EQ-5D and a survival duration attribute. Five survival durations (1, 2, 4, 8 and 16 years) were included in the experiment. The third option of immediate death was included to allow for a complete ranking of health profiles over the worse-than-death to full-health utility space. The task for the respondent was to identify which of the three options was considered the best, and which the worst, thus providing a complete ranking within each choice set. An example of a choice set is provided in Figure 1.

[Insert Figure 1 around here]
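To make the structure of the choice task concrete, a minimal sketch is given below of how one choice set and its best/worst answers could be represented; the EQ-5D codes, durations, and field names are illustrative assumptions, not records from the actual survey.

```python
# Sketch of one DCE choice set: two EQ-5D health profiles with a survival
# duration, plus an immediate-death option. All values are illustrative.
from dataclasses import dataclass

@dataclass
class HealthProfile:
    state: str       # five-digit EQ-5D code, e.g. "21221" (hypothetical)
    years: int       # survival duration attribute (1, 2, 4, 8 or 16)

choice_set = {
    "A": HealthProfile(state="21221", years=4),
    "B": HealthProfile(state="13212", years=8),
    "Death": None,   # immediate death option
}

# A respondent identifies the best and the worst of the three options,
# which implies a complete ranking of the choice set.
best, worst = "A", "Death"
middle = ({"A", "B", "Death"} - {best, worst}).pop()
ranking = [best, middle, worst]
print("implied ranking:", ranking)

# Only the ordering of the two non-death profiles is used in the analysis,
# giving one constructed binary observation per choice set.
chose_a_over_b = ranking.index("A") < ranking.index("B")
print("A ranked above B:", chose_a_over_b)
```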

Details of the experimental design can be found in Viney et al. (2013). Although each choice set included an immediate death option, only the choice between the two non-death profiles was considered.2 Therefore, the analysis was based on a constructed choice set with only the rankings of these two profiles. A total of 1,120 individuals consented to participate in the survey and were eligible to participate. Of these, 1,031 completed the survey, giving a response rate of 92.1%. Viney et al. (2013) showed that overall the characteristics of those who completed the task are broadly comparable to the characteristics of the general Australian population. Each respondent faced 15 choice sets, which translates into 15,465 observations.

3 Using conditional logit

As Viney et al. (2013) and Bansback et al. (2012) both noted, an additive utility function with life expectancy and the levels of the EQ-5D would be inconsistent with the theoretical framework that underpins QALYs, because the QALY model requires that all health states have the same utility at death, i.e., as survival approaches zero, the systematic component of the utility function should similarly tend to zero. This satisfies the zero condition implicit in the QALY model (Bleichrodt and Johannesson, 1997; Bleichrodt et al., 1997). Therefore, the utility of option j in choice set s for survey respondent i is assumed to be

U_isj = α TIME_isj + β′X_isj TIME_isj + ε_isj,    (1)

2 Flynn et al. (2008) argue that including the immediate death option in the choice modelling violates random utility theory, as some respondents may always choose survival over death no matter what health profiles are provided to them.

where X_isj represents a set of dummy variables relating to the levels of the EQ-5D health state, TIME_isj represents survival, and the error terms ε_isj are i.i.d. Gumbel distributed. It is conventional to use the best level of each dimension as the reference category. In this case X_isj excludes the dummies representing the best levels, with the other elements remaining: MO2, MO3, SC2, SC3, UA2, UA3, PD2, PD3, AD2, and AD3. For example, a health state with level 2 on three dimensions and level 1 on the remaining two translates into a vector with ones for the three corresponding level 2 dummies and zeros elsewhere. In the current literature, the α and β terms have been assumed to be constant across individuals, and based on this assumption equation (1) leads to the conditional logit model.3 It is our baseline model and is denoted as M1.

The estimation of α and β does not directly lead to the valuation of health states. An approach is needed to anchor the latent utility scale to the health state utility scale. There are several ways to derive this algorithm (Bansback et al., 2012; Ratcliffe et al., 2009; Viney et al., 2013). The main idea is that the utility value of a health state is its marginal utility of TIME on the latent scale, i.e., ∂U/∂TIME = α + β′X. In the case of full health, the marginal utility of TIME on the latent scale is ∂U/∂TIME = α, which needs to be normalised to be 1 under the QALY model. Hence the normalising constant is α, and the utility score for a health state is 1 + (β/α)′X. The utility decrements are therefore β/α.

3 Viney et al. (2013) assume the error term to be normal. In this case equation (1) indicates a probit model.
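To make the anchoring step concrete, the following sketch scores EQ-5D states from a set of hypothetical conditional logit coefficients; the values of alpha and beta used here are illustrative assumptions only, not the estimates reported in Table 2.

```python
# Illustrative sketch: anchoring conditional logit coefficients to the QALY scale.
# The coefficient values below are hypothetical, not the estimates in Table 2.

alpha = 0.25  # hypothetical coefficient on TIME (preference for duration in full health)

# Hypothetical coefficients on the X*TIME interaction terms (level 2 and 3 dummies)
beta = {
    "MO2": -0.03, "MO3": -0.12, "SC2": -0.03, "SC3": -0.08,
    "UA2": -0.02, "UA3": -0.05, "PD2": -0.03, "PD3": -0.13,
    "AD2": -0.04, "AD3": -0.10,
}

# Utility decrements are the ratios beta/alpha
decrements = {k: b / alpha for k, b in beta.items()}

def utility_value(state, decrements):
    """Utility value of a five-digit EQ-5D state (e.g. '21232'):
    1 plus the sum of the decrements of the dummies that are switched on."""
    dims = ["MO", "SC", "UA", "PD", "AD"]
    value = 1.0
    for dim, level in zip(dims, state):
        if level != "1":                      # level 1 is the reference category
            value += decrements[dim + level]  # decrements are negative
    return value

print(utility_value("11111", decrements))  # full health -> 1.0
print(utility_value("21232", decrements))  # a mixed state
print(utility_value("33333", decrements))  # worst state (can be below zero)
```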

3.1 Average valuation versus an average person's valuation

As noted by Bansback et al. (2012), the objective was to derive the population mean utility scores for all possible health states, which requires estimation of population mean utility decrements. The conditional logit parameter estimates α̂ and β̂ represent population mean preferences for the attributes that describe a health profile. In effect, the estimate represents an average person in the population whose preference parameters are exactly α̂ and β̂. In this case, β̂/α̂ is actually the estimate of this average person's utility decrements. This is conceptually different from the population mean utility decrements, which would be estimated by deriving, for each person i in the target population, the person's α_i and β_i, and using these to calculate that individual's utility decrements β_i/α_i. The population mean utility decrements are then computed as the average of all the individual decrements. Mathematically, this procedure can be described as the mean of ratios, E(β_i/α_i), which may or may not be close to the ratio of means, E(β_i)/E(α_i); a simulation sketch illustrating the distinction follows below.

It is worth noting that when the TTO approach is used, this issue does not arise. When using TTO, a sample of health states is selected and respondents' utility scores for these health states are elicited. These scores are then used as the dependent variable in a model which is regressed on X. In this case the regression coefficients, representing population mean utility decrements, are directly estimated using least squares (Dolan, 1997; Viney et al., 2011).
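A minimal simulation of the distinction described above, under assumed rather than estimated distributions: with α_i log-normal and β_i normal, the mean of the individual ratios can differ markedly from the ratio of the means.

```python
# Minimal simulation: mean of ratios E(beta_i/alpha_i) vs ratio of means
# E(beta_i)/E(alpha_i). The distributional parameters are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# alpha_i: preference for duration, assumed log-normal so it is strictly positive
alpha = rng.lognormal(mean=-1.0, sigma=0.8, size=n)
# beta_i: coefficient on one X*TIME interaction, assumed normal (either sign possible)
beta = rng.normal(loc=-0.05, scale=0.03, size=n)

mean_of_ratios = np.mean(beta / alpha)          # population mean utility decrement
ratio_of_means = np.mean(beta) / np.mean(alpha) # the "average person" ratio

print(f"mean of ratios  E(beta/alpha):    {mean_of_ratios:.3f}")
print(f"ratio of means  E(beta)/E(alpha): {ratio_of_means:.3f}")
```

Under these assumed parameters the two quantities differ by roughly a factor of two, which is the sense in which an average person's decrement need not equal the population mean decrement.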

4 Using MIXL: preference space versus QALY space

One possible way to estimate the population mean utility decrements is to use a framework based on random parameters. Equation (1) can be rewritten as

U_isj = α_i TIME_isj + β_i′X_isj TIME_isj + ε_isj,    (2)

where α_i and β_i are both random. The induced model is called the MIXL. Under this framework, we first estimate the distributions of β_i/α_i (i.e., the distributions of utility decrements) and then derive the means of these distributions.

Finding the distribution of the ratio of two random variables is a longstanding problem. It has been particularly investigated in the WTP literature, where α_i represents the coefficient of price and β_i represents the coefficients of non-price attributes in a DCE. Hensher and Greene (2003) and Daly et al. (2012) discussed the major challenges in this area of research. The first challenge is that β_i/α_i may not have finite moments unless α_i is assumed to have some specific distribution such as the log-normal. In our case, assuming α_i to be a log-normal random variable is reasonable because α_i represents a person's preference for the duration of life in perfect health and should always be positive. The second challenge concerns the extreme values that arise from the reciprocal of a random variable. As long as α_i can take very small values, 1/α_i will produce quite large numbers.

This problem is increasingly acute when the resulting distribution has thick tails (e.g., Student t and log-normal). We therefore estimated two MIXL models:

M2.1: log(α_i) and β_i follow a multivariate normal distribution with mean μ and variance Σ;

M2.2: log(α_i) and log(-β_i) follow a multivariate normal distribution with mean μ and variance Σ.4

The second model (M2.2) has the advantage of assuring that the decrements' distributions are strictly negative and the disadvantage of inducing a lot of extreme values. In contrast, the first model (M2.1) may suffer less from extreme values, but it cannot guarantee that each individual's utility decrements are strictly negative. Another challenge that has not been addressed in the literature is that the distribution of β_i/α_i is induced from our assumptions on the distributions of α_i and β_i, and so it is not possible to directly compare and test alternative distributions for β_i/α_i.

In the WTP literature, alternative methods have been developed to meet these challenges (Daly et al., 2012). Among them the most promising effort has been the invention of the WTP space model (Train and Weeks, 2005). The name WTP space was proposed as a contrast to the preference space on which the framework described above is based. The WTP space model is essentially a re-parameterisation of equation (2) so that the distribution of β_i/α_i can be directly assumed and estimated. We adapted this idea to our context and named the approach the QALY space model.

4 For estimating M2.2 we need to change the signs of the data corresponding to X_isj to their opposite. This applies to other models when a log-normal distribution is assumed for negative coefficients.
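As a concrete illustration of the preference space construction and its extreme-value problem, the sketch below draws correlated (log α_i, β_i) from a multivariate normal, as in M2.1, and forms the implied decrements; all parameter values are arbitrary placeholders rather than estimates from this study.

```python
# Sketch of the preference space simulation step (M2.1-style), with arbitrary
# illustrative parameters: (log alpha_i, beta_i) are drawn from a multivariate
# normal, alpha_i is recovered by exponentiating, and decrements beta_i/alpha_i
# are formed. The thick tail of 1/alpha_i produces occasional extreme draws.
import numpy as np

rng = np.random.default_rng(1)
n_draws = 100_000

mu = np.array([-1.0, -0.05])          # means of (log alpha, beta) -- illustrative
cov = np.array([[0.64, 0.01],         # covariance matrix -- illustrative
                [0.01, 0.0009]])

draws = rng.multivariate_normal(mu, cov, size=n_draws)
alpha = np.exp(draws[:, 0])           # log-normal, strictly positive
beta = draws[:, 1]                    # normal, can be either sign

decrement = beta / alpha
print("mean decrement:", decrement.mean())
print("share of positive (non-monotonic) decrements:", (decrement > 0).mean())
print("1st and 99th percentiles:", np.percentile(decrement, [1, 99]))
```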

We now re-parameterise equation (2) as

U_isj = α_i [ TIME_isj + γ_i′X_isj TIME_isj ] + ε_isj,    (3)

where γ_i = β_i/α_i. Under this new framework we may estimate and compare models that assume different distributions for the utility decrements γ_i. For the EQ-5D DCE data, we estimated three models:

M3.1: log(α_i) and γ_i follow a multivariate normal distribution with mean μ and variance Σ;

M3.2: log(α_i) and log(-γ_i) follow a multivariate normal distribution with mean μ and variance Σ;

M3.3: log(α_i) and log(-γ_ik/(b_k + γ_ik)), k = 1, ..., K, follow a multivariate normal distribution with mean μ and variance Σ, where K represents the size of γ_i (i.e., the number of utility decrements) and each b_k represents a positive unknown scalar parameter.

Models M3.1 and M3.2 assume normal and log-normal distributions for the utility decrements respectively. Both have merits and flaws; the normal distribution has thin tails but cannot ensure everyone has negative decrements, while the log-normal distribution is the opposite: it can ensure everyone has negative decrements but has a thick right tail that may lead to very large mean estimates. Model M3.3 assumes a Johnson's SB distribution for the utility decrements, i.e.,

-γ_ik = b_k exp(c_ik) / (1 + exp(c_ik)),    (4)

where c_ik is normally distributed. This is a special case of the Johnson's SB distribution with the lower bound set at 0 and the upper bound b_k to be estimated.5 This distribution has the merits of both the normal and the log-normal: it has thin tails and (for the sign-reversed decrement) takes only positive values.

5 As in the log-normal case, we changed the signs of the data corresponding to X_isj to their opposite. Therefore, a decrement's distribution should have a lower bound of -b_k and an upper bound of 0.
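To show what the bounded transformation in equation (4) does in practice, the following sketch simulates Johnson's SB draws alongside log-normal draws built from the same underlying normal variable; the bound and the normal parameters are assumptions chosen for illustration only.

```python
# Sketch of the Johnson's SB construction used for the (sign-reversed) utility
# decrements: c is normal, and b*exp(c)/(1+exp(c)) is bounded between 0 and b.
# The parameter values are arbitrary illustrations, not estimates from the paper.
import numpy as np

rng = np.random.default_rng(2)
n_draws = 100_000

b = 0.8                                              # assumed upper bound of the SB distribution
c = rng.normal(loc=-1.0, scale=0.9, size=n_draws)    # underlying normal draws

sb = b * np.exp(c) / (1.0 + np.exp(c))   # Johnson's SB draws, all inside (0, b)
lognorm = np.exp(c)                      # log-normal draws for comparison (unbounded above)

print("SB draws:   min %.3f  max %.3f  mean %.3f" % (sb.min(), sb.max(), sb.mean()))
print("log-normal: min %.3f  max %.3f  mean %.3f" % (lognorm.min(), lognorm.max(), lognorm.mean()))
# The decrement itself is the negative of the SB draw, bounded in (-b, 0).
```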

The literature also shows that a wide variety of distributions, such as the normal, log-normal, Weibull, and modified beta, can be satisfactorily fitted by the Johnson's SB distribution (Yu and Standish, 1990). Moreover, it has been shown that the Johnson's SB distribution can accommodate data with two modes spiked at the lower and upper bounds (Rigby and Burton, 2006). Based on this evidence we expected M3.3 to be the best modelling strategy, especially given that we have limited prior knowledge of the shape of the distributions of utility decrements.

5 Estimation and model comparison

The most popular methods for estimating MIXL are simulated maximum likelihood (SML) and Bayesian estimation. Each has relative merits (Regier et al., 2009; Train, 2003). The SML method is widely used, as most econometric and statistical software packages have developed standard routines to estimate MIXL based on this method.6 However, the Bayesian approach has several clear advantages that suit our case. First, we assume all the random coefficients are correlated, which leads to the estimation of a large covariance matrix. The SML method can be very time consuming in this case, and even with a large number of simulation draws, convergence is not always guaranteed. In contrast, the Bayesian approach estimates correlated MIXL and uncorrelated MIXL at almost the same speed (Train, 2003). Second, the SML method cannot estimate M3.3 without fixing the bounds, while the Bayesian approach may estimate the bounds and the other parameters simultaneously by using informative priors (we will show this in a moment). Therefore, in this study we chose to use the Bayesian method to estimate all the models, including the conditional logit, which is a special case of MIXL with its Σ set as an empty matrix.

6 For example, in Stata the mixlogit routine (Hole, 2007) can be used to estimate the MIXL models in preference space, while the gmnl routine (Gu et al., 2013) can be modified to estimate MIXL models in QALY space or WTP space (Fiebig et al., 2010; Greene and Hensher, 2010; Hole and Kolstad, 2012).

The sampling scheme for estimating the MIXL models in preference space was given in Train (2003). The Matlab code written by Kenneth Train was used.7 It is also straightforward to estimate the MIXL models in QALY space, including M3.1 and M3.2; only a slight modification of the likelihood function is needed. The challenge comes from M3.3. As Train and Sonnier (2005) pointed out, in equation (4) the bound parameter b_k is closely related to the variance of c_ik, and thus the model is under-identified. In the choice modelling literature, this under-identification is usually solved by fixing the b_k's at a series of constants and then selecting the model with the best log-likelihood estimate. This approach is called grid search. The grid search method works well in the univariate case, but for the multivariate situation it can be extremely laborious (Rigby and Burton, 2006). In our case, we have a 10-dimensional multivariate Johnson's SB distribution, and identifying the optimal point in the 10-dimensional space is computationally infeasible. It is therefore necessary to seek an alternative solution. Our approach was based on using informative prior distributions on the bounds so that Bayesian identifiability of the model can be obtained.8 The priors were log-normal distributions, constructed based on the estimates from M3.2. More specifically, the chosen priors cover the largest 99th percentile of the log-normal distributions estimated from M3.2, a reasonable assumption for the upper bound of the bound parameter. The bound parameters were sampled as a vector using the random walk Metropolis-Hastings algorithm.9 In order to confidently use the post burn-in iterates for inference, it is necessary to check that the sampling scheme has converged. We judged convergence visually by running the sampling scheme from three different initial positions and plotting various functionals of the iterates on the same graph. Successful convergence was indicated by the overlap of the functionals from the three chains.

7 Available from http://elsa.berkeley.edu/~train/software.html
8 The mechanism is explained in detail in Scheines et al. (1999).
9 In the sampling, we first draw log(b_k) and then take its exponential.
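A self-contained sketch of the random-walk Metropolis-Hastings update described above, for a single bound parameter sampled on the log scale under a log-normal prior; the likelihood here is a toy stand-in rather than the simulated MIXL likelihood.

```python
# Sketch of a random-walk Metropolis-Hastings update for one bound parameter b,
# sampled on the log scale under a log-normal prior LN(0, 0.6^2). The likelihood
# below is a stand-in toy function; in the actual model it would be the simulated
# MIXL likelihood evaluated at the current parameters.
import numpy as np

rng = np.random.default_rng(3)

def log_prior(log_b, sigma=0.6):
    # log b ~ Normal(0, sigma^2), i.e. b ~ LN(0, sigma^2), up to a constant
    return -0.5 * (log_b / sigma) ** 2

fake_data = rng.normal(loc=0.7, scale=0.2, size=50)   # toy data, illustration only

def log_likelihood(b):
    # Stand-in likelihood: pretends the toy data are Normal(b, 0.2^2)
    return -0.5 * np.sum(((fake_data - b) / 0.2) ** 2)

def log_posterior(log_b):
    return log_prior(log_b) + log_likelihood(np.exp(log_b))

n_iter, step = 5000, 0.1
log_b = 0.0                      # initial value (b = 1)
chain = np.empty(n_iter)
for t in range(n_iter):
    proposal = log_b + step * rng.normal()            # random walk on the log scale
    log_accept = log_posterior(proposal) - log_posterior(log_b)
    if np.log(rng.uniform()) < log_accept:            # Metropolis-Hastings acceptance
        log_b = proposal
    chain[t] = np.exp(log_b)                          # store b itself

burn_in = 1000
print("posterior mean of b (toy example):", chain[burn_in:].mean())
```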

Following Train (2003), we adopted a frequentist interpretation of the Bayesian estimates, i.e., the posterior means and standard deviations were used as the point estimates and standard errors. The decrements' distributions were simulated using random draws. The log-likelihood was calculated at the point estimates using random draws. We also used AIC as the criterion for model comparisons.10 We did not use BIC as it penalises sample size heavily and thus, for very large sample sizes such as in this case, it is less informative in distinguishing between models that involve additional parameters.

6 Results

6.1 Estimation of the conditional logit (M1)

The parameter estimates of the conditional logit model are given in Table 2. Utility decrements based on β̂/α̂ are reported in the last column of Table 2. The interpretation of these numbers is that they represent an average person's utility decrements.

[Insert Table 2 around here]

10 We also used AICc, which penalises sample size, but due to the large sample size, AICc is almost identical to AIC.
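As an illustration of the comparison criterion used here (a log-likelihood simulated at the point estimates, combined with AIC), the sketch below approximates a mixed logit log-likelihood by Monte Carlo on a small fabricated data set; the data, coefficient means, and covariance are placeholders.

```python
# Sketch of the model-comparison step: approximate a mixed logit log-likelihood
# at the point estimates by Monte Carlo, then compute AIC = 2k - 2*logL.
# Data, coefficient means, and the covariance are fabricated placeholders.
import numpy as np

rng = np.random.default_rng(4)

# Toy data: 200 binary choice sets, 3 attribute columns per alternative
n_obs, n_attr, n_sim = 200, 3, 500
x_a = rng.normal(size=(n_obs, n_attr))     # attributes of alternative A
x_b = rng.normal(size=(n_obs, n_attr))     # attributes of alternative B
chose_a = rng.integers(0, 2, size=n_obs)   # observed choices (placeholder)

mu = np.array([0.5, -0.3, 0.2])            # point estimates of coefficient means (assumed)
chol = np.diag([0.4, 0.4, 0.4])            # Cholesky of the coefficient covariance (assumed)

def simulated_loglik(mu, chol, n_sim):
    draws = mu + rng.normal(size=(n_sim, len(mu))) @ chol.T   # coefficient draws
    v_a = x_a @ draws.T                                       # utilities, shape (n_obs, n_sim)
    v_b = x_b @ draws.T
    p_a = 1.0 / (1.0 + np.exp(v_b - v_a))                     # binary logit probability of A
    p_chosen = np.where(chose_a[:, None] == 1, p_a, 1.0 - p_a)
    return np.sum(np.log(p_chosen.mean(axis=1)))              # average over draws, sum logs

loglik = simulated_loglik(mu, chol, n_sim)
k = len(mu) + len(mu) * (len(mu) + 1) // 2   # means plus covariance terms
aic = 2 * k - 2 * loglik
print(f"simulated log-likelihood: {loglik:.1f}, parameters: {k}, AIC: {aic:.1f}")
```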

6.2 Estimation of MIXL using preference space (M2.1 and M2.2)

The parameter estimates of the two MIXL models using preference space are given in Table 3 (M2.1) and Table 4 (M2.2). Based on log-likelihood and AIC, both models were substantially better than M1. M2.2 also completely dominated M2.1 in terms of model fit, indicating that the log-normal distribution assumption on β_i accommodated the data much better than the normal distribution assumption.

[Insert Table 3 around here]

[Insert Table 4 around here]

Based on these parameter estimates, the distributions of β_i/α_i were simulated. The means of these distributions are reported in the tables as the population mean estimates of the utility decrements. Comparing these two sets of estimates with the estimates from M1, we found that for the size of the level 2 decrements (e.g., MO2), the ordering was roughly M2.2 ≥ M1 ≥ M2.1, while for the size of the level 3 decrements (e.g., MO3), the ordering was roughly M2.2 ≥ M2.1 ≥ M1. The differences for the level 3 decrements were particularly significant. To understand these differences, we plotted the simulated distributions of β_i/α_i in Figure 2 (for M2.1) and Figure 3 (for the level 3 decrements from M2.2).

[Insert Figure 2 around here]

[Insert Figure 3 around here]

From Figure 2 we can see that all the distributions from M2.1 have a significant proportion of the distribution greater than zero. This was particularly the case for the level 2 decrements. Given that the EQ-5D is designed to be monotonic (level 2 is necessarily worse than level 1 in each dimension), this is a concern. It also explains why the mean level 2 decrements from M2.1 were clearly smaller than the estimates from the other two models. Another finding is that extreme values existed in both tails. If these extreme values were spread evenly on both sides the mean estimates would not be affected, but unfortunately this is not the case. As shown in Figure 3, the problem of outliers is more severe in M2.2. All the distributions have very thick right tails, indicating that the population mean estimates are in fact determined by a group of extreme individuals. These extreme people may or may not exist in the real world, and it is questionable whether, in the policy making context, the resulting valuations of health states should be driven by their valuations. To correct for this concern, a reasonable approach is to drop the 1% or 2% most extreme values from the simulated data (Daly et al., 2012; Hensher and Greene, 2003). In Figure 3, we plotted the decrements' distributions again after discarding the 2% most extreme values. They appeared to have much thinner tails. We also re-calculated the means, which are reported in the last column of Table 4. The level 3 decrement mean estimates are now very close to those from M2.1 but still significantly larger than those from M1.
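One possible implementation of the truncation discussed above (here trimming by absolute size) is sketched below on simulated decrement draws; the draws are illustrative and are not the distributions estimated in this paper.

```python
# Sketch of the truncation used to robustify the simulated mean decrements:
# drop a small share of the most extreme simulated values before averaging.
# The simulated draws below are illustrative, not the paper's estimates.
import numpy as np

rng = np.random.default_rng(5)

# Simulated decrement draws with a thick tail (ratio of a normal to a log-normal)
beta = rng.normal(-0.05, 0.03, size=100_000)
alpha = rng.lognormal(-1.0, 0.8, size=100_000)
decrement = beta / alpha

def trimmed_mean(x, trim_share=0.02):
    """Mean after discarding the trim_share most extreme values (by absolute size)."""
    cutoff = np.quantile(np.abs(x), 1.0 - trim_share)
    return x[np.abs(x) <= cutoff].mean()

print(f"raw mean: {decrement.mean():.4f}")
print(f"mean after dropping the 2% most extreme: {trimmed_mean(decrement, 0.02):.4f}")
```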

6.3 Estimation of MIXL using QALY space (M3.1, M3.2, and M3.3)

The parameter estimates of the first two MIXL models using QALY space are given in Table 5 (M3.1) and Table 6 (M3.2). Based on log-likelihood and AIC, M3.2 was superior to M3.1 in terms of model fit, indicating that the log-normal distribution assumption on the utility decrements was superior to the normal distribution assumption. Indeed, under M3.1, some decrements' estimated distributions had substantial proportions greater than zero, which potentially led to the underestimation of these mean decrements. In the case of UA2, the sign of the mean estimate clearly violates the monotonicity condition.

[Insert Table 5 around here]

[Insert Table 6 around here]

Another interesting comparison is M2.2 versus M3.2. The two models had very similar model fit, with the latter slightly better. They also produced very similar utility decrement distributions, indicating that whilst the distribution of β_i/α_i from M2.2 is not in closed form, it is in fact very close to a log-normal distribution.

The parameter estimates of the final model, M3.3, are given in Table 7. When estimating the model we used informative prior distributions on all the bounds: b_k ~ LN(0, σ²), where σ was chosen as 0.6. LN(0, 0.36) covers a range from 0.25 to 4 (the 1st and 99th percentiles). The 99th percentiles of the log-normal distributions estimated from M3.2 (the smallest being .8 and the largest 3.64) all lie well within this range.
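A short numerical check of the quoted prior range, assuming scipy is available (the paper's own computations were carried out in Matlab):

```python
# Check of the prior range quoted above: the 1st and 99th percentiles of a
# log-normal LN(0, 0.36) distribution (i.e. sigma = 0.6 on the log scale).
from scipy.stats import lognorm

sigma = 0.6
prior = lognorm(s=sigma, scale=1.0)   # scale = exp(mu) with mu = 0
print(prior.ppf([0.01, 0.99]))        # approximately [0.25, 4.04]
```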

Based on log-likelihood and AIC, M3.3 dominated M3.2, confirming that the Johnson's SB is indeed a better distribution than the log-normal for describing the utility decrements' distributions. We plotted the estimated distributions from both models in Figure 4, which clearly demonstrates the Johnson's SB's advantage over the log-normal: its shape is very close to log-normal but it has a very thin tail. Unsurprisingly, the mean decrement estimates from this model were close to those from M2.2 and M3.2 where extreme values were discarded.

[Insert Table 7 around here]

[Insert Figure 4 around here]

7 Discussion and conclusions

This study explored different estimation methods to provide estimates of health state utility values that take better account of the individual heterogeneity in EQ-5D data obtained using DCEs. This is important not only because previous methods do not exploit any of the individual heterogeneity in the raw data, but also because methods for estimating health state utility values from DCE data need to model explicitly the variances as well as the means of the model parameters in order to provide population mean estimates of the health state utility values.

In this paper we have argued that previous methods that did not model variance, such as the conditional logit, essentially derive an average person's valuation, which is conceptually different from the average valuation for the population, the standard approach used in TTO studies.

The paper has developed methods to derive an average valuation from the population using DCE data. This average valuation is then more comparable with the TTO approach. Our methods were based on the MIXL framework, and two types of models were proposed in this paper. The first is preference space modelling, which derives the distribution of utility decrements by taking the ratio of random variables. A significant problem associated with this approach is that the distributions are induced from our assumptions on these random parameters, and so it is difficult to directly compare the induced distributions. For example, in our empirical analysis, we showed that M2.2 had better model fit than M2.1. However, this did not mean that the mean decrement estimates from the former model were more reasonable than those from the latter. In fact, the estimates from M2.2 were severely affected by extreme values, as the induced distributions had very thick right tails. Dropping these extreme values would make the mean estimates more robust, but the choice of the appropriate point of truncation is arbitrary.

The second approach is based on an adaptation of methods developed in the WTP literature to deal with the drawbacks of preference space models. We have adapted the WTP space model to develop the second type of model in our analysis, that is, the QALY space model. It is essentially a re-parameterisation of the preference space model so that the decrements' distributions can be estimated and compared directly. In the empirical analysis we tried three different distributional assumptions for the utility decrements: normal, log-normal, and Johnson's SB. The last of these provided the best model fit. Our analysis showcased the advantages of the Johnson's SB distribution over the normal and log-normal distributions, the most commonly used ones in choice modelling practice. The Johnson's SB distribution has not been widely used since it was first introduced to the choice modelling literature by Train and Sonnier (2005).

The major reason may be the difficulty of its estimation, which often requires an extensive search over the bounds. In this paper, we showed that it is also possible to estimate the bounds by using informative priors on them. In the empirical analysis, we identified plausible priors from a model using log-normal assumptions, whose estimation showed that the bounds are likely to be smaller than 3.64. Based on this, the prior distribution was constructed as LN(0, σ²) with σ set as 0.6. We also carried out a sensitivity analysis by changing σ and found that other values between 0.5 and 1 would lead to similar results, but the convergence of the model became harder as σ increased.

By comparing the mean decrement estimates from M3.3 with the estimates from the conditional logit model, we found that the latter appeared to be smaller in size. The largest differences occurred for the level 3 decrements, in particular MO3 and AD3. It is worth mentioning that when we estimated the conditional logit we did not impose any constraints, while for M3.3 we imposed a monotonicity constraint on each dimension of the EQ-5D. To explore the impact of doing so, we re-estimated the conditional logit with its β constrained to be negative (i.e., imposing monotonicity), and doing so did not change the parameter estimates at all.

In Figure 5 we plotted the predicted values for all 243 health states described by the EQ-5D using estimates from M1 and M3.3. The ranking of the 243 health states from left to right is based on the predictions from the conditional logit approach. From the graph we can see that the conditional logit provides higher estimates of the utility values for almost all health states, with the divergence increasing with worsening health states.

[Insert Figure 5 around here]
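The comparison in Figure 5 can be mimicked with the following sketch, which enumerates all 243 EQ-5D states and scores each under two hypothetical decrement sets standing in for the M1 and M3.3 estimates (which are not reproduced here).

```python
# Sketch of how Figure 5's comparison can be reproduced: enumerate all 243
# EQ-5D states and score each under two hypothetical decrement sets (stand-ins
# for the M1 and M3.3 estimates).
from itertools import product

dims = ["MO", "SC", "UA", "PD", "AD"]

# Hypothetical decrements (negative numbers), one set per model -- placeholders only
decr_m1 = {"MO2": -0.10, "MO3": -0.45, "SC2": -0.10, "SC3": -0.25, "UA2": -0.05,
           "UA3": -0.15, "PD2": -0.09, "PD3": -0.45, "AD2": -0.12, "AD3": -0.32}
decr_m33 = {k: 1.15 * v for k, v in decr_m1.items()}   # uniformly larger decrements

def score(state, decrements):
    return 1.0 + sum(decrements[d + l] for d, l in zip(dims, state) if l != "1")

states = ["".join(levels) for levels in product("123", repeat=5)]   # all 243 states
assert len(states) == 243

# Rank states by the first model's prediction, as in Figure 5
states.sort(key=lambda s: score(s, decr_m1), reverse=True)
for s in states[:3] + states[-3:]:
    print(s, round(score(s, decr_m1), 3), round(score(s, decr_m33), 3))
```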

DCEs offer a valuable alternative approach to the estimation of utility values, and this is an area with an increasing international profile. In particular, it can be argued that the task is less onerous for respondents. However, the methods for analysing the data, and then for translating the results into an algorithm for use in economic evaluation, remain contentious. We believe that the QALY space model approach outlined in this work represents a sensible way of using these data for this purpose, and that it should be explored using other generic quality of life instruments.

References

Bansback N, Brazier J, Tsuchiya A, Anis A. 2012. Using a discrete choice experiment to estimate health state utility values. Journal of Health Economics 31: 306-318.
Bleichrodt H, Johannesson M. 1997. The validity of QALYs: an experimental test of constant proportional tradeoff and utility independence. Medical Decision Making 17: 21-32.
Bleichrodt N, Wakker P, Johannesson M. 1997. Characterizing QALYs by risk neutrality. Journal of Risk and Uncertainty 15: 107-114.
Bosch JL, Hammitt JK, Weinstein MC, Hunink MG. 1998. Estimating general-population utilities using one binary-gamble question per respondent. Medical Decision Making 18: 381-390.
Brazier J. 2007. Measuring and valuing health benefits for economic evaluation. Oxford University Press: Oxford; New York.
Coast J, Flynn TN, Natarajan L, Sproston K, Lewis J, Louviere JJ, et al. 2008. Valuing the ICECAP capability index for older people. Social Science & Medicine 67: 874-882.
Craig BM, Busschbach JJ, Salomon JA. 2009. Keep it simple: ranking health states yields values similar to cardinal measurement approaches. J Clin Epidemiol 62: 296-305.
Daly A, Hess S, Train K. 2012. Assuring finite moments for willingness to pay in random coefficient models. Transportation 39: 19-31.

Devlin NJ, Tsuchiya A, Buckingham K, Tilling C. 2011. A uniform time trade off method for states better and worse than dead: feasibility study of the 'lead time' approach. Health Economics 20: 348-361.
Dolan P. 1997. Modeling valuations for EuroQol health states. Medical Care 35: 1095-1108.
Fiebig DG, Keane MP, Louviere J, Wasi N. 2010. The generalized multinomial logit model: accounting for scale and coefficient heterogeneity. Marketing Science 29: 393-421.
Flynn TN. 2010. Using conjoint analysis and choice experiments to estimate QALY values: issues to consider. Pharmacoeconomics 28: 711-722.
Flynn TN, Louviere JJ, Marley AA, Coast J, Peters TJ. 2008. Rescaling quality of life values from discrete choice experiments for use as QALYs: a cautionary tale. Population Health Metrics 6.
Greene WH, Hensher DA. 2010. Does scale heterogeneity across individuals matter? An empirical assessment of alternative logit models. Transportation 37: 413-428.
Gu Y, Hole AR, Knox S. 2013. Estimating the generalized multinomial logit model in Stata. The Stata Journal, in press.
Hakim Z, Pathak DS. 1999. Modelling the EuroQol data: a comparison of discrete choice conjoint and conditional preference modelling. Health Economics 8: 103-116.
Hensher DA, Greene WH. 2003. The mixed logit model: the state of practice. Transportation 30: 133-176.
Hole AR. 2007. Fitting mixed logit models by using maximum simulated likelihood. The Stata Journal 7: 388-401.
Hole AR, Kolstad JR. 2012. Mixed logit estimation of willingness to pay distributions: a comparison of models in preference and WTP space using data from a health-related choice experiment. Empirical Economics 42: 445-469.
Lancsar E, Wildman J, Donaldson C, Ryan M, Baker R. 2011. Deriving distributional weights for QALYs through discrete choice experiments. Journal of Health Economics 30: 466-478.
Norman R, King MT, Clarke D, Viney R, Cronin P, Street D. 2010. Does mode of administration matter? Comparison of online and face-to-face administration of a time trade-off task. Qual Life Res 19: 499-508.
Pliskin JS, Shepard DS, Weinstein MC. 1980. Utility functions for life years and health status. Operations Research 28: 206-224.
Ratcliffe J, Brazier J, Tsuchiya A, Symonds T, Brown M. 2009. Using DCE and ranking data to estimate cardinal values for health states for deriving a preference-based single index from the sexual quality of life questionnaire. Health Economics 18: 1261-1276.

Regier DA, Ryan M, Phimister E, Marra CA. 2009. Bayesian and classical estimation of mixed logit: an application to genetic testing. Journal of Health Economics 28: 598-610.
Richardson J, McKie J, Bariola E. 2011. Review and critique of health related multi attribute utility instruments. Centre for Health Economics, Monash University.
Rigby D, Burton M. 2006. Modeling disinterest and dislike: a bounded Bayesian mixed logit model of the UK market for GM food. Environmental and Resource Economics 33: 485-509.
Ryan M, Netten A, Skatun D, Smith P. 2006. Using discrete choice experiments to estimate a preference-based measure of outcome - an application to social care for older people. Journal of Health Economics 25: 927-944.
Scheines R, Hoijtink H, Boomsma A. 1999. Bayesian estimation and testing of structural equation models. Psychometrika 64: 37-52.
Szende A, Oppe M, Devlin N, editors. 2007. EQ-5D value sets: inventory, comparative review and user guide. Springer: Dordrecht, The Netherlands.
Train K. 2003. Discrete choice methods with simulation. Cambridge University Press: New York.
Train K, Sonnier G. 2005. Mixed logit with bounded distributions of correlated partworths. In: Scarpa R, Alberini A, editors. Applications of simulation methods in environmental and resource economics. Springer: Dordrecht, The Netherlands; 117-134.
Train K, Weeks M. 2005. Discrete choice models in preference space and willingness-to-pay space. In: Scarpa R, Alberini A, editors. Applications of simulation methods in environmental and resource economics. Springer: Dordrecht, The Netherlands; 1-16.
Viney R, Norman R, Brazier J, Cronin P, King M, Ratcliffe J, et al. 2013. An Australian discrete choice experiment to value EQ-5D health states. Health Economics, in press.
Viney R, Norman R, King MT, Cronin P, Street DJ, Knox S, et al. 2011. Time trade-off derived EQ-5D weights for Australia. Value Health 14: 928-936.
Yu AB, Standish N. 1990. A study of particle size distribution. Powder Technology 62: 101-118.

Table. The EQ-5D instrument Dimension Level Description Mobility (MO) I have no problem in walking about 2 I have some problems in walking about 3 I am confined to bed Self-Care (SC) I have no problems with self-care 2 I have some problems washing and dressing myself 3 I am unable to wash and dress myself Usual Activities (UA) I have no problems with performing my usual activities 2 I have some problems with performing my usual activities 3 I am unable to perform my usual activities Pain / Discomfort (PD) Anxiety / Depression (AD) I have no pain or discomfort 2 I have moderate pain or discomfort 3 I have extreme pain or discomfort I am not anxious or depressed 2 I am moderately anxious or depressed 3 I am extremely anxious or depressed 25

Table 2. Conditional logit (M1)

Attributes    Estimate (S.E.)    Level    Utility decrement
Time          .27 (.7)
MO2*Time      -.3 (.4)           MO2      -.2
MO3*Time      -.4 (.4)           MO3      -.52
SC2*Time      -.3 (.5)           SC2      -.2
SC3*Time      -.8 (.5)           SC3      -.29
UA2*Time      -.3 (.5)           UA2      -.
UA3*Time      -.5 (.5)           UA3      -.9
PD2*Time      -.3 (.4)           PD2      -.
PD3*Time      -.3 (.4)           PD3      -.5
AD2*Time      -.4 (.4)           AD2      -.4
AD3*Time      -. (.4)            AD3      -.37

Log-likelihood: -892
No. of parameters:
AIC: 7862

Table 3. MIXL using preference space: α ~ log-normal and β ~ normal (M2.1)

Attributes    Mean (S.E.)    S.D. (S.E.)    Level    Decrement mean    Decrement S.D.
Time          .53 (.4)       .79 (.4)
MO2*Time      -.9 (.3)       .48 (.3)       MO2      -.                .53
MO3*Time      -. (.5)        .79 (.4)       MO3      -.68              .85
SC2*Time      -.2 (.4)       .59 (.4)       SC2      -.9               .64
SC3*Time      -.6 (.4)       .7 (.4)        SC3      -.35              .78
UA2*Time      -.5 (.3)       .53 (.3)       UA2      -.4               .58
UA3*Time      -.4 (.4)       .59 (.4)       UA3      -.2               .63
PD2*Time      -.6 (.3)       .47 (.3)       PD2      -.6               .5
PD3*Time      -.3 (.5)       .78 (.4)       PD3      -.65              .84
AD2*Time      -.25 (.3)      .5 (.3)        AD2      -.                .55
AD3*Time      -.86 (.4)      .79 (.4)       AD3      -.49              .82

Log-likelihood: -786
No. of parameters: 77
AIC: 5786

Table 4. MIXL using preference space: α ~ log-normal and β ~ log-normal (M2.2)

Attributes    Mean (S.E.)     S.D. (S.E.)    Level    Original (S.D.)    Truncated (S.D.)
Time          -.2 (.8)        .63 (.)
MO2*Time      -2.78 (.24)     .72 (.23)      MO2      -.3 (.23)          -. (.2)
MO3*Time      -.64 (.8)       .53 (.9)       MO3      -.77 (.79)         -.7 (.56)
SC2*Time      -2.69 (.25)     .6 (.2)        SC2      -.5 (.3)           -.2 (.5)
SC3*Time      -.43 (.)        .56 (.2)       SC3      -.42 (.6)          -.36 (.36)
UA2*Time      -3.23 (.35)     .92 (.3)       UA2      -. (.22)           -.8 (.)
UA3*Time      -.87 (.4)       .7 (.5)        UA3      -.25 (.3)          -.22 (.2)
PD2*Time      -3.3 (.28)      .88 (.22)      PD2      -. (.23)           -.9 (.)
PD3*Time      -.74 (.8)       .64 (.)        PD3      -.73 (.82)         -.65 (.56)
AD2*Time      -2.55 (.2)      .99 (.9)       AD2      -.4 (.22)          -.2 (.3)
AD3*Time      -.4 (.)         .82 (.)        AD3      -.55 (.64)         -.49 (.43)

Log-likelihood: -7548.
No. of parameters: 77
AIC: 525

Table 5. MIXL using QALY space: α ~ log-normal and γ ~ normal (M3.1)

Attributes    Mean (S.E.)    S.D. (S.E.)    Level    Mean     S.D.
Time          .32 (.2)       .84 (.4)
MO2*Time      -.7 (.2)       .37 (.2)       MO2      -.7      .37
MO3*Time      -.77 (.3)      .6 (.3)        MO3      -.77     .6
SC2*Time      -.3 (.3)       .42 (.2)       SC2      -.3      .42
SC3*Time      -.33 (.3)      .54 (.3)       SC3      -.33     .54
UA2*Time      .3 (.3)        .39 (.2)       UA2      .3       .39
UA3*Time      -.6 (.3)       .43 (.2)       UA3      -.6      .43
PD2*Time      -.4 (.2)       .37 (.2)       PD2      -.4      .37
PD3*Time      -.7 (.3)       .59 (.3)       PD3      -.7      .59
AD2*Time      -.8 (.2)       .37 (.2)       AD2      -.8      .37
AD3*Time      -.53 (.3)      .56 (.3)       AD3      -.53     .56

Log-likelihood: -776
No. of parameters: 77
AIC: 5586

Table 6. MIXL using QALY space: α ~ log-normal and γ ~ log-normal (M3.2)

Attributes    Mean (S.E.)    S.D. (S.E.)    Level    Mean     S.D.
Time          .2 (.8)        .68 (.)
MO2*Time      -2.74 (.23)    .5 (.4)        MO2      -.3      .2
MO3*Time      -.63 (.4)      .83 (.5)       MO3      -.76     .75
SC2*Time      -2.62 (.2)     .2 (.4)        SC2      -.5      .27
SC3*Time      -.4 (.8)       .3 (.7)        SC3      -.42     .57
UA2*Time      -3.5 (.33)     .27 (.7)       UA2      -.       .9
UA3*Time      -.83 (.)       .9 (.8)        UA3      -.24     .27
PD2*Time      -2.98 (.25)    .26 (.4)       PD2      -.       .22
PD3*Time      -.74 (.5)      .9 (.5)        PD3      -.72     .8
AD2*Time      -2.47 (.7)     . (.)          AD2      -.4      .9
AD3*Time      -.2 (.6)       .9 (.6)        AD3      -.54     .6

Log-likelihood: -7545
No. of parameters: 77
AIC: 5244

Table 7. MIXL using QALY space: α ~ log-normal and γ ~ Johnson's SB (M3.3)

Attributes    Mean (S.E.)    S.D. (S.E.)    Bound (S.E.)    Level    Mean     S.D.
Time          .3 (.8)        .66 (.9)
MO2*Time      -2.73 (.63)    2.28 (.7)      -.84 (.39)      MO2      -.4      .9
MO3*Time      -.2 (.32)      .54 (.2)       -.36 (.22)      MO3      -.63     .37
SC2*Time      -3.9 (.72)     2.95 (.97)     -.88 (.27)      SC2      -.6      .23
SC3*Time      -.2 (.33)      2.48 (.62)     -.97 (.8)       SC3      -.36     .32
UA2*Time      -4.43 (.92)    2.95 (.82)     -.9 (.2)        UA2      -.9      .8
UA3*Time      -.45 (.43)     2.2 (.7)       -.79 (.2)       UA3      -.23     .23
PD2*Time      -2.87 (.85)    2.79 (.6)      -.57 (.23)      PD2      -.       .5
PD3*Time      -.59 (.35)     .42 (.2)       -.53 (.32)      PD3      -.59     .38
AD2*Time      -2.68 (.62)    2.94 (.89)     -.62 (.)        AD2      -.4      .8
AD3*Time      -.42 (.35)     .32 (.7)       -.9 (.58)       AD3      -.47     .38

Log-likelihood: -7498
No. of parameters: 87
AIC: 57

Figure. An example choice set 32

Figure 2. Kernel densities of utility decrements estimated from the preference space model using the normal distribution assumption (M2.1)

[Figure: kernel density panels for MO2, SC2, UA2, PD2, AD2 (left) and MO3, SC3, UA3, PD3, AD3 (right)]

The left panel displays the kernel densities of the level 2 decrements and the right panel displays the kernel densities of the level 3 decrements. All densities were estimated using random draws.

Figure 3. Kernel densities of utility decrements estimated from the preference space model using the log-normal distribution assumption (M2.2)

[Figure: kernel density panels for the level 3 decrements (MO3, SC3, UA3, PD3, AD3), without truncation (left) and with 2% truncation (right)]

The left panel displays the kernel densities of the level 3 decrements estimated using random draws and the right panel displays the kernel densities of the level 3 decrements estimated using these random draws with the smallest 2% discarded.

Figure 4. Utility decrement distributions estimated from two QALY space models

[Figure: density panels for MO2, SC2, UA2, PD2, AD2 and MO3, SC3, UA3, PD3, AD3]

The solid lines represent the distributions estimated from the QALY space model using the log-normal distribution assumption (M3.2) and the dotted lines represent the distributions estimated from the QALY space model using the Johnson's SB distribution assumption (M3.3). The estimated log-normal distributions were all projected onto the negative real line.

Figure 5. Predicted EQ-5D health state utility values

[Figure: line plot of utility values (y-axis) against the 243 health states (x-axis)]

The solid line represents the predictions from the conditional logit (M1) and the dotted line represents the predictions from the preferred QALY space model (M3.3). The ranking of the 243 health states from left to right is based on the predictions from the conditional logit.