ORIGINAL ARTICLE

US Valuation of the EQ-5D Health States
Development and Testing of the D1 Valuation Model

James W. Shaw, PhD, PharmD, MPH,* Jeffrey A. Johnson, PhD, and Stephen Joel Coons, PhD

Purpose: The EQ-5D is a brief, multiattribute, preference-based health status measure. This article describes the development of a statistical model for generating US population-based EQ-5D preference weights.

Methods: A multistage probability sample was selected from the US adult civilian noninstitutional population. Respondents valued 13 of 243 EQ-5D health states using the time trade-off (TTO) method. Data for 12 states were used in econometric modeling. The TTO valuations were linearly transformed to lie on the interval [-1, 1]. Methods were investigated to account for interaction effects caused by having problems in multiple EQ-5D dimensions. Several alternative model specifications (eg, pooled least squares, random effects) also were considered. A modified split-sample approach was used to evaluate the predictive accuracy of the models. All statistical analyses took into account the clustering and disproportionate selection probabilities inherent in our sampling design.

Results: Our D1 model for the EQ-5D included ordinal terms to capture the effect of departures from perfect health as well as interaction effects. A random effects specification of the D1 model yielded a good fit for the observed TTO data, with an overall R² of 0.38, a mean absolute error of 0.025, and 7 prediction errors exceeding 0.05 in absolute magnitude.

Conclusions: The D1 model best predicts the values for observed health states.
The resulting preference weight estimates represent a significant enhancement of the EQ-5D's utility for health status assessment and economic analysis in the US.

Key Words: EQ-5D, health status, preference weights, time trade-off (TTO)

(Med Care 2005;43:203–220)

From the *Tobacco Control Research Branch, Behavioral Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland; the Institute of Health Economics and University of Alberta, Edmonton, Alberta, Canada; and the Center for Health Outcomes and PharmacoEconomic Research, College of Pharmacy, The University of Arizona, Tucson, Arizona.
Supported by grant number 5 R01 HS10243 from the Agency for Healthcare Research and Quality. Dr. Johnson holds a Canada Research Chair in Diabetes Health Outcomes and is a Health Scholar with the Alberta Heritage Foundation for Medical Research.
Presented at the 5th Annual Drug Information Association Workshop Pharmaceutical Outcomes Research, Tucson, Arizona, January 21-23, 2004. An earlier version of the paper was presented at the 20th Plenary Meeting of the EuroQol Group, Bled, Slovenia, September 11-14, 2003.
Reprints: Stephen Joel Coons, PhD, College of Pharmacy, The University of Arizona, P.O. Box 210207, Tucson, AZ 85721-0207. E-mail: coons@pharmacy.arizona.edu.
Copyright © 2005 by Lippincott Williams & Wilkins
ISSN: 0025-7079/05/4303-0203

In 1993, the US Public Health Service convened a group of nongovernmental scientists and scholars to form the Panel on Cost-Effectiveness in Health and Medicine. 1 The Panel's charge was to assess the state-of-the-science in the field of cost-effectiveness analysis (CEA) and provide recommendations for enhancing the quality and comparability of health-related cost-effectiveness studies.
Cost-utility analysis (CUA) is one form of CEA that compares the cost of health care programs or interventions to the outcomes, which are measured in terms of both quantity and quality of life (eg, quality-adjusted life years [QALYs]). 2 In CUA, the Panel advocated using a system of generic health states and values to describe and measure health care outcomes.

Several generic health status description and valuation systems are available; the most prominent are the Quality of Well-Being Scale, 3–5 the Health Utilities Index, 6–9 and the EuroQol Group's EQ-5D. 10–12 Each of these systems provides a classification of a respondent's health status and an empirically derived value, or preference, for that health state. The preference for that health state can then be combined with time to calculate an outcome such as QALYs gained as the unit of effectiveness.

The prevailing preference-based scoring function for the EQ-5D was derived from the general population of the United Kingdom in the early 1990s. 13–15 A number of other countries have generated their own population-based preference weights for the instrument. 16–20 The EQ-5D is receiving increasing attention in the United States, particularly since it has been included as part of the Medical Expenditure Panel Survey conducted by the US Agency for Healthcare Research and Quality. 21 A set of US population-based preference weights has never been established for the EQ-5D. Until now, most studies using the EQ-5D in the United States have used the UK weights under the assumption that the preferences of the 2 countries' populations differ minimally.

As a result of these and other catalysts, we conducted this study to establish US population-based preference weights for the EQ-5D's 243 health states. Our primary objectives were to elicit values for 45 of the EQ-5D's health states through time trade-off (TTO) exercises conducted in a representative sample of the general adult US population and predict preference weights for all 243 health states conditional on the observed valuation data.

Medical Care Volume 43, Number 3, March 2005

METHODS

Survey Instruments

The data collection tools used in this research were adapted from those used in the Measurement and Valuation of Health (MVH) study in the United Kingdom. 13 The survey battery included a health state valuation questionnaire, background questionnaire, and self-completion booklets for the EQ-5D and Health Utilities Index Mark 2 and Mark 3 (HUI-2/3). 22 Spanish versions of the EQ-5D and HUI-2/3 were available from the instruments' developers for use in the United States.

The EQ-5D descriptive system consists of 5 dimensions: Mobility, Self-Care, Usual Activities, Pain/Discomfort, and Anxiety/Depression. Each dimension has 3 levels, reflecting no health problems, moderate health problems, and extreme health problems. A dimension for which there are no problems is said to be at level 1, whereas a dimension for which there are extreme problems is said to be at level 3. Each unique health state described by the instrument has an associated 5-digit descriptor ranging from 11111 for perfect health to 33333 for the worst possible state. The resulting descriptive system defines 243 (3^5) health states. In addition, unconscious and immediate death are included in the EQ-5D valuation process but are not a part of the descriptive system.
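To make the size of the descriptive system concrete, the short Python sketch below enumerates every 5-digit descriptor and decodes one into per-dimension labels. It is an illustration only and not part of the study's software; the names `all_states` and `describe` and the short level labels are assumptions introduced here.

```python
from itertools import product

# Dimension order follows the 5-digit descriptor described in the text.
DIMENSIONS = (
    "Mobility",
    "Self-Care",
    "Usual Activities",
    "Pain/Discomfort",
    "Anxiety/Depression",
)
# Illustrative short labels for the 3 levels (assumed wording).
LEVEL_LABELS = {1: "no problems", 2: "moderate problems", 3: "extreme problems"}


def all_states():
    """Return every EQ-5D descriptor, from '11111' to '33333'."""
    return [
        "".join(str(level) for level in combo)
        for combo in product((1, 2, 3), repeat=len(DIMENSIONS))
    ]


def describe(state):
    """Map a 5-digit descriptor to a {dimension: level label} dictionary."""
    if len(state) != len(DIMENSIONS):
        raise ValueError("descriptor must have exactly 5 digits")
    return {dim: LEVEL_LABELS[int(digit)] for dim, digit in zip(DIMENSIONS, state)}


states = all_states()
print(len(states))        # 3 levels in each of 5 dimensions
print(describe("21232"))
```

Unconscious and immediate death are deliberately absent from the enumeration, since the text notes that they are valued but are not part of the descriptive system.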
Sampling

Sample size calculations were based on the estimated number of respondents needed to perform comparisons among the major racial/ethnic groups in the United States. These indicated that 4000 completed interviews would be needed to detect a between-groups difference in mean TTO valuations of 0.07 with a power of 80% and probability of Type I error of 0.05. Observed differences between groups in previous studies suggested that a 7–10% difference in valuations was important. 15,23,24

The target population for the study comprised the roughly 210 million civilian noninstitutionalized English- and Spanish-speaking adults, aged 18 and older, who resided in the United States (50 states plus the District of Columbia) in 2002. A multistage probability sample was selected from the target population using a sampling frame based on residential mailing lists and Census demographic data. The 2 largest minority groups in the United States, Hispanics and non-Hispanic blacks, were oversampled to ensure adequate numbers of minority respondents.

Using a probabilities-proportional-to-size systematic selection algorithm, 25 we selected 60 3-digit zip code tabulation areas (ZCTAs) 26 from a sampling frame of 883 3-digit ZCTAs formed by collapsing 5-digit ZCTAs to their first 3 digits. Two 5-digit zip codes were then selected from each of the 60 ZCTAs. In a third stage, 12,000 addresses (100 per zip code) were selected for screening and interview. Residents of the 12,000 selected addresses were then located and screened. Seventy-eight eligible addresses not recorded on the mailing list used to select the study sample were identified using a half-open interval linking procedure 25,27 and added to the sample.

Data Collection

Structure of the Interview

The data collection period began June 8, 2002, and continued through October 31, 2002. Data were collected by 109 field interviewers, including 30 bilingual interviewers. Each face-to-face interview consisted of 3 stages.
All stages used a single set of 45 health states, with each health state described on a separate card. Only 15 health states/cards were used with each respondent. Interviews were administered in English or Spanish. The interview used a paper-and-pencil format, and respondents were paid $30 for their participation.

Respondents were first asked to describe their own health at the time of the interview using the EQ-5D descriptive system. Then they were asked to place each of their assigned health states (as described below) in rank order from best to worst. Respondents were instructed that each state was to be experienced for 10 years followed by immediate death. The main objective of this stage was to familiarize respondents with the health state cards and the concept of forming preferences for the different health states.

Respondents then rated their assigned health states on a visual analog scale (VAS). The VAS was bounded by 0 (ie, "worst imaginable health state") and 100 (ie, "best imaginable health state"). Respondents were then asked to rate their own health using a similar VAS.

In the next exercise, respondents were asked to value 13 of the 15 assigned health states using the TTO props method. 13,28 This method uses a time (aka TTO) board and a set of health state cards. The time board is 2-sided (one side for states considered better than death and the other for states considered worse than death) and contains a sliding scale for length of time and transparent sleeves to hold the health state cards. Values for the health states 11111 and immediate death were anchors for the TTO valuation scale, with assigned values of 1 and 0, respectively. Respondents were guided through the exercise one state at a time using the TTO board to show the varying lengths of time spent in the health states. They were first asked to decide whether a state was better or worse than death. A middle value (ie, 5 years) was then offered for full health, and the resulting answer determined whether the time spent in full health would increase (from 5 to 10 years) or decrease (from 5 to 0 years).

After completion of the TTO exercise, respondents were asked a series of background/demographic questions and asked to self-complete the 15-item "usual health" HUI-2/3 and rate their health on a 5-point scale (ie, excellent to poor).

Choice of Health States

Previous work suggested that respondents could not be expected to value more than about 13 health states using the TTO in a single interview. 29 However, this was deemed an insufficient number of states from which to predict valuations for all 243 EQ-5D health states. It was decided that the same 45 health states valued in the MVH study would be valued in this study.

To facilitate the comparison of statistical models for estimating preference weights, the sample was divided into a Modeling Sample (ie, sample in which all preliminary statistical modeling would be conducted) and a Validation Sample (ie, holdout sample intended for use in validating models developed in the Modeling Sample). Each respondent was randomly assigned to one of 5 groups. Four of the groups constituted our Modeling Sample, whereas the fifth group comprised our Validation Sample. Approximately 900 subjects were assigned to each of the Modeling Sample groups (groups 1–4), whereas 400 subjects were assigned to the Validation Sample (group 5). All 5 groups were assigned the states 11111, 33333, unconscious, and immediate death. We randomly allocated 2 of 5 very mild health states to each of the 4 groups in the Modeling Sample.
The remaining 36 health states were then randomly distributed among the 4 groups (ie, 9 states per group). In addition to 11111, 33333, unconscious, and immediate death, a random selection of 11 of the remaining 41 health states valued by groups 1–4 was allocated to group 5. The specific group assignments are shown in Table 1.

TABLE 1. Health State Group Assignments

Modeling Sample                                                        Validation Sample
Group 1           Group 2           Group 3           Group 4           Group 5
11111             11111             11111             11111             11111
33333             33333             33333             33333             33333
Immediate death   Immediate death   Immediate death   Immediate death   Immediate death
Unconscious       Unconscious       Unconscious       Unconscious       Unconscious
11121             12111             21111             11211             12111
11211             21111             11112             11112             11121
11131             11113             11122             22121             22121
21222             21133             12211             11133             11133
12121             21312             22112             11312             21222
13311             12222             32331             22122             22331
21323             32211             32313             13212             12222
12223             22331             22222             21232             13311
33232             13332             23321             23232             23321
32223             22323             22233             33323             33321
32232             33212             23313             33321             23313

Derivation of the Analytical Sample

We realized that some respondents would be excluded as the result of data problems. Further, we did not want to sacrifice the nearly 400 individuals who had been assigned to the Validation Sample. Hence, we planned to combine the Modeling Sample and Validation Sample to form a Valuation Sample in which the final set of preference weights would be estimated.

A number of criteria were applied to exclude respondents with incomplete or inconsistent TTO data from the Valuation Sample. To be included, a respondent had to have valid TTO values for 12 of the 13 health states assigned (excluding unconscious). To ensure that we could derive sampling weights for all members of the Valuation Sample, respondents were excluded if they had incomplete demographic data (ie, age, sex, race/ethnicity). Similar to the MVH and other studies, 18 we excluded respondents from the Valuation Sample if all health states were given the same TTO value or if all health states were valued worse than death.

A number of other criteria were applied to exclude respondents from the Valuation Sample. These included the following: respondent valued one health state using both sides of the time board, respondent valued one or more incorrect health states based on his or her assigned card set, and respondent valued one or more health states more than once. A number of respondents mistakenly valued 11111 (ie, full health) or immediate death instead of unconscious during the TTO exercise. To the extent that valuations for these states made sense (ie, 11111 received a value of 1 and immediate death received a value of 0), these respondents were not excluded from the Valuation Sample. Respondents were excluded if the TTO value for 11111 or immediate death did not make logical sense in the context of the valuation exercise (eg, 11111 received a value of 0). Respondents were also excluded from the Valuation Sample if the label for a valued health state was missing and that state did not appear to be unconscious.

The numbers of respondents in the Modeling, Validation, and Valuation samples who were identified as having 1 or more data problems are presented in Table 2. On the basis of the data completeness and logical consistency criteria we set forth, the final Valuation Sample included 3773 respondents. Two hundred seventy-five respondents were excluded from the Valuation Sample.

TABLE 2. Numbers of Respondents With Data Problems

                                                                            Modeling   Validation   Valuation
                                                                            Sample     Sample       Sample
Initial number of respondents included in sample                            3650       398          4048
1. Used both sides of time board for same health state                      2          0            2
2. All health states given same TTO value                                   24         3            27
3. All health states given negative TTO value                               8          1            9
4. Two problems:                                                            1          1            2
   a) All (13) health states given the same TTO value
   b) All health states given negative TTO value
5. Valued 11111 or immediate death instead of unconscious
   TTO value made sense or was missing*                                     (51)       (6)          (57)
   TTO value did not make sense†                                            1          1            2
6. Missing labels for health states
   Missing label for 1 health state, appeared to be unconscious             (1)        0            (1)
   Missing label for 1 or 2 health states, not including unconscious        3          0            3
7. Subject valued one or more incorrect states based on his or her
   assigned card set
   1 incorrect state for group                                              1          0            1
   8 incorrect states for group                                             8          0            8
8. Subject valued 1 or more health states more than once
   1 health state                                                           13         1            14
   2 health states                                                          3          0            3
9. Two problems:                                                            1          1            2
   a) Valued 1 incorrect health states for group
   b) Valued 1 health state twice
10. Two problems:                                                           2          0            2
   a) Valued 1 health state twice
   b) Missing TTO value for health state other than unconscious
11. Incomplete TTO data, ie, fewer than 12 complete TTO values
   (not including unconscious)                                              181        19           200
Number of respondents with one or more data problems                        248        27           275
Number of respondents included in final sample                              3402       371          3773

Note: Numbers in parentheses not subtracted from final sample totals.
*For example, a value of 1 for 11111 or 0 for immediate death.
†For example, a value of 0 for 11111 or 1 for immediate death.
Statistical Analyses

All statistical analyses took into account the clustered sampling design, and sampling weights were applied to adjust for respondents' unequal selection probabilities. Analyses were performed using Stata/SE 8.0 30 and SAS Release 8.02. 31

Sample Weighting

Separate sets of sampling weights were derived for the Valuation Sample and the sample consisting of the 4048 respondents with complete age, sex, and race/ethnicity data (ie, Full Sample). The weights were poststratified to September 2002 postcensal estimates for the adult (ie, age ≥18 years) civilian noninstitutional population using 6 strata formed by the interaction of sex (male and female) with race/ethnicity (Hispanic, non-Hispanic black, and other). The "other" racial/ethnic category included non-Hispanic individuals reporting a single race other than black or multiple races. The weights were further adjusted to account for sample members who were selected for interview but did not respond. Only the weighted estimates for sex and race/ethnicity (as defined in this study) are guaranteed to be consistent with September 2002 postcensal estimates for the adult civilian noninstitutional population.

Sample Characteristics

Descriptive statistics were used to characterize the Full Sample and various subsamples, including those respondents who were excluded from the Valuation Sample. Subgroups were compared on important demographic and health status measures using t tests for interval data and χ² tests for nominal data. Pearson χ² statistics were converted into F statistics with noninteger degrees of freedom using a second-order Rao and Scott correction. 32,33 The difference in the number of self-reported chronic conditions between groups of respondents was estimated using negative binomial regression.
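The poststratification described under Sample Weighting amounts to scaling the weights within each sex-by-race/ethnicity stratum so that they sum to an external control total. The Python sketch below illustrates that arithmetic under simplifying assumptions: the function name `poststratify`, the stratum labels, and all numbers are made up for illustration and are not the study's weights or Census figures, and the nonresponse adjustment is not modeled.

```python
from collections import defaultdict


def poststratify(records, control_totals):
    """Scale base weights so each stratum's weights sum to its control total.

    records: list of (stratum, base_weight) pairs, one per respondent.
    control_totals: dict mapping stratum -> population count for that stratum.
    Returns the adjusted weights in the same order as `records`.
    """
    weight_sums = defaultdict(float)
    for stratum, weight in records:
        weight_sums[stratum] += weight
    return [
        weight * control_totals[stratum] / weight_sums[stratum]
        for stratum, weight in records
    ]


# Toy example with 2 of the 6 strata; every value here is hypothetical.
toy_records = [
    ("male/Hispanic", 2.0),
    ("male/Hispanic", 6.0),
    ("female/other", 5.0),
]
toy_totals = {"male/Hispanic": 80.0, "female/other": 50.0}
adjusted = poststratify(toy_records, toy_totals)
print(adjusted)
```

The same within-stratum scaling could be reused whenever weights need to be brought back to a fixed stratum total, with the control totals swapped for the relevant target sums.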
The nonresponse adjustment and poststratification applied to the base sampling weights were attempts to account for completely missing data on individuals. However, item nonresponse also was a problem in our study, which could lead to biased estimates and questions about the relevance of the target population. 34 To correct for item nonresponse, the sampling weights were rescaled when conducting analyses using available cases. Specifically, each weight was multiplied by a factor that adjusted the sum of the weights within each of the 6 race-sex strata to the sum in the complete sample for which weights were available.

Multiple imputation also was used to adjust for item nonresponse. The weighted sequential hot deck procedure 35 was used to impute vectors of missing data for individuals missing one or more item responses. The imputation was conducted separately within each primary sampling unit (PSU; ie, 3-digit ZCTA). When comparing members of the Valuation Sample with excluded respondents, donor observations were drawn from the 2889 individuals (representing 71.4% of the Full Sample) who provided complete data for the items being imputed. Of the remaining 1159 individuals who were missing data, 648 were missing only 1 item. When comparing the Modeling Sample and Validation Sample, donor observations were drawn from the 2705 members of the Valuation Sample (71.7%) who provided complete data for the items being imputed. Of the remaining 1068 individuals who were missing data, 605 were missing only 1 item. Before imputation the data were sorted by subgroup (eg, Valuation sample, excluded respondents), race/ethnicity (Hispanic, non-Hispanic black, other), sex (male, female), and age (18 to 39, 40 to 64, 65 and older) so that an individual with missing data would be matched to a complete respondent similar in these characteristics. 36 Ten complete data sets were produced and analyzed using methods analogous to those applied to the available cases (except that the original sampling weights were used instead of the rescaled weights). For categorical variables, this involved a multivariate test of the odds ratios following the logistic regression of sample membership on a set of dummies. The complete-data estimates were then combined to produce a single inference. 37

Model Development

Numerous model specifications were investigated; however, only the most correctly specified models are presented in this report. Probability-weighted least squares was used to fit linear additive models to the respondent-level data. With-replacement sampling was assumed at the level of the PSUs. Taylor linearization was used to derive cluster-robust standard errors for the parameter estimates. 38–40 When fitting models, the sampling weights were rescaled within race-sex strata so that the estimated population size would equal that of the target population.

Dependent Variable

The TTO values for worse-than-death states are conventionally transformed so as to be bounded by 0 and -1, with 0 being equivalent to death. We chose to apply a linear transformation 6 to the values for states worse than death. The lowest possible health state value was -39, which occurred when 0.25 years in a given state followed by 9.75 years in full health was considered equal to death. The smallest amount of time that an individual could elect to spend in the state being valued was 0.25 years. Because the observed health state values ranged between 0 and -39, transformed values were obtained by dividing them by 39.

The outcome consisted of 1 minus the (possibly transformed) value for a given health state. The values were subtracted from 1 to force the predicted value for 11111 to equal 1.0. Conceptually, our dependent variable was a measure of disutility, with higher numbers indicating greater disutility.

Independent Variables

All of our models included main effects derived from the EQ-5D descriptive system. The shifts between levels within a dimension were modeled using dummy variables: one measuring the difference between level 1 and level 2 and another measuring the difference between level 1 and level 3. Thus, 2 dummy variables each were generated for the Mobility (M2, M3), Self-Care (S2, S3), Usual Activities (U2, U3), Pain/Discomfort (P2, P3), and Anxiety/Depression (A2, A3) dimensions.

We considered models including terms for all possible interactions among the main effects. However, such models suffered from multicollinearity and logically inconsistent parameter estimates (eg, the estimates for one or more level 1 dummy variables were negative in sign). Furthermore, the analysis of first-order interaction effects was problematic since the large number of possible effects engendered the risk that some were significant purely by chance. Thus, we investigated alternative methods of accounting for interactions.

In previous valuation studies, the constant term has been interpreted as a measure of any movement away from perfect health (ie, a level 2 or 3 in any dimension). Including a constant term, however, yields a predicted value other than 1.0 for full health and complicates estimation of the marginal effects for the dummy variables that represent the EQ-5D descriptive system. We therefore developed an ordinal variable, D1, that represented the number of movements away from perfect health (ie, the additional number of dimensions at level 2 or level 3) beyond the first. This variable was not a constant (ie, it took on values ranging from 0 to 4).
Its use in place of a constant yielded a predicted value of 1.0 for full health, had no impact on the predicted values for other states, and allowed us to estimate directly the marginal effects of the main effect dummy variables.

The identification of D1 led us to develop a conceptually based method of accounting for interaction effects. Any movement away from perfect health could be due to having one or more dimensions at level 2 or level 3. A single shift should be captured in the coefficient for a particular main effect dummy variable. However, 2 movements away from perfect health could be the result of having 2 dimensions at level 2, 2 dimensions at level 3, or 1 dimension at level 2 and another at level 3. One would expect an interaction effect to the extent that the marginal disutility was not equal to the sum of the individual effects. Thus, we created an ordinal variable, I3, that represented the number of dimensions at level 3 beyond the first. The square of this term was also generated to allow for nonlinearity in its association with the dependent variable. Similar terms (ie, I2 and I2-squared) were produced to account for the additional number of dimensions at level 2.

Functional Form

The health state valuation data were not normally distributed. Specifically, they were negatively skewed and bimodal with peaks at 0 and 1. Conventional power/logarithmic transformations were not feasible given that the data were bimodal and included negative values. Generalized linear models using various link functions were investigated; however, these consistently yielded poor predictions. Ergo, we elected to focus our efforts on modeling the raw (ie, untransformed) data. The assumption of normality is primarily a convenience for the purpose of statistical inference. When this assumption fails to hold, the estimates of fixed and random parameters will still be consistent, though the standard error estimates will be inconsistent in small samples.
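As a worked illustration of the variables described under Dependent Variable and Independent Variables, the Python sketch below builds the disutility outcome from a raw TTO value and the regressors (main effect dummies, D1, I2, I3, and the squared terms) from a 5-digit descriptor. It is a reading of the verbal definitions in the text, not the authors' code: the function names, the dictionary keys `I2sq` and `I3sq`, and the assumption that each dummy simply flags a dimension at level 2 or level 3 are introduced here.

```python
# One-letter codes follow the dummy names given in the text (M2, M3, ..., A3).
DIMENSION_CODES = ("M", "S", "U", "P", "A")


def disutility(raw_tto):
    """Return 1 minus the TTO value, after rescaling worse-than-death values.

    Raw values lie between -39 and 1; negative values are divided by 39 so
    that the transformed value is bounded below by -1.
    """
    if not -39.0 <= raw_tto <= 1.0:
        raise ValueError("raw TTO value must lie in [-39, 1]")
    transformed = raw_tto / 39.0 if raw_tto < 0 else raw_tto
    return 1.0 - transformed


def regressors(state):
    """Build the model terms for one EQ-5D descriptor such as '21232'."""
    levels = [int(digit) for digit in state]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError("descriptor must be 5 digits, each 1, 2, or 3")
    row = {}
    for code, level in zip(DIMENSION_CODES, levels):
        row[code + "2"] = int(level == 2)  # assumed: flags level 2
        row[code + "3"] = int(level == 3)  # assumed: flags level 3
    n2 = levels.count(2)
    n3 = levels.count(3)
    row["D1"] = max(0, n2 + n3 - 1)  # dimensions off level 1, beyond the first
    row["I2"] = max(0, n2 - 1)       # dimensions at level 2, beyond the first
    row["I2sq"] = row["I2"] ** 2
    row["I3"] = max(0, n3 - 1)       # dimensions at level 3, beyond the first
    row["I3sq"] = row["I3"] ** 2
    return row


print(regressors("21232"))
print(disutility(-39.0))
```

For 11111 every term is 0, so a model built from these terms with no constant predicts a disutility of 0 and hence a value of 1.0 for full health, which is the property the text attributes to using D1 in place of a constant.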
41

Alternatives to Pooled Least Squares

In addition to pooled least squares, we considered both random and fixed effects specifications allowing for PSU-level or respondent-level effects. It seemed intuitive that preferences would vary systematically among respondents or geographic areas. In models treating PSUs or respondents as fixed effects, the unit-specific effects were individually highly significant. This suggested that either a fixed effects or random effects model would be more appropriate than pooled least squares.

A method was developed to estimate design-consistent random effects models. Weighted estimates of the variances of the error components were obtained and used to apply a generalized least squares transformation to the data. Probability-weighted least squares was then applied to estimate the model from the transformed data. The parameter estimates and standard errors from our random effects models were identical (to the fourth decimal place) to those derived when estimating analogous covariance pattern models using SUDAAN. 42

Specification Testing

Each model was fit to the Modeling Sample data and the resulting parameter estimates applied to the Validation Sample data to generate a number of goodness-of-fit statistics. The models were then compared based on the magnitude of these statistics. The same fit statistics were computed to assess the apparent or native performance of each model in the Valuation Sample. All of the statistics used in comparing models were probability weighted. These statistics included the following: (1) square of the Pearson product-moment correlation between the observed and predicted health state values for the Validation (Valuation) Sample, ie, R² overall; (2) square of the correlation between the mean observed and predicted values for the Validation (Valuation) Sample, ie, between-states R² or R² between; (3) mean absolute error for predicting the 12 (42) health states valued by the Validation (Valuation) Sample, ie, MAE; and (4) number of prediction errors greater than 0.05 or 0.10 in absolute magnitude.

We assessed the normality of the residuals and predicted random effects via graphical means. Belsley's condition index 43 was computed using the weighted data to evaluate whether multicollinearity was present. A condition index in excess of 30 is commonly held to indicate significant multicollinearity. 43 Ramsey's regression specification error test (RESET test) 44 was used to test for omitted variables and/or incorrect functional form. A robust version of the Hausman test 45 suggested by Wooldridge 46 was performed to evaluate the consistency of the random effects models. Though all of the regressors varied within respondents, we were able to test a maximum of 4 coefficients simultaneously. This was because the variation between respondents was entirely explained by their assignment to the 5 health state card sets.

Finally, the predictions for all 243 health states were required to be logically consistent. That is, the predicted value for one health state had to be greater than or equal to the predicted value for another health state if the former was better than the latter on at least one dimension and was no worse on any other dimension. Each of the model specifications considered in this article yielded logically consistent predictions in the Valuation Sample.

RESULTS

Response Rates

We determined the eligibility status for an unweighted 76.1% of the address sample (ie, we successfully screened 9196 eligible addresses and 2882 ineligible addresses).
The weighted screening success rate was 74.8%, reflecting the fact that minorities were overrepresented in the sample. From the list of eligible addresses, 5237 persons were selected for interview. Completed interviews were obtained from 4048 participants, yielding an unweighted interview response rate among screened addresses of 77.3%. The weighted interview response rate among screened addresses was slightly lower (75.0%) because of lower participation among non-Hispanic nonblacks. The unweighted interview response rate among all sampled addresses, which accounts for both the screening rate and the interview participation rate, was 59.4%. The weighted equivalent was 56.3%.

Sample Characteristics

There were few differences between the Valuation Sample and excluded respondents (Table 3). Qualitatively, the excluded respondents appeared to be more likely to report having no chronic conditions and less likely to report having 2 chronic conditions than those included in the Valuation Sample. The number of chronic conditions ranged from 0 to 17 in the Valuation Sample and from 0 to 11 among the excluded subjects; however, the difference was not statistically significant. Zero-inflated negative binomial regression models estimated using available cases (P = 0.157) or imputed data (P = 0.179) yielded similar inferences. The most notable difference between the 2 groups of respondents was in terms of self-reported health problems as measured by the EQ-5D. Specifically, excluded respondents were more likely to report having problems with mobility, self-care, and usual activities than those included in the Valuation Sample. Those excluded from the Valuation Sample also took longer to complete the interview than those included in the sample, with weighted means of 71.4 minutes and 65.9 minutes, respectively (P = 0.027).

There was but a single difference between the Modeling Sample and Validation Sample (Table 3).
Specifically, a weighted 4.0% of the Modeling Sample reported having problems with self-care as measured by the EQ-5D compared with 1.4% of the Validation Sample (P = 0.01).

Modeling Analyses

Table 4 presents Modeling Sample estimates and fit statistics for various model specifications. Fixed effects specifications are not presented in the table since estimates for the random effects models (including I2, I3, and their squares) were deemed to be consistent in the Valuation Sample. The Hausman test failed to reject the null hypothesis that the unit-level effects were uncorrelated with the regressors in the model treating PSUs as random effects (F(4, 56) = 1.34; P = 0.27) as well as the model treating respondents as random effects (F(4, 56) = 1.48; P = 0.22).

Estimates for D1 were consistently negative, which suggested a declining marginal disutility associated with additional shifts away from perfect health beyond the first. Estimates for I2 were consistently nonsignificant. However, estimates for the square of this term were significant, indicating that the effect of having multiple dimensions at level 2 was nonlinear. The I3 and I3-squared terms had a much greater impact on variance explained and prediction accuracy than did the terms accounting for level 2 interactions.

The fit statistics for the random effects models were comparable to those for the pooled least squares specification. The specifications that included I2 suffered from multicollinearity and inefficiency because of irrelevant variables. Thus, a decision was made to exclude the I2 term. From a conceptual standpoint, the inclusion of I2 was not necessary. Although we hypothesized that the disutility associated with movements away from perfect health would be a function of
TABLE 3. Descriptive Statistics for Full Sample and Various Subsamples

Characteristic                      Full Sample          Valuation Sample     Excluded Persons     P Value*
                                    (n = 4048)           (n = 3773)           (n = 275)            AC      MI
Age, mean/se (n)                    44.67/0.46 (4048)    44.51/0.50 (3773)    47.00/1.51 (275)     0.127   NA
  Sample range                      18.0–99.3            18.0–99.3            18.2–90.9
Sex, % (n)                                                                                         0.051   NA
  Male                              48.00 (1694)         48.41 (1594)         41.99 (100)
  Female                            52.00 (2354)         51.59 (2179)         58.01 (175)
Race/ethnicity, % (n)                                                                              0.508   NA
  Hispanic                          11.86 (1216)         11.70 (1115)         14.23 (101)
  Non-Hispanic black                11.02 (1123)         11.10 (1055)         9.81 (68)
  Other                             77.12 (1709)         77.20 (1603)         75.96 (106)
Language version, % (n)                                                                            0.201   NA
  English                           94.74 (3480)         94.87 (3262)         92.80 (218)
  Spanish                           5.26 (568)           5.13 (511)           7.20 (57)
Years of education, % (n)                                                                          0.375   0.251
  8 or less                         6.12 (422)           5.94 (378)           8.85 (44)
  9–11                              12.47 (616)          12.59 (581)          10.71 (35)
  12                                34.12 (1283)         34.04 (1194)         35.29 (89)
  13–15                             26.79 (1061)         27.12 (999)          21.74 (62)
  16 or more                        20.50 (637)          20.31 (599)          23.41 (38)
Income, % (n)                                                                                      0.367   0.074
  $5000 or less                     4.07 (280)           3.86 (254)           7.31 (26)
  $5000–$10,000                     7.45 (399)           7.54 (378)           5.92 (21)
  $10,000–$20,000                   15.55 (695)          15.54 (641)          15.85 (54)
  $20,000–$40,000                   28.85 (992)          28.84 (923)          28.91 (69)
  $40,000–$75,000                   28.22 (750)          28.56 (712)          22.80 (38)
  $75,000 or more                   15.86 (329)          15.65 (308)          19.22 (21)
Marital status, % (n)                                                                              0.655   0.591
  Married                           53.25 (1757)         52.97 (1637)         57.48 (120)
  Living with partner               7.55 (310)           7.67 (295)           5.72 (15)
  Widowed                           7.22 (308)           7.28 (287)           6.24 (21)
  Divorced                          10.80 (514)          10.82 (478)          10.59 (36)
  Separated                         2.52 (204)           2.45 (186)           3.60 (18)
  Never married                     18.66 (920)          18.81 (861)          16.37 (59)
Chronic conditions, % (n)                                                                          0.967   0.432
  0                                 32.34 (1352)         31.89 (1267)         39.13 (85)
  1                                 22.99 (817)          23.04 (762)          22.27 (55)
  2                                 16.35 (564)          16.92 (537)          7.78 (27)
  3                                 11.67 (361)          11.68 (333)          11.55 (28)
  4 or more                         16.65 (547)          16.47 (502)          19.27 (45)
EQ-5D dimensions, % (n)
  Mobility                                                                                         0.006   0.005
    Problems                        18.87 (723)          18.25 (656)          27.73 (67)
    No problems                     81.13 (3288)         81.75 (3083)         72.27 (205)
  Self-care                                                                                        0.010   0.011
    Problems                        4.07 (178)           3.73 (157)           8.99 (21)
    No problems                     95.93 (3836)         96.27 (3588)         91.01 (248)
  Usual activities                                                                                 0.039   0.042
    Problems                        15.43 (612)          14.97 (554)          22.12 (58)
    No problems                     84.57 (3397)         85.03 (3182)         77.88 (215)
  Pain/discomfort                                                                                  0.804   0.840
    Problems                        40.86 (1559)         40.78 (1452)         41.99 (107)
    No problems                     59.14 (2458)         59.22 (2292)         58.01 (166)
  Anxiety/depression                                                                               0.705   0.671
    Problems                        26.29 (1054)         26.39 (986)          24.85 (68)
    No problems                     73.71 (2966)         73.61 (2762)         75.15 (204)
Health status, % (n)                                                                               0.218   0.063
  Excellent                         26.97 (1032)         26.89 (968)          28.15 (64)
  Very good                         42.46 (1594)         42.64 (1500)         39.66 (94)
  Good                              20.69 (898)          20.67 (828)          20.89 (70)
  Fair                              8.26 (399)           8.32 (370)           7.27 (29)
  Poor                              1.63 (76)            1.47 (65)            4.03 (11)
EQ-5D VAS, mean/se (n)              84.19/0.39 (3995)    84.32/0.39 (3728)    82.39/1.76 (267)     0.287   0.287
  Sample range                      0–100                0–100                10–100
EQ-5D index, mean/se (n)            0.79/0.006 (3977)    0.80/0.006 (3709)    0.77/0.022 (268)     0.348   0.366
  Sample range                      -0.59 to 0.92        -0.59 to 0.92        -0.59 to 0.92
HUI-2 utility score, mean/se (n)    0.86/0.005 (3889)    0.86/0.005 (3635)    0.87/0.012 (254)     0.605   0.399
  Sample range                      0.05 to 1.00         0.05 to 1.00         0.06 to 1.00
HUI-3 utility score, mean/se (n)    0.81/0.006 (3907)    0.81/0.007 (3647)    0.81/0.020 (260)     0.741   0.916
  Sample range                      0.34 to 1.00         0.34 to 1.00         0.19 to 1.00

TABLE 3. (Continued)

Characteristic                      Valuation Sample     Modeling Sample      Validation Sample    P Value*
                                    (n = 3773)           (n = 3402)           (n = 371)            AC      MI
Age, mean/se (n)                    44.51/0.50 (3773)    44.62/0.51 (3402)    43.42/1.30 (371)     0.367   NA
  Sample range                      18.0–99.3            18.0–99.3            18.3–89.0
Sex, % (n)                                                                                         0.376   NA
  Male                              48.00 (1594)         48.30 (1446)         45.21 (148)
  Female                            52.00 (2179)         51.70 (1956)         54.79 (223)
Race/ethnicity, % (n)                                                                              0.925   NA
  Hispanic                          11.86 (1115)         11.84 (998)          12.13 (117)
  Non-Hispanic black                11.02 (1055)         10.97 (951)          11.48 (104)
  Other                             77.12 (1603)         77.20 (1453)         76.39 (150)
Language version, % (n)                                                                            0.861   NA
  English                           94.80 (3262)         94.81 (2946)         94.66 (316)
  Spanish                           5.20 (511)           5.19 (456)           5.34 (55)
Years of education, % (n)                                                                          0.884   0.915
  8 or less                         5.98 (378)           5.94 (334)           6.38 (44)
  9–11                              12.60 (581)          12.55 (512)          13.11 (69)
  12                                34.03 (1194)         34.12 (1096)         33.22 (98)
  13–15                             27.10 (999)          27.31 (904)          25.21 (95)
  16 or more                        20.28 (599)          20.08 (535)          22.08 (64)
Income, % (n)                                                                                      0.101   0.330
  $5000 or less                     3.89 (254)           3.80 (221)           4.81 (33)
  $5000–$10,000                     7.58 (378)           7.41 (336)           9.20 (42)
  $10,000–$20,000                   15.56 (641)          15.26 (577)          18.46 (64)
  $20,000–$40,000                   28.85 (923)          29.09 (847)          26.53 (76)
  $40,000–$75,000                   28.52 (712)          29.27 (649)          21.36 (63)
  $75,000 or more                   15.60 (308)          15.18 (276)          19.64 (32)
Marital status, % (n)                                                                              0.398   0.307
  Married                           52.93 (1637)         53.36 (1475)         48.93 (162)
  Living with partner               7.68 (295)           7.75 (265)           7.00 (30)
  Widowed                           7.32 (287)           7.11 (253)           9.28 (34)
  Divorced                          10.83 (478)          11.00 (444)          9.24 (34)
  Separated                         2.46 (186)           2.39 (164)           3.12 (22)
  Never married                     18.78 (861)          18.39 (778)          22.41 (83)
Chronic conditions, % (n)                                                                          0.488   0.618
  0                                 31.87 (1267)         32.22 (1148)         28.78 (119)
  1                                 23.03 (762)          22.59 (682)          26.94 (80)
  2                                 16.92 (537)          16.80 (485)          18.00 (52)
  3                                 11.68 (333)          11.80 (300)          10.61 (33)
  4 or more                         16.49 (502)          16.59 (447)          15.67 (55)
EQ-5D dimensions, % (n)
  Mobility                                                                                         0.889   0.884
    Problems                        18.24 (656)          18.19 (588)          18.72 (68)
    No problems                     81.76 (3083)         81.81 (2782)         81.28 (301)
  Self-care                                                                                        0.011   0.010
    Problems                        3.73 (157)           3.99 (146)           1.36 (11)
    No problems                     96.27 (3588)         96.01 (3230)         98.64 (358)
  Usual activities                                                                                 0.292   0.261
    Problems                        14.97 (554)          15.26 (504)          12.30 (50)
    No problems                     85.03 (3182)         84.74 (2864)         87.70 (318)
  Pain/discomfort                                                                                  0.358   0.312
    Problems                        40.77 (1452)         41.19 (1320)         36.87 (132)
    No problems                     59.23 (2292)         58.81 (2055)         63.13 (237)
  Anxiety/depression                                                                               0.704   0.698
    Problems                        26.42 (986)          26.30 (892)          27.58 (94)
    No problems                     73.58 (2762)         73.70 (2486)         72.42 (276)
Health status, % (n)                                                                               0.332   0.251
  Excellent                         26.88 (968)          26.35 (868)          31.88 (100)
  Very good                         42.63 (1500)         43.13 (1371)         37.88 (129)
  Good                              20.68 (828)          20.57 (740)          21.74 (88)
  Fair                              8.33 (370)           8.48 (330)           6.91 (40)
  Poor                              1.47 (65)            1.46 (57)            1.58 (8)
EQ-5D VAS, mean/se (n)              84.32/0.39 (3728)    84.25/0.43 (3360)    84.92/1.05 (368)     0.571   0.549
  Sample range                      0–100                0–100                10–100
EQ-5D index, mean/se (n)            0.80/0.006 (3709)    0.79/0.007 (3341)    0.81/0.012 (368)     0.442   0.349
  Sample range                      -0.59 to 0.92        -0.59 to 0.92        0.24 to 0.92
HUI-2 utility score, mean/se (n)    0.86/0.005 (3635)    0.86/0.005 (3277)    0.86/0.011 (358)     0.735   0.788
  Sample range                      0.05 to 1.00         0.05 to 1.00         0.18 to 1.00
HUI-3 utility score, mean/se (n)    0.81/0.007 (3647)    0.81/0.007 (3288)    0.83/0.017 (359)     0.302   0.301
  Sample range                      0.34 to 1.00         0.34 to 1.00         0.12 to 1.00

Notes: Percentages are given in terms of the US adult civilian noninstitutional population (210,507,186 persons in September 2002).
The numbers of observations given in parentheses represent the number of responses within the sample. The Valuation Sample estimates are based on sampling weights for the Valuation Sample. *For comparison between Valuation Sample and excluded respondents. Chronic conditions were calculated as the sum of self-reported chronic conditions for respondents without missing data. For comparison between samples, the number of chronic conditions was not censored at 4. The EQ-5D index is the societal value for own health state calculated using the UK scoring system. 15 AC indicates available cases; MI, multiple imputation.

the number of dimensions at level 2, we had no a priori knowledge of the correct functional form for this relationship. Given its conceptual plausibility as well as the statistical evidence in its favor, we selected the specification that excluded I2 and allowed for a respondent-level random effect as the valuation model for the US population.
TABLE 4. Parameter Estimates and Fit Statistics for Pooled Least Squares and Alternative Model Specifications

Variable                Pooled Least        PSU-Level           Respondent-Level    Respondent-Level
                        Squares             Random Effects      Random Effects      Random Effects
                                                                (Including I2)      (Excluding I2)
M2                      0.154 (0.011)       0.152 (0.011)       0.144 (0.009)       0.144 (0.009)
M3                      0.551 (0.016)       0.549 (0.015)       0.564 (0.016)       0.564 (0.016)
S2                      0.191 (0.011)       0.187 (0.011)       0.178 (0.008)       0.177 (0.007)
S3                      0.474 (0.015)       0.472 (0.015)       0.481 (0.016)       0.481 (0.016)
U2                      0.135 (0.009)       0.134 (0.008)       0.138 (0.009)       0.138 (0.009)
U3                      0.367 (0.013)       0.365 (0.012)       0.377 (0.014)       0.378 (0.013)
P2                      0.165 (0.007)       0.164 (0.008)       0.177 (0.008)       0.177 (0.008)
P3                      0.527 (0.020)       0.525 (0.020)       0.545 (0.021)       0.546 (0.021)
A2                      0.161 (0.010)       0.159 (0.009)       0.156 (0.008)       0.156 (0.009)
A3                      0.449 (0.015)       0.446 (0.015)       0.455 (0.016)       0.456 (0.015)
D1                      -0.137 (0.011)      -0.135 (0.011)      -0.142 (0.012)      -0.143 (0.010)
I2                      0.014 (0.014)*      0.014 (0.013)*      0.002 (0.012)*
I2-squared              0.013 (0.003)       0.013 (0.003)       0.011 (0.002)       0.011 (0.002)
I3                      -0.120 (0.023)      -0.119 (0.023)      -0.130 (0.019)      -0.129 (0.018)
I3-squared              -0.014 (0.003)      -0.014 (0.003)      -0.014 (0.003)      -0.014 (0.003)

Modeling Sample
  R² overall            0.382               0.382               0.382               0.382
Validation Sample
  R² between            0.984               0.984               0.982               0.982
  MAE                   0.035               0.036               0.035               0.035
  No. (of 12) > 0.05    3                   3                   4                   4
  No. (of 12) > 0.10    0                   0                   0                   0
Valuation Sample
  R² overall            0.380               0.380               0.380               0.380
  R² between            0.986               0.986               0.986               0.986
  MAE                   0.024               0.024               0.025               0.025
  No. (of 42) > 0.05    4                   4                   7                   7
  No. (of 42) > 0.10    0                   0                   0                   0

Notes: Parameter estimates shown were derived using Modeling Sample data. Fit statistics for Validation Sample were derived by applying Modeling Sample estimates to Validation Sample data. Fit statistics for Valuation Sample were derived by applying Valuation Sample estimates to Valuation Sample data. Standard errors are given in parentheses. *P > 0.05; otherwise P < 0.001.
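The state-level fit statistics reported for the Valuation Sample (MAE and the number of prediction errors above 0.05 or 0.10) can be recomputed from state means. The following is a minimal sketch, not the authors' code: it uses the observed and predicted means printed in Table 6, the names `TABLE_6` and `fit_statistics` are illustrative, and the minus signs on the state 33333 values and on the 33232 prediction are an assumption (they appear to have been lost in typesetting). The squared correlation is computed without survey weights, so it need not match the published R² between exactly.

```python
from statistics import mean

# (state, observed mean TTO value, predicted value) from Table 6.
# Assumption: the values for 33333 and the prediction for 33232 are negative.
TABLE_6 = [
    ("11121", 0.880, 0.827), ("21111", 0.870, 0.854), ("11211", 0.867, 0.860),
    ("12111", 0.842, 0.825), ("11112", 0.832, 0.844), ("12211", 0.790, 0.814),
    ("12121", 0.789, 0.781), ("11122", 0.762, 0.800), ("22121", 0.742, 0.742),
    ("22112", 0.703, 0.759), ("22122", 0.685, 0.672), ("21222", 0.678, 0.708),
    ("12222", 0.661, 0.678), ("11312", 0.646, 0.609), ("21312", 0.630, 0.592),
    ("22222", 0.596, 0.597), ("11113", 0.557, 0.550), ("13212", 0.513, 0.501),
    ("13311", 0.477, 0.431), ("12223", 0.465, 0.438), ("21232", 0.413, 0.397),
    ("21323", 0.394, 0.401), ("11131", 0.390, 0.463), ("23321", 0.376, 0.380),
    ("22323", 0.359, 0.333), ("32211", 0.329, 0.396), ("22331", 0.295, 0.312),
    ("11133", 0.294, 0.289), ("21133", 0.283, 0.282), ("23313", 0.220, 0.279),
    ("23232", 0.217, 0.202), ("33212", 0.201, 0.220), ("22233", 0.201, 0.204),
    ("32223", 0.197, 0.156), ("32232", 0.147, 0.086), ("13332", 0.140, 0.182),
    ("33321", 0.139, 0.145), ("32313", 0.134, 0.164), ("33232", 0.055, -0.012),
    ("32331", 0.050, 0.077), ("33323", 0.015, 0.030), ("33333", -0.103, -0.109),
]


def fit_statistics(rows, thresholds=(0.05, 0.10)):
    """Return (MAE, {threshold: count of |error| > threshold}, squared correlation)."""
    observed = [obs for _, obs, _ in rows]
    predicted = [pred for _, _, pred in rows]
    errors = [obs - pred for obs, pred in zip(observed, predicted)]

    mae = mean(abs(e) for e in errors)
    counts = {t: sum(abs(e) > t for e in errors) for t in thresholds}

    # Squared Pearson correlation between observed and predicted state means.
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r2_between = cov ** 2 / (var_o * var_p)
    return mae, counts, r2_between


mae, counts, r2_between = fit_statistics(TABLE_6)
print(f"MAE = {mae:.3f}")  # MAE = 0.025
print(f"errors > 0.05: {counts[0.05]}; errors > 0.10: {counts[0.10]}")  # 7 and 0
print(f"unweighted squared correlation = {r2_between:.3f}")
```

Computed this way, the MAE and error counts agree with the values reported for the respondent-level random effects specification that excludes I2.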
Final Valuation Model

Valuation Sample estimates for the chosen D1 specification are presented in Table 5. All of the parameter estimates were statistically significant (P < 0.001). The model did not suffer from multicollinearity (condition index of 26.59), and its estimates were deemed to be consistent by the Hausman test (F(4, 56) = 1.49; P = 0.22). Although the residuals for this model were approximately normally distributed, the predicted respondent-level effects were positively skewed (graphs not shown). The weighted means of both error components were statistically no different from zero. Consistent with the results of valuation studies conducted in other countries, 15,19 the RESET test suggested the presence of misspecification due to omitted variables or incorrect functional form (F(3, 57) = 28.58; P < 0.001).

Table 6 presents the weighted mean observed TTO values, fitted values, and residuals for the D1 model following estimation in the Valuation Sample. For only 7 health states did the residual error exceed 0.05 in absolute magnitude. Also, there was no discernible autocorrelation in the prediction errors of the model; that is, there did not appear to be any association between the magnitude of the health state values and the sign of the prediction errors. The full set of US population-based preference weights for the 243 EQ-5D health states and a description of the scoring system are provided in Appendices 1 and 2, respectively.
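To make the structure of the D1 model concrete, the coefficients in Table 5 can be turned into a small scoring function. The sketch below is illustrative only and rests on stated assumptions: the function name `d1_value` is hypothetical; D1, I2, and I3 are taken to be the number of dimensions with any problem, at level 2, and at level 3, respectively, each counted beyond the first (a reading that reproduces the Appendix 1 values checked here); and D1, I3, and I3-squared are given negative signs because their printed confidence limits are only coherent for negative coefficients. Appendix 2 remains the authoritative description of the scoring system.

```python
# Level 2 and level 3 decrements from Table 5, in the order mobility,
# self-care, usual activities, pain/discomfort, anxiety/depression.
LEVEL2 = (0.146, 0.175, 0.140, 0.173, 0.156)
LEVEL3 = (0.558, 0.471, 0.374, 0.537, 0.450)

# Assumption: these three coefficients are negative (signs inferred from
# the order of the confidence limits in Table 5).
D1_COEF = -0.140
I2_SQ_COEF = 0.011
I3_COEF = -0.122
I3_SQ_COEF = -0.015


def d1_value(state: str) -> float:
    """Predicted preference weight for a 5-digit EQ-5D state such as '11312'."""
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError(f"not a valid EQ-5D state: {state!r}")
    levels = [int(ch) for ch in state]
    n2, n3 = levels.count(2), levels.count(3)

    # Assumed definitions of the ordinal terms (see the note above).
    d1 = max(0, n2 + n3 - 1)  # dimensions with problems beyond the first
    i2 = max(0, n2 - 1)       # dimensions at level 2 beyond the first
    i3 = max(0, n3 - 1)       # dimensions at level 3 beyond the first

    # Sum of the main-effect decrements for each dimension's level.
    disutility = sum(
        LEVEL2[k] if lvl == 2 else LEVEL3[k] if lvl == 3 else 0.0
        for k, lvl in enumerate(levels)
    )
    disutility += (
        D1_COEF * d1 + I2_SQ_COEF * i2 ** 2 + I3_COEF * i3 + I3_SQ_COEF * i3 ** 2
    )
    return 1.0 - disutility


for s in ("11111", "11121", "12211", "11312", "33333"):
    print(s, round(d1_value(s), 3))
```

Because Table 5 reports coefficients to 3 decimal places, the sketch matches Appendix 1 exactly for simple states (for example, 0.827 for 11121 and 0.814 for 12211) but can differ by a few thousandths where the squared terms are large (about -0.102 for 33333, against a tabulated weight of 0.109 in absolute magnitude).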
TABLE 5. US Population Estimates for D1 Valuation Model

                                            95% CI
Variable      Coeff.    SE       P Value    LL        UL
M2            0.146     0.008    <0.001     0.129     0.163
M3            0.558     0.016    <0.001     0.526     0.589
S2            0.175     0.008    <0.001     0.160     0.190
S3            0.471     0.016    <0.001     0.440     0.503
U2            0.140     0.008    <0.001     0.123     0.157
U3            0.374     0.013    <0.001     0.347     0.401
P2            0.173     0.008    <0.001     0.157     0.188
P3            0.537     0.020    <0.001     0.497     0.577
A2            0.156     0.008    <0.001     0.139     0.173
A3            0.450     0.015    <0.001     0.421     0.480
D1            -0.140    0.010    <0.001     -0.159    -0.120
I2-squared    0.011     0.002    <0.001     0.007     0.014
I3            -0.122    0.018    <0.001     -0.157    -0.086
I3-squared    -0.015    0.003    <0.001     -0.020    -0.010

DISCUSSION

Using a large, representative sample of the general adult US population, we generated a preference-weighting system for the EQ-5D health states. This scoring function can be used to generate preference weights that may be incorporated into QALYs and CUAs. As recommended by the Panel on Cost-Effectiveness in Health and Medicine, 1 US population-based weights should be used to inform resource allocation decisions made in the United States.

We feel confident that the D1 model is the most appropriate for the US population. This model provided the best fit for the valuation data with few prediction errors, which allowed us to meet our objective of developing a preference-weighting system for the United States. The exclusion of irrelevant predictors and application of the random effects specification also made this model efficient relative to others tested. This will facilitate comparisons of valuations among the major racial/ethnic groups in the United States as well as the comparison of our TTO data with TTO data collected in the United Kingdom. In previous valuation studies, a nonlinear transformation 47 was applied to the values for states worse than death. It is notable that the D1 model yielded logically consistent predictions when this transformation was applied to the US data.
The predictions were also logically consistent when the model was estimated using data from the MVH study sample (n = 2997).

The N3 model, which was used in the MVH investigation, 15 was evaluated and provided a worse fit for our Valuation Sample data than the D1 model. Applying the linear transformation to states worse than death, the pooled least squares specification of the N3 model yielded an MAE of 0.032 with 8 prediction errors exceeding 0.05 in absolute magnitude.

The nonlinear transformation 47 applied in earlier valuation studies can yield transpositions in the observed preference order for worse-than-death health states. We elected to use a linear transformation due to its greater consistency with expected utility theory. Because we chose to use a different transformation, our results are not strictly comparable with those of earlier investigations. Our decision will have no impact on studies using the EQ-5D that are conducted within the United States. In multinational studies, however, sensitivity analyses will need to be performed to evaluate the impact of differences between countries in the methodology used to generate preference weights for the EQ-5D health states.

I2 and I3 were ordinal variables measuring the change in disutility associated with an increase in the number of dimensions at a particular level. We circumvented the assumption of a constant rate of change by including quadratic terms in the model. The use of dummy variables instead of ordinal variables would have been more appropriate but was not feasible since doing so yielded predictions that were logically inconsistent (eg, the predicted value for 33233 was lower than that for 33333). We did not include terms for interactions between dimensions at level 2 and level 3. A number of different methods were investigated for modeling these effects; however, all of them yielded logically inconsistent predictions.
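The logical consistency requirement discussed above is a dominance check: a state that is no worse on every dimension and better on at least one must not receive a lower predicted value. The sketch below illustrates such a check under stated assumptions. The helper names `dominates` and `inconsistencies` are hypothetical, the small dictionary holds a handful of Appendix 1 weights (with minus signs assumed for the worse-than-death states), and the second dictionary is a deliberately altered, hypothetical copy that mimics the 33233 versus 33333 problem described for the dummy-variable specification.

```python
from itertools import permutations


def dominates(a: str, b: str) -> bool:
    """True if state a is no worse than b on every dimension and differs from b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def inconsistencies(values: dict, tol: float = 1e-9) -> list:
    """List (better, worse) pairs where the better state has the lower value."""
    return [
        (a, b)
        for a, b in permutations(values, 2)
        if dominates(a, b) and values[a] < values[b] - tol
    ]


# A few predicted weights taken from Appendix 1 (signs of the last three
# entries are an assumption; ties such as 11113 and 11213 are allowed).
APPENDIX_SUBSET = {
    "11111": 1.000, "11211": 0.860, "21111": 0.854, "21211": 0.843,
    "11113": 0.550, "11213": 0.550,
    "33133": -0.100, "33233": -0.100, "33333": -0.109,
}

# Hypothetical values only: 33233 is pushed below 33333 to show a violation.
hypothetical = dict(APPENDIX_SUBSET)
hypothetical["33233"] = -0.120

print(inconsistencies(APPENDIX_SUBSET))  # []
print(inconsistencies(hypothetical))     # [('33233', '33333')]
```

Applied to all 243 states, the same pairwise comparison would flag any specification whose predictions reverse a dominance relation.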
In most ways, the individuals included in the Valuation Sample were similar to those who were excluded from the sample. There were no differences between the 2 groups of 2005 Lippincott Williams & Wilkins
TABLE 6. Observed and Predicted Values for 42 Health States

                        Observed            Predicted
Health State    n       Mean      SE        Mean      SE        Mean Error
11121           1227    0.880     0.008     0.827     0.008     0.053
21111           1685    0.870     0.011     0.854     0.008     0.016
11211           1717    0.867     0.008     0.860     0.008     0.007
12111           1223    0.842     0.010     0.825     0.008     0.017
11112           1694    0.832     0.011     0.844     0.008     0.012
12211           833     0.790     0.021     0.814     0.008     0.024
12121           856     0.789     0.015     0.781     0.007     0.008
11122           833     0.762     0.018     0.800     0.009     0.038
22121           1232    0.742     0.014     0.742     0.008     0.000
22112           833     0.703     0.028     0.759     0.010     0.056
22122           861     0.685     0.018     0.672     0.012     0.013
21222           1227    0.678     0.018     0.708     0.013     0.030
12222           1223    0.661     0.018     0.678     0.012     0.017
11312           861     0.646     0.015     0.609     0.011     0.037
21312           852     0.630     0.017     0.592     0.011     0.038
22222           833     0.596     0.027     0.597     0.019     0.001
11113           852     0.557     0.023     0.550     0.015     0.007
13212           861     0.513     0.021     0.501     0.013     0.012
13311           1227    0.477     0.018     0.431     0.016     0.046
12223           856     0.465     0.020     0.438     0.015     0.027
21232           861     0.413     0.029     0.397     0.018     0.016
21323           856     0.394     0.018     0.401     0.014     0.007
11131           856     0.390     0.027     0.463     0.020     0.073
23321           1204    0.376     0.021     0.380     0.014     0.004
22323           852     0.359     0.023     0.333     0.015     0.026
32211           852     0.329     0.021     0.396     0.013     0.067
22331           1223    0.295     0.020     0.312     0.016     0.017
11133           1232    0.294     0.026     0.289     0.022     0.005
21133           852     0.283     0.026     0.282     0.020     0.001
23313           1204    0.220     0.024     0.279     0.017     0.059
23232           861     0.217     0.024     0.202     0.020     0.015
33212           852     0.201     0.028     0.220     0.020     0.019
22233           833     0.201     0.027     0.204     0.018     0.003
32223           856     0.197     0.022     0.156     0.017     0.041
32232           856     0.147     0.021     0.086     0.020     0.061
13332           852     0.140     0.024     0.182     0.017     0.042
33321           1232    0.139     0.017     0.145     0.016     0.006
32313           833     0.134     0.023     0.164     0.017     0.030
33232           856     0.055     0.018     -0.012    0.021     0.067
32331           833     0.050     0.025     0.077     0.017     0.027
33323           861     0.015     0.024     0.030     0.016     0.015
33333           3773    -0.103    0.012     -0.109    0.012     0.006
MAE                                                             0.025

Note: Data are rank ordered by observed mean TTO values.
respondents with respect to any of the measured demographic characteristics. The 2 groups differed in terms of perceived functional status as measured using the EQ-5D descriptive system. However, they did not differ with respect to ratings of overall health status or measures of the perceived value of health (including the EQ-5D index derived using the UK preference-weighting system). Furthermore, the differences in perceived functional status cannot readily be ascribed to age or self-reported chronic conditions. The excluded respondents appeared to have greater difficulty with the TTO exercise than those included in the Valuation Sample (as evidenced by the problems with their TTO data as well as the amount of time it took for them to complete the interview). Due to the limited availability of goodness-of-fit or effect size measures for discrete-choice survey estimators, we are unable to gauge the importance of differences in perceived functional status between the groups.

The difference between the Modeling Sample and Validation Sample in terms of problems with self-care on the EQ-5D caused some concern. No differences were observed on other health status or sociodemographic measures. Further, given the number of statistical tests used to compare the 2 groups of respondents, one might have expected at least one difference due to chance alone. When added to the D1 model, a dummy variable for membership in the Validation Sample was not significant (P = 0.369). Similarly, dummy variables indicating whether or not a respondent reported having problems in any of the EQ-5D dimensions were neither individually nor jointly statistically significant.

In conclusion, the conceptually based D1 model yielded the best fit for the observed TTO data, with an MAE of 0.025 and only 7 prediction errors exceeding 0.05 in absolute magnitude.
Among those specifications that were considered, and based on our assumptions (ie, a linear additive model with the values for states worse than death being linearly transformed), this appeared to be the most correctly specified model. The resulting preference weight estimates provide a significant enhancement of the EQ-5D's utility for health status assessment and economic analysis in the United States.

ACKNOWLEDGMENTS

The authors thank Kathleen Considine, Allen Duffer, and Vincent Iannacchione, all from RTI International, for their invaluable assistance in developing a sampling plan and collecting the data. In addition, the contributions of the project's Scientific Advisory Board (Dennis G. Fryback [Chair], Marthe R. Gold, Robert M. Kaplan, Doris C. Lefkowitz, Joseph Lipscomb, Joanna E. Siegel), consultants (Ron D. Hays, Paul Kind, David Feeny), and AHRQ project officer (Yen-pin Chiang) are greatly appreciated.

REFERENCES

1. Gold MR, Siegel JE, Russell LB, et al, eds. Cost-Effectiveness in Health and Medicine. New York: Oxford University Press; 1996.
2. Coons SJ, Kaplan RM. Cost-utility analysis. In: Bootman JL, Townsend RJ, McGhan WF, eds. Principles of Pharmacoeconomics. 3rd ed. Cincinnati, OH: Harvey Whitney Books Company; 2005:117–147.
3. Patrick DL, Bush JW, Chen MM. Methods for measuring levels of well-being for a health status index. Health Serv Res. 1973;8:228–245.
4. Kaplan RM, Anderson JP. The general health policy model: update and applications. Health Serv Res. 1988;23:203–235.
5. Kaplan RM, Anderson JP. The general health policy model: an integrated approach. In: Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials. 2nd ed. Philadelphia, PA: Lippincott-Raven Press; 1996:309–322.
6. Torrance GW, Boyle MH, Horwood SP. Application of multi-attribute utility theory to measure social preferences for health states. Oper Res. 1982;30:1043–1069.
7. Torrance GW, Feeny DH, Furlong WJ, et al. Multiattribute utility function for a comprehensive health status classification system: Health Utilities Index Mark 2. Med Care. 1996;34:702–722.
8. Feeny DH, Torrance GW, Furlong WJ. Health Utilities Index. In: Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials. 2nd ed. Philadelphia, PA: Lippincott-Raven Press; 1996:239–252.
9. Feeny DH, Furlong WJ, Torrance GW, et al. Multiattribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Med Care. 2002;40:113–128.
10. EuroQol Group. EuroQol: a new facility for the measurement of health-related quality of life. Health Policy. 1990;16:199–208.
11. Brooks RG. EuroQol: the current state of play. Health Policy. 1996;37:53–72.
12. Kind P. The EuroQol instrument: an index of health-related quality of life. In: Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials. 2nd ed. Philadelphia, PA: Lippincott-Raven Press; 1996:191–201.
13. Dolan P, Gudex C, Kind P, Williams A. A social tariff for EuroQol: results from a U.K. general population survey. Discussion paper #138. York, England: Centre for Health Economics, The University of York; 1995.
14. Williams A. The measurement and valuation of health: a chronicle. Discussion paper #136. York, England: Centre for Health Economics, The University of York; 1995.
15. Dolan P. Modeling valuations for EuroQol health states. Med Care. 1997;35:1095–1108.
16. Claes C, Greiner W, Uber A, Graf von der Schulenburg J-M. An interview-based comparison of the TTO and VAS values given to EuroQol states of health by the general German population. In: Greiner W, Graf von der Schulenburg J-M, Piercy J. Proceedings of the 15th Plenary Meeting of the EuroQol Group; 1998 October 1–2; Hannover, Germany. Hannover: Centre for Health Economics and Health Systems Research, University of Hannover; 1999:13–38.
17. Badia X, Roset M, Herdman M, Kind P. A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Med Decis Making. 2001;21:7–16.
18. Tsuchiya A, Ikeda S, Ikegami N, et al. Estimating an EQ-5D population value set: the case of Japan. Health Econ. 2002;11:341–353.
19. Wittrup-Jensen KU, Lauridsen JT, Gudex C, et al. Estimating Danish EQ-5D tariffs using the time trade-off (TTO) and visual analogue scale (VAS) methods. In: Norinder AL, Pedersen KM, Roos P, editors. Proceedings of the 18th Plenary Meeting of the EuroQol Group; 2001 September 6–7; Copenhagen, Denmark. Lund: Swedish Institute for Health Economics; 2002:257–292.
20. Jelsma J, Hansen K, de Weerdt W, et al. How do Zimbabweans value health states? Popul Health Metr. 2003;1:1–11.
21. Cohen SB. Design strategies and innovations in the Medical Expenditure Panel Survey. Med Care. 2003;41(7 Suppl):III5–III12.
22. Furlong WJ, Feeny DH, Torrance GW, Barr RD. The Health Utilities Index (HUI) system for assessing health-related quality of life in clinical trials. Ann Med. 2001;33:375–384.
23. Johnson JA, Coons SJ. Comparison of the EuroQol and the SF-12 in an adult US sample. Qual Life Res. 1998;7:155–166.
24. Johnson JA, Pickard AS. Comparison of the EQ-5D and SF-12 in a general population survey in Alberta, Canada. Med Care. 2000;38:115–121.
25. Kish L. Survey Sampling. New York: John Wiley & Sons; 1965.
26. U.S. Census Bureau. Census 2000 ZCTAs™ ZIP Code Tabulation Areas Technical Documentation. Washington, DC: Geography Division, U.S. Census Bureau; 2000.
27. Iannacchione VG, Staab JM, Redden DT. Evaluating the use of residential mailing addresses in a metropolitan household survey. Public Opin Q. 2003;67:202–210.
28. Gudex C, ed. Time Trade-Off User Manual: Props and Self-Completion Methods. York, England: Centre for Health Economics, The University of York; 1994.
29. Dolan P, Gudex C, Kind P, et al. The time trade-off method: results from a general population study. Health Econ. 1996;5:141–154.
30. Stata Corporation. Stata/SE for Windows software program. Release 8.0. College Station (TX): Stata Corporation; 2003.
31. SAS Institute, Inc. Statistical Analysis Software software program. Version 8.02. Cary (NC): SAS Institute, Inc.; 2001.
32. Rao JNK, Scott AJ. The analysis of categorical data from complex sample surveys: chi-squared tests for goodness of fit and independence in two-way tables. J Am Stat Assoc. 1981;76:221–230.
33. Rao JNK, Scott AJ. On chi-squared tests for multiway contingency tables with cell proportions estimated from survey data. Ann Stat. 1984;12:46–60.
34. Korn EL, Graubard BI. Analysis of Health Surveys. New York: John Wiley & Sons; 1999.
35. Cox BG. The weighted sequential hot deck imputation procedure. Proceedings of the Survey Research Methods Section of the American Statistical Association. Alexandria, VA: American Statistical Association; 1980:721–726.
36. Williams RL, Chromy JR. SAS sample selection macros. Proceedings of the Fifth Annual SAS User's Group International Conference. Cary, NC: SAS Institute, Inc.; 1980:392–396.
37. Little RJA, Rubin DB. Statistical Analysis With Missing Data. 2nd ed. Hoboken, NJ: John Wiley & Sons; 2002:85–87.
38. Kish L, Frankel MR. Inference from complex samples. J R Stat Soc Ser B. 1974;36:1–37.
39. Fuller WA. Regression analysis for sample survey. Sankhya Ser C. 1975;37:117–132.
40. Binder DA. On the variances of asymptotically normal estimators from complex surveys. Int Stat Rev. 1983;51:279–292.
41. Goldstein H. Multilevel Statistical Models. 3rd ed. London, England: Edward Arnold; 2003.
42. Research Triangle Institute. SUDAAN software program. Release 8.0.2. Research Triangle Park, NC: Research Triangle Institute; 2001.
43. Belsley DA, Kuh E, Welsch RE. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons; 1980.
44. Ramsey JB. Tests for specification errors in classical linear least squares regression analysis. J R Stat Soc Ser B. 1969;31:350–371.
45. Hausman J. Specification tests in econometrics. Econometrica. 1978;46:1251–1271.
46. Wooldridge JM. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: The MIT Press; 2002:291.
47. Patrick DL, Starks HE, Cain KC, et al. Measuring preferences for health states worse than death. Med Decis Making. 1994;14:9–18.
APPENDIX 1

US Population-Based Predicted Preference Weights and Standard Errors for 243 EQ-5D Health States

State    Value     SE        State    Value     SE        State    Value     SE
11111    1.000     0.000     11211    0.860     0.008     21111    0.854     0.008
11112    0.844     0.008     21211    0.843     0.009     11212    0.833     0.010
11121    0.827     0.008     21112    0.827     0.009     12111    0.825     0.008
11221    0.816     0.009     12211    0.814     0.008     21121    0.810     0.008
22111    0.808     0.009     11122    0.800     0.009     12112    0.797     0.008
21212    0.794     0.011     12121    0.781     0.007     21221    0.778     0.010
22211    0.775     0.010     11222    0.768     0.011     12212    0.765     0.009
21122    0.761     0.009     22112    0.759     0.010     12221    0.748     0.009
22121    0.742     0.008     12122    0.732     0.008     21222    0.708     0.013
22212    0.705     0.012     22221    0.689     0.012     12222    0.678     0.012
22122    0.672     0.012     11311    0.626     0.013     21311    0.619     0.011
11312    0.609     0.011     22222    0.597     0.019     11321    0.592     0.013
21312    0.592     0.011     12311    0.590     0.012     21321    0.575     0.012
22311    0.573     0.012     11322    0.565     0.012     12312    0.563     0.012
11113    0.550     0.015     11213    0.550     0.012     12321    0.546     0.013
21113    0.543     0.013     21213    0.533     0.011     13111    0.529     0.016
13211    0.529     0.013     21322    0.527     0.013     22312    0.524     0.013
23111    0.522     0.014     11123    0.517     0.015     12113    0.514     0.014
13112    0.512     0.014     23211    0.512     0.013     22321    0.508     0.014
11223    0.506     0.014     12213    0.503     0.013     13212    0.501     0.013
21123    0.499     0.014     12322    0.497     0.014     22113    0.497     0.014
13121    0.496     0.015     23112    0.495     0.014     13221    0.485     0.014
23121    0.478     0.014     12123    0.470     0.015     13122    0.468     0.015
21223    0.467     0.014     22213    0.465     0.013     11131    0.463     0.020
11231    0.463     0.017     23212    0.463     0.014     21131    0.456     0.017
11313    0.452     0.015     11132    0.446     0.018     23221    0.446     0.015
21231    0.446     0.016     21313    0.445     0.013     31111    0.442     0.016
31211    0.442     0.013     12223    0.438     0.015     22322    0.437     0.017
13222    0.436     0.015     11232    0.435     0.017     22123    0.432     0.015
13311    0.431     0.016     23122    0.430     0.015     21132    0.429     0.017
12131    0.427     0.019     31112    0.426     0.014     23311    0.424     0.014
11323    0.418     0.014     12231    0.416     0.017     12313    0.416     0.014
31212    0.415     0.013     13312    0.414     0.014     22131    0.410     0.018
31121    0.409     0.014     32111    0.407     0.014     21323    0.401     0.014
12132    0.400     0.018     22313    0.399     0.014     31221    0.398     0.013
13321    0.397     0.014     21232    0.397     0.018     23312    0.397     0.014
32211    0.396     0.013     31122    0.382     0.014     23321    0.380     0.014
32112    0.379     0.015     22223    0.378     0.018     22231    0.378     0.019
23222    0.376     0.017     12323    0.372     0.014     13322    0.370     0.014
12232    0.368     0.019     11331    0.365     0.018     32121    0.363     0.014
22132    0.361     0.019     21331    0.358     0.016     13113    0.355     0.019
13213    0.354     0.017     31222    0.350     0.015     23113    0.348     0.018
11332    0.348     0.016     32212    0.347     0.015     31311    0.344     0.016
23213    0.337     0.017     22323    0.333     0.015     23322    0.331     0.014
21332    0.331     0.015     32221    0.330     0.014     12331    0.329     0.017
31312    0.327     0.015     13123    0.321     0.018     32122    0.314     0.016
22331    0.312     0.016     31321    0.311     0.014     13223    0.310     0.017
32311    0.308     0.015     22232    0.308     0.022     23123    0.304     0.018
12332    0.302     0.016     11133    0.289     0.022     11233    0.289     0.019
13313    0.286     0.018     31322    0.283     0.014     21133    0.282     0.020
32312    0.281     0.015     23313    0.279     0.017     23223    0.272     0.017
21233    0.271     0.018     31113    0.268     0.019     13131    0.268     0.022
31213    0.268     0.017     13231    0.268     0.019     32321    0.264     0.014
22332    0.263     0.017     23131    0.261     0.020     32222    0.260     0.018
12133    0.253     0.021     13323    0.253     0.017     13132    0.251     0.020
23231    0.250     0.019     33111    0.247     0.019     33211    0.247     0.017
12233    0.242     0.019     13232    0.240     0.019     22133    0.236     0.020
23323    0.235     0.017     31123    0.235     0.018     23132    0.234     0.020
32113    0.232     0.018     33112    0.230     0.019     31223    0.224     0.017
32213    0.222     0.017     11333    0.220     0.019     33212    0.220     0.020
32322    0.216     0.015     21333    0.214     0.017     33121    0.214     0.017
22233    0.204     0.018     33221    0.203     0.016     23232    0.202     0.020
31313    0.199     0.018     13331    0.199     0.019     23331    0.193     0.018
32123    0.188     0.018     33122    0.186     0.018     12333    0.184     0.018
13332    0.182     0.017     31131    0.181     0.021     31231    0.181     0.018
33311    0.178     0.018     22333    0.167     0.017     31323    0.166     0.017
23332    0.165     0.017     31132    0.165     0.020     32313    0.164     0.017
33312    0.162     0.017     32223    0.156     0.017     33222    0.154     0.018
31232    0.154     0.019     32131    0.145     0.020     33321    0.145     0.016
32231    0.135     0.018     13133    0.123     0.024     13233    0.123     0.021
32323    0.120     0.016     32132    0.118     0.020     33322    0.118     0.016
23133    0.117     0.022     31331    0.112     0.019     23233    0.106     0.021
33113    0.102     0.022     33213    0.102     0.020     31332    0.096     0.017
32232    0.086     0.020     13333    0.084     0.018     23333    0.077     0.016
32331    0.077     0.017     33123    0.069     0.021     33313    0.063     0.017
33223    0.058     0.020     32332    0.049     0.017     31133    0.037     0.023
31233    0.036     0.020     33323    0.030     0.016     33131    0.016     0.023
33231    0.015     0.020     32133    0.001     0.022     33132    0.001     0.022
31333    -0.003    0.017     32233    -0.010    0.020     33232    -0.012    0.021
33331    -0.024    0.016     32333    -0.038    0.016     33332    -0.040    0.015
33133    -0.100    0.021     33233    -0.100    0.019     33333    -0.109    0.012

SE indicates standard error.
APPENDIX 2. US Population-Based EQ-5D Preference Weight Scoring System

Dimension               Coefficient
Mobility
  Level 2                     0.146
  Level 3                     0.558
Self-Care
  Level 2                     0.175
  Level 3                     0.471
Usual Activities
  Level 2                     0.140
  Level 3                     0.374
Pain/Discomfort
  Level 2                     0.173
  Level 3                     0.537
Anxiety/Depression
  Level 2                     0.156
  Level 3                     0.450
D1                           -0.140
I2-squared                    0.011
I3                           -0.122
I3-squared                   -0.015

The following example demonstrates how the coefficients in the above table can be used to derive predicted values for EQ-5D health states.

Health State 11223
Full health = 1.000
Mobility: level 1 (subtract 0.000)
Self-Care: level 1 (subtract 0.000)
Usual Activities: level 2 (subtract 0.140)
Pain/Discomfort: level 2 (subtract 0.173)
Anxiety/Depression: level 3 (subtract 0.450)
D1: number of dimensions at level 2 or 3 beyond first = 2 (subtract -0.140 x 2 = -0.280)
I2-squared: square of number of dimensions at level 2 beyond first = 1 (subtract 0.011 x 1 = 0.011)
I3: number of dimensions at level 3 beyond first = 0 (subtract -0.122 x 0 = 0.000)
I3-squared: square of number of dimensions at level 3 beyond first = 0 (subtract -0.0148 x 0 = 0.000)

Hence, the predicted value for state 11223 is
1.000 - 0.000 - 0.000 - 0.140 - 0.173 - 0.450 - (-0.280) - 0.011 - 0.000 - 0.000 = 0.506

Note: The above example demonstrates the arithmetic needed to predict values for EQ-5D health states. To generate these values, the authors advocate using the information presented in Appendix 1. Alternatively, researchers may use scoring algorithms developed by the authors for several statistical applications (ie, SPSS, Stata, SAS), which are available from the Agency for Healthcare Research and Quality at http://www.ahrq.gov/rice/.
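The arithmetic in the worked example can be sketched in a few lines of Python. This is an illustration only, not one of the authors' scoring algorithms: the function name `predict_value` and the constant names are hypothetical, and the coefficients are the rounded values from Appendix 2, with D1, I3, and I3-squared taken as negative (an assumption consistent with the worked example, where subtracting the D1 term raises the value, and with the Appendix 1 entries). Because the coefficients are rounded, results for states with several dimensions at level 3 may differ from Appendix 1 by a few thousandths, so Appendix 1 remains the preferred source of values.

```python
# Rounded coefficients transcribed from Appendix 2. Each level-specific
# decrement is subtracted from full health (1.000).
DECREMENTS = (
    {1: 0.0, 2: 0.146, 3: 0.558},  # Mobility
    {1: 0.0, 2: 0.175, 3: 0.471},  # Self-Care
    {1: 0.0, 2: 0.140, 3: 0.374},  # Usual Activities
    {1: 0.0, 2: 0.173, 3: 0.537},  # Pain/Discomfort
    {1: 0.0, 2: 0.156, 3: 0.450},  # Anxiety/Depression
)

# Interaction coefficients. The negative signs are an assumption inferred
# from the worked example and the Appendix 1 values.
D1 = -0.140
I2_SQUARED = 0.011
I3 = -0.122
I3_SQUARED = -0.015


def predict_value(state: str) -> float:
    """Predict the preference weight for a five-digit EQ-5D state such as '11223'."""
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError(f"not a valid EQ-5D health state: {state!r}")
    levels = [int(ch) for ch in state]

    # Main effects: one decrement per dimension.
    main_effects = sum(table[lvl] for table, lvl in zip(DECREMENTS, levels))

    # Interaction terms count dimensions "beyond the first".
    d1 = max(sum(lvl > 1 for lvl in levels) - 1, 0)   # at level 2 or 3
    i2 = max(sum(lvl == 2 for lvl in levels) - 1, 0)  # at level 2
    i3 = max(sum(lvl == 3 for lvl in levels) - 1, 0)  # at level 3

    return (1.0 - main_effects
            - D1 * d1
            - I2_SQUARED * i2 ** 2
            - I3 * i3
            - I3_SQUARED * i3 ** 2)


print(f"{predict_value('11223'):.3f}")  # 0.506, matching the worked example
```

For full health (11111) every term is zero, so the function returns 1.000 without special handling.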