Journal of Clinical Epidemiology 62 (2009) 912–921

Items from patient-oriented instruments can be integrated into interval scales to operationalize categories of the International Classification of Functioning, Disability and Health

Alarcos Cieza a,b, Roger Hilfiker b, Annelies Boonen c, Somnath Chatterji d, Nenad Kostanjsek e, Bedirhan T. Üstün e, Gerold Stucki a,b,f,*

a ICF Research Branch of the WHO Collaborating Center for the Family of International Classifications at the German Institute of Medical Documentation and Information (DIMDI), IHRS, Ludwig-Maximilian University, Munich, Germany
b Human Functioning Sciences Division, Swiss Paraplegic Research, Nottwil, Switzerland
c Division of Rheumatology, Department of Internal Medicine, University Hospital Maastricht, Maastricht, The Netherlands
d Department of Measurement and Health Information Systems, World Health Organization, Switzerland
e Classification, Terminology and Standards Team, World Health Organization, Switzerland
f Department of Physical Medicine and Rehabilitation, University Hospital Munich, Ludwig-Maximilian University, Munich, Germany

Accepted 28 April 2008

Abstract

Objective: To exemplify the construction of interval scales for specified categories of the International Classification of Functioning, Disability and Health (ICF) by integrating items from a variety of patient-oriented instruments.

Study Design and Setting: Psychometric study using data from a convenience sample of 122 patients with rheumatoid arthritis. Patients completed six different patient-oriented instruments. The contents of the instrument items were linked to the ICF. Rasch analyses for ordered-response options were used to examine whether the instrument items addressing the ICF category b130: Energy and drive functions constitute a psychometrically sound interval scale.

Results: Nineteen items were linked to b130: Energy and drive functions.
Sixteen of the 19 items fit the Rasch model according to the chi-square ($\chi^2$) statistic ($\chi^2_{df=32} = 38.25$, $P = 0.21$) and the Z-fit statistic ($Z_{\text{Mean}} = 0.451$, $Z_{\text{SD}} = 1.085$ and $Z_{\text{Mean}} = 0.223$, $Z_{\text{SD}} = 1.132$ for items and persons, respectively). The Person Separation Index $r_\beta$ was 0.93.

Conclusion: ICF category interval scales that operationalize single ICF categories can be constructed. The original format of the items included in the interval scales remains unchanged. This study represents a step forward in the operationalization and future implementation of the ICF.

© 2009 Elsevier Inc. All rights reserved.

Keywords: Outcome assessment; Health; Psychometrics; Classification; Questionnaires; Fatigue

* Corresponding author. Department of Physical Medicine and Rehabilitation, University Hospital Munich, Ludwig-Maximilian University, Marchioninistr. 15, 81377 Munich, Germany. Tel.: +49-89-7095-4050; fax: +49-89-7095-8836. E-mail address: Gerold.stucki@med.uni-muenchen.de (G. Stucki).

1. Introduction

Functioning and disability are universal human experiences [1,2] that are at the core of medicine [3] and public health [4]. They are also of essential relevance in sectors such as labor, education, and social affairs [5]. In medicine, the management of limitations in functioning complements medical and surgical care throughout the service continuum, from the acute to the community health care situation [6]. Improving or maintaining functioning, or preventing disability, is becoming one of the most urgent outcomes in public health [7]. In the labor, education, and social affairs sectors, planning and implementation of preventative actions are only viable if the needs of people experiencing, or likely to experience, disability are considered [5]. Accordingly, concepts, classifications, and measures of functioning and disability are of great interest and importance across professional disciplines and sectors.
0895-4356/09/$ – see front matter. doi: 10.1016/j.jclinepi.2008.04.011

With the International Classification of Functioning, Disability and Health (ICF) [8], the WHO, for the first time, provides a universal and globally accepted framework and classification to describe the full range of human functioning and disability that may be affected by a health condition [9]. The ICF model identifies three components of the dimension functioning, namely body functions and structures, activities, and participation. Problems or difficulties in these components are called impairments, activity limitations, and participation restrictions, that is, they are components of the dimension disability. Dimensions of functioning and of disability are both affected by interactions between health conditions and contextual factors (environmental and personal).

The components of body functions and structures, activities and participation, and environmental factors (a list of personal factors awaits further research and development) are classified based on the ICF categories. The ICF contains a total of 1,424 categories that are mutually exclusive and organized within a hierarchically nested structure with up to four different levels. The ICF categories are denoted by unique alphanumeric codes with which it is possible to classify, measure, and describe functioning and disability, both on the individual and population levels. Because the ICF categories are always accompanied by a short definition and inclusions and exclusions, as appropriate, the information on the aspects of functioning can be reported unambiguously and compared based on ICF categories [10,11]. Examples of ICF categories, with their definitions, inclusions, and exclusions, can be found in Table A1 (available on the journal's website at www.elsevier.com). An example of the hierarchically nested structure is presented in the following:

b1 Mental functions (first/chapter level)
b130 Energy and drive functions (second level)
b1301 Motivation (third level)

In principle, there are two approaches to measure a specified ICF category, that is, to quantify the extent of variation therein. The first is to use the so-called ICF qualifier as an expert rating scale ranging from 0 to 4 (Table A2 [available on the journal's website at www.elsevier.com]). With this approach, impairments, activity limitations, participation restrictions, and contextual factors are directly rated according to established coding guidelines [8].
However, as in any rating scale, the expert can access whatever sources of information are available [12].

The second approach is to use information obtained with a clinical test that includes standardized expert and technical examinations, or a patient-oriented instrument that includes patient- and proxy-reported, self-administered, or interview-administered questionnaires, and to transform this information into the ICF qualifier. In a first step, a clinical test or patient-oriented instrument is linked to the ICF based on established linking rules [13]. In a second step, the scores obtained with a clinical test or a patient-oriented instrument are transformed to the ICF qualifier. This second approach has the advantage that information already available can be transformed into the standard language of the ICF, to be understood by all interested professionals irrespective of their disciplines or the sectors (e.g., health, labor, or education) in which they are involved.

The transformation to the ICF qualifier is straightforward when interval-scaled clinical tests or patient-oriented instruments, which comprehensively and uniquely cover the content of a respective ICF category, are readily available. For example, the visual analog scale (VAS) for the assessment of pain can be linked, in a first step, to the ICF category b280: Sensation of pain. In a second step, the values of the VAS-Pain can be transformed into an ICF qualifier in a straightforward manner, because it represents a 100-mm interval scale marked as "no pain" at one end and as "worst pain" at the other [14]. Considering the percentage values of the ICF qualifier of Table A2 (available on the journal's website at www.elsevier.com), a person marking a level of pain between 0 and 4 mm would receive qualifier 0 in the ICF category b280: Sensation of pain; between 5 and 24 mm, qualifier 1; between 25 and 49 mm, qualifier 2; between 50 and 95 mm, qualifier 3; and between 96 and 100 mm, qualifier 4.
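The VAS-Pain example can be written out as a small lookup. The following is only a sketch of the bands quoted above, not part of the original study: the function name `vas_to_icf_qualifier` is hypothetical, and it assumes that the VAS reading is recorded in whole millimetres.

```python
def vas_to_icf_qualifier(vas_mm: int) -> int:
    """Map a VAS-Pain reading (whole millimetres, 0-100) to an ICF qualifier (0-4).

    The band limits are the ones quoted in the text for b280: Sensation of pain.
    The function name and the whole-millimetre input are illustrative assumptions.
    """
    if not 0 <= vas_mm <= 100:
        raise ValueError("VAS reading must lie between 0 and 100 mm")
    # Inclusive upper limit of each qualifier band: 0-4, 5-24, 25-49, 50-95, 96-100.
    upper_limits = [4, 24, 49, 95, 100]
    for qualifier, upper in enumerate(upper_limits):
        if vas_mm <= upper:
            return qualifier
    raise AssertionError("unreachable: 100 is the last upper limit")


if __name__ == "__main__":
    for reading in (3, 17, 40, 72, 98):
        print(reading, "mm -> qualifier", vas_to_icf_qualifier(reading))
```

Keeping the band limits in a single list makes it easy to check them against the percentage values of the ICF qualifier.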
If there are no readily available clinical tests or patient-oriented instruments, a third approach can be developed. One may consider using parts of clinical test batteries or selected items of patient-oriented instruments that cover a specified ICF category. Thus, an ICF category interval scale can be constructed to serve as an interface between the clinical test or patient-oriented instrument and the ICF qualifier. Established linkage rules [13,15] can be used to identify suitable parts of clinical tests and items of patient-oriented instruments. Rasch or Item Response Theory (IRT) models [16] can be applied to construct interval scales. However, the complete process of how to develop ICF category interval scales and how they can serve as an interface has not been described to date.

The objective of this article was to exemplify the construction of interval scales for specified ICF categories by integrating items from a variety of widely used patient-oriented instruments, which were filled in by a convenience sample of rheumatoid arthritis (RA) patients. The specific aims are to:

1. identify candidate items from a range of patient-oriented instruments that address specific ICF categories, and
2. estimate the extent to which selected items that address the ICF category Energy and drive functions form a unidimensional, ordered interval scale.

2. Materials and methods

2.1. Study design

The psychometric study used data from a convenience sample of patients with RA that was collected in a cross-sectional study conducted at the University Hospital Maastricht, The Netherlands. Patients were asked to fill in a number of patient-oriented instruments for reasons other than this psychometric study. The study protocol and informed consent forms of the cross-sectional study were approved by the Ethics Committee of the University Hospital in Maastricht. Inclusion criteria for patients were: a diagnosis of RA according to the revised American College of Rheumatology (ACR) criteria, at least 18 years of age, sufficient knowledge of the Dutch language, and signed, informed consent.

2.2. Measures

Patients completed the following patient-oriented instruments: the Rheumatoid Arthritis Quality of Life (RAQoL [17]) Questionnaire, the Health Assessment Questionnaire (HAQ [18]), the Medical Outcomes Study Short Form-36 (SF-36 [19]), the European Quality of Life Instrument (EQ-5D [20]), the Multidimensional Fatigue Inventory (MFI [21]), and the Center for Epidemiological Studies Depression Scale (CES-D [22]).

2.3. Identification of candidate items

To identify the candidate items for integration into an ICF category interval scale, the contents of the items from the instruments as completed by patients were independently linked to the ICF by two health professionals using established linkage rules [13,15]. According to these rules, all concepts contained in an item are identified and linked to the most precise ICF categories. For example, in the item "It's too much effort to go out and see people" of the RAQoL, two different concepts were identified, namely "it's too much effort" and "go out and see people." They were linked to the ICF categories b1301: Motivation, defined as "mental functions that produce the incentive to act; the conscious or unconscious driving force of action," and to d9205: Socializing, defined as "engaging in informal or casual gatherings with others, such as visiting friends or relatives or meeting informally in public places." The SF-36 and the EQ-5D had already been linked in another study [23]. However, they were linked again for this investigation by the two health professionals who linked all other patient-oriented instruments.
Where disagreement regarding the identified ICF categories arose, a third person (A.C.) made an informed decision. To evaluate the reliability of the linkage process, the percentage of observed agreement was calculated based on the two independent linkage versions of the patient-oriented instruments. In addition, kappa coefficients [24] and nonparametric bootstrapped confidence intervals (CIs) [25,26] were calculated to investigate whether the agreement was greater than might occur by chance. The reliability of the linkage process was studied for all different levels of the ICF, that is, the component, chapter (first), second, and third levels.

The number of items linked to the different second-level ICF categories was counted, which means that when third- or fourth-level categories had been selected in the linking process, the corresponding second-level category was considered. The second-level ICF category to which most instrument items could be linked was selected as an exemplary ICF category to demonstrate the construction of an ICF category interval scale.

2.4. Validation of candidate items

The validation of candidate items for the selected ICF category as identified in the content linkage was based on the Rasch model for ordered-response options, which is an extension of the original dichotomous model amplified to account for ordinal scale-type data [27]. The candidate items were expected to fulfill the criteria for unidimensionality according to the Rasch model, which is described in detail in the following section.

2.5. Construction of an International Classification of Functioning, Disability and Health category interval scale

The integration of the candidate items into an interval scale for the selected ICF category was also based on the Rasch model for ordered-response options. The Rasch model implies that the parameters of a person's ability and an item's difficulty are both placed along one and the same single dimension, which indicates the latent trait to be measured.
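To make the idea of persons and items sharing one dimension concrete, the following sketch computes response-option probabilities for a single item. The article does not print the model equation, so the threshold (partial credit) parameterization used here is an assumption, and the function name `category_probabilities` is hypothetical; it is not a reproduction of the RUMM2020 implementation. With a single threshold, the function reduces to the dichotomous Rasch model.

```python
import math


def category_probabilities(theta, thresholds):
    """Return the probabilities of response options 0..m for one item.

    theta      -- person location on the latent dimension (logits)
    thresholds -- the m threshold locations of the item (logits)

    Assumed parameterization (not stated in the article): the probability of
    option x is proportional to exp(sum over k <= x of (theta - tau_k)).
    """
    # Running sum of (theta - tau_k); the empty sum for option 0 is 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    # Subtract the largest exponent before exponentiating, for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Dichotomous case: a person located exactly at the item difficulty.
    print(category_probabilities(0.0, [0.0]))
    # Three response options with illustrative thresholds at -1 and +1 logits.
    print(category_probabilities(0.0, [-1.0, 1.0]))
```

Under this parameterization, a person located exactly at a threshold is equally likely to choose either of the two adjacent response options.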
The units of this dimension, as defined by the model, are logits (the natural log odds of success vs. failure), which make up an equidistant scale. The following properties were studied: unidimensionality and reliability of the scale, item fit, functioning of the response options of the single items, and the targeting between the items and the persons' abilities.

The unidimensionality of the suitable items was checked by the item–trait interaction chi-square ($\chi^2$) [28] and the Z-fit statistics [29]. A significant $\chi^2$ probability of less than 0.05 indicates misfit of the scale to the model.

Reliability was studied with the Person Separation Index $r_\beta$, which is analogous to the traditional test theory indices Kuder-Richardson Formula 21 or Cronbach's alpha and ranges between 0 and 1, where the value of 1 indicates perfect reproducibility of person placements [27,30]. The number of distinct ability strata $H_p$ that can be reliably identified by the scale scores was calculated by the formula $H_p = (4G_p + 1)/3$ [31], where $G_p$ is a measure of the sample standard deviation expressed in standard error units.

The test of fit of the individual items was also conducted based on $\chi^2$ statistics. Additionally, the standardized fit residual values of z were considered. They should range between −2.5 and +2.5 to indicate model fit [32]. These values provide information about the direction of the deviation of the observed item data from the model.

The functioning of the response options was studied based on the threshold estimates for each ICF category. The threshold corresponds to the location on the latent continuum at which it is equally likely that a person will be classified into either of two adjacent response options, that is, will obtain either of two successive scores. The number of thresholds equals the number of response options minus one, and they should have increasing values, because disordered threshold estimates indicate a failure to construct a measure in which successive scores reflect increasing levels of the dimension being measured [33,34]. When threshold values are disordered, the response options are collapsed, that is, the item is rescaled, taking into consideration frequency distributions and their probability curves. After rescaling the item, the thresholds should display the intended increasing order.

Misfit of items and disordered response-option thresholds call into question the validity of all further conclusions, including the unidimensionality of the items. Therefore, the data have to be iteratively revised, the model recalibrated, and unidimensionality and item fit checked until the item–trait interaction $\chi^2$ does not present a significant result. We revised the data in two steps: by collapsing response options and by stepwise deletion of the items with the smallest $\chi^2$ probabilities.

The targeting of the items was studied by examining the respective distribution of persons' abilities and items' difficulties along the latent-trait continuum. Comparison of the mean person location with the mean item location, which is set to 0 by definition, indicates domain targeting. The smaller the difference, the better the targeting.

The logit scale with the arbitrary mean of 0 was transformed to a more meaningful scale, which we called the ICF category interval scale, ranging from 0 to 100 using the following formula:

$$Y = m + (s \times \text{location}),$$

where $s = (\text{desired range})/(\text{current range})$ and $m = (\text{lowest desired value}) - (\text{current lowest value} \times s)$ [35]. The thresholds of the response options of each item were also transformed to the ICF category interval scale from 0 to 100.
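The linear transformation can be illustrated with a few lines of code. This is a sketch under stated assumptions rather than the procedure as run in RUMM2020: the helper name `make_rescaler` is hypothetical, and the extreme logit values are taken from the raw-score conversion table (Table 4), read here as −6.36 and 5.59 (the minus sign on the lower value is an assumption based on the ordering of the table).

```python
def make_rescaler(current_low, current_high, desired_low=0.0, desired_high=100.0):
    """Build the linear map Y = m + s * location described in the text.

    s = desired range / current range
    m = lowest desired value - (current lowest value * s)
    """
    s = (desired_high - desired_low) / (current_high - current_low)
    m = desired_low - current_low * s

    def rescale(location):
        return m + s * location

    return rescale


if __name__ == "__main__":
    # Assumed logit extremes of the 16-item scale (see the lead-in above).
    to_icf_scale = make_rescaler(current_low=-6.36, current_high=5.59)
    for logit in (-6.36, -2.94, 4.72, 5.59):
        print(f"{logit:6.2f} logits -> {to_icf_scale(logit):6.1f} on the 0-100 scale")
```

With these assumed end points, a location of −2.94 logits maps to about 28.6, in line with the rounded value of 29 given for a raw score of 4.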
To illustrate how the patients' scores on the ICF category interval scale can be used to estimate the value on the ICF qualifier (from 0 to 4), the response options of the ICF qualifier were positioned in the ICF category interval scale. Qualifier 0 corresponds to a score of up to 5 on the ICF category interval scale; qualifier 1, up to 25; qualifier 2, up to 50; qualifier 3, up to 95; and qualifier 4, up to 100.

2.6. Statistical programs

The descriptive statistics were performed with SPSS Version 14.0 (SPSS Inc, Chicago, IL [36]). The kappa analyses were performed with SAS 9.0 (SAS Institute Inc, Cary, NC [37]). Rasch analyses were conducted using RUMM2020 Software (RUMM Laboratory, Perth, Western Australia [35]).

3. Results

3.1. Study population

The demographic and RA-related characteristics of the convenience sample of 122 patients are shown in Table 1.

3.2. Identification of candidate items

Table 2 shows the results of the evaluation of the linkage procedure as the percentage of observed agreement, kappa statistics, and bootstrapped CIs for all different levels of the ICF. None of the 95% CIs encloses 0, indicating that the agreement exceeded chance.

The second-level ICF category b130: Energy and drive functions was the ICF category to which the largest number of items was linked, that is, a total of 19 items. It was, therefore, the ICF category selected to exemplify the construction of an ICF category interval scale. The definition of b130: Energy and drive functions is presented in Table A1 (available on the journal's website at www.elsevier.com). The 19 items, with their corresponding response options, are presented in Table 3. The notion reflected in the ICF category b130: Energy and drive functions, according to the ICF, is a neutral notion.
However, based on the selected items and their corresponding response options, as presented in Table 3, the addressed notion is impairment in energy and drive functions, which is also the concept denoted by the ICF qualifier, namely the extent or magnitude of the impairment (see also Table A2 [available on the journal's website at www.elsevier.com]).

Table 1. Demographic and disease characteristics of 122 patients with RA included in this study

| Patient characteristics | |
|---|---|
| Sociodemographic data | |
| Male patients; n (%) | 36 (29.5) |
| Age, yr; mean (SD) | 58.6 (15.4) |
| Current work status | |
| Paid employment; % | 12.3 |
| Unemployed (because of RA); % | 20.5 |
| Unemployed (because of some other reason); % | 2.5 |
| Keeping house/homemaker; % | 23.8 |
| Retired; % | 33.6 |
| Disease characteristics | |
| Duration of disease, yr; mean (SD) | 13.6 (11.5) |
| Number of comorbidities (a); mean (SD); median (0–11) | 4.3 (1.98); 4 |
| Rheumatoid Arthritis Disease Activity Index (RADAI [38]) (0–10); mean (SD) | 3.7 (2.1) |

Abbreviations: RA, rheumatoid arthritis; SD, standard deviation.
(a) Number of comorbidities according to the Self-Administered Comorbidity Questionnaire (SCQ [39]): 10 patients had heart disease, 32 had high blood pressure, 11 had lung disease, 8 had diabetes, 12 had ulcers or stomach disease, 3 had kidney disease, 2 had liver disease, 4 had anemia or other blood disease, 3 had cancer, 39 had osteoarthritis, and 40 had back pain. In addition, 81 patients indicated that they had some other disease not explicitly mentioned in the SCQ.

Table 2. Percentage of observed agreement, estimated kappa coefficients, and nonparametric bootstrapped 95% confidence intervals at the component, chapter (first), second, and third levels of the International Classification of Functioning, Disability and Health

| ICF level | Observed agreement (%) | Estimated kappa coefficient | 95% bootstrapped confidence interval |
|---|---|---|---|
| Component | 92 | 0.90 | 0.83, 0.94 |
| Chapter (1st level) | 98 | 0.98 | 0.95, 0.99 |
| 2nd level | 88 | 0.85 | 0.80, 0.91 |
| 3rd level | 83 | 0.80 | 0.57, 0.90 |

3.3. Validation of candidate items and construction of the International Classification of Functioning, Disability and Health category interval scale

The overall-fit statistics of the 19 items according to the $\chi^2$ statistic showed a significant value, indicating that not all the items measured the dimension Energy and drive functions as expected by the Rasch model ($\chi^2_{df=38} = 69.401$, $P < 0.01$). A high variability of the Z-fit statistic was also found ($Z_{\text{Mean}} = 0.32$, $Z_{\text{SD}} = 1.22$ and $Z_{\text{Mean}} = 0.15$, $Z_{\text{SD}} = 1.24$ for items and persons, respectively). The Person Separation Index $r_\beta$ was 0.94, indicating high reliability.

According to the fit of the individual items to the model, three items showed significant misfit according to $\chi^2$ probabilities ($\chi^2_{df=2} = 8.78$, $P = 0.01$; $\chi^2_{df=2} = 8.78$, $P = 0.01$; and $\chi^2_{df=2} = 6.49$, $P = 0.04$). Four items showed disordered threshold parameters. Therefore, the response options of these four items were collapsed, and model fit and unidimensionality were checked again. The collapsing strategy for each of the four items presenting disordered response options is also presented in Table 3. After collapsing the response options, only one of the three misfitting items still showed significant misfit ("I have to go to bed earlier than I would like to" [RAQoL 1]). However, two other items that were not originally misfitting presented a significant misfit ("I did not feel like eating" [CES-D 2] and "Physically, I feel I am in bad condition" [MFI 14]). All three of these last, misfitting items were deleted stepwise from the analyses.
The overall-fit statistics according to the $\chi^2$ statistic of the remaining 16 suitable items did not show a significant value ($\chi^2_{df=32} = 38.25$, $P = 0.21$). The results of the Z-fit statistic were $Z_{\text{Mean}} = 0.451$, $Z_{\text{SD}} = 1.085$ and $Z_{\text{Mean}} = 0.223$, $Z_{\text{SD}} = 1.132$ for items and persons, respectively. The Person Separation Index $r_\beta$ was now 0.93. Five distinct ability strata can be reliably identified based on the scores of the 16-item scale. The fit of the individual items and the estimated threshold parameters of the model before and after accounting for disordered thresholds and item misfit are presented in Table A3 (available on the journal's website at www.elsevier.com).

Figure A1 (available on the journal's website at www.elsevier.com) shows the distribution of persons and items along the measurement continuum. The mean person location was 0.15, which, compared with the mean item location 0.0, reveals appropriate targeting. The response options of the items span over 8.9 logits, covering a large proportion of RA patients presenting moderate to severe problems in energy and drive functions.

Figure 1 represents the ICF category interval scale of the continuum impairment in energy and drive with values ranging from 0 to 100. The items, with their corresponding descriptions, are represented in the rows of the figure. The value beside the description represents the average value for that item across all thresholds and refers to the item location or level of difficulty of the item. The thresholds correspond to the values at which the gray tones of the rows change. The two easiest items are "Did you feel tired?" (SF-36 9i) and "I feel I am in excellent condition" (MFI 20). Here, the term "easiest" means that a person with even a very low level of impairment in energy and drive would endorse that item. In other words, easy items allow for a differentiation of persons with no to minor impairment in Energy and drive functions.
The two most difficult items are "I could not get going" (CES-D 20) and "It's too much effort to go out and see people" (RAQoL 25). The term "difficult" means that only persons with a very high level of impairment in energy and drive would endorse that item. In other words, difficult items are those which differentiate the persons with moderate to severe impairment in energy and drive functions. Difficult items usually have a positive logit value, and easy items have a negative logit value.

Based on a concrete example, Fig. 1 can be explained as follows: "Did you feel tired?" (SF-36 9i), the easiest item, was rescaled (i.e., its response options collapsed) from the original six response options to three response options (1 = none of the time, 2 = some time, 3 = all of the time). After collapsing the response options, the item has two thresholds (threshold 1 = 7.9 and threshold 2 = 78.8), which correspond to the values at which the gray tone of the lowest row of Fig. 1 changes. It is the easiest item because even a person with a very low level of impairment in energy and drive (7.9% impairment) would endorse that item (i.e., select the response option 2 = some time). However, only persons with a high degree of impairment in energy and drive, almost 80% impairment, would select the next response option 3 = all of the time (threshold 2 = 78.8).

The position of each of the response options of the ICF qualifier in the ICF category interval scale is indicated by vertical arrows in Fig. 1. All persons obtaining a score of up to 5 on the ICF category interval scale would be assigned qualifier 0 in the ICF category b130: Energy and drive functions. Those persons obtaining a score between 5 and 25 would be assigned qualifier 1; those with a score between 25 and 50, qualifier 2; those with a score between 50 and 95, qualifier 3; and those with a score between 95 and 100, qualifier 4.
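Reading a row of Fig. 1 and reading off the ICF qualifier both amount to counting how many cut points a score has passed. The sketch below illustrates this with the two thresholds quoted for "Did you feel tired?" and the qualifier limits given in the text. The function names are hypothetical, and a score lying exactly on a cut point is assigned to the lower band here, which matches the "up to" wording for the qualifier but is an arbitrary choice for item thresholds, where both adjacent options are equally likely.

```python
from bisect import bisect_left


def band_index(score, cut_points):
    """Count the cut points lying strictly below `score` (cut points must be ordered)."""
    if list(cut_points) != sorted(cut_points):
        raise ValueError("cut points must be in increasing order")
    return bisect_left(cut_points, score)


# Thresholds of the rescaled item "Did you feel tired?" on the 0-100 scale (from the text).
TIRED_THRESHOLDS = [7.9, 78.8]
# Upper limits of qualifiers 0 to 3 on the ICF category interval scale (from the text).
QUALIFIER_LIMITS = [5, 25, 50, 95]


def most_likely_option(score, thresholds=TIRED_THRESHOLDS):
    """Response option (numbered from 1) that is most probable at `score`.

    Relies on the thresholds being ordered: between two successive thresholds,
    the response option lying between them is the most probable one.
    """
    return 1 + band_index(score, thresholds)


def icf_qualifier(score):
    """ICF qualifier (0-4) for a score on the ICF category interval scale."""
    return band_index(score, QUALIFIER_LIMITS)


if __name__ == "__main__":
    for score in (5.0, 29.0, 90.0):
        print(score, most_likely_option(score), icf_qualifier(score))
```

For instance, a score of 29 lies between the two item thresholds (option 2 = some time) and between the limits 25 and 50 (qualifier 2).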
The raw scores that can be obtained by adding the answers to the 16 items were transformed to the logit scale, to the ICF category interval scale, and to the ICF qualifier. A part of the transformation table is presented in Table 4. For example, if a person obtains a raw score of 4, his or her position on the logit scale would be −2.94, his or her score on the ICF category interval scale would be 29, and s/he would be assigned the qualifier of 2, that is, moderate problem in Energy and drive functions.

Table 3. The 19 items linked to the ICF category b130: Energy and drive functions, the questionnaires from which they were taken with their instructions, their response options, and the collapsing strategy followed for items presenting disordered response thresholds

Instructions for the MFI 20: By means of the following statements we would like to get an idea of how you have been feeling lately. There is, for example, the statement "I FEEL RELAXED." If you think that this is entirely true, that indeed you have been feeling relaxed lately, please, place an X in the extreme left box. The more you disagree with the statement, the more you can place an X in the direction "no, that is not true." Please do not miss out a statement and place one X next to each statement.

| Item | Questionnaire | Response options | Collapsing strategy |
|---|---|---|---|
| I feel fit. | MFI 20 | From 0 = yes, that is true to 4 = no, that is not true | |
| I feel very active. | MFI 20 | (as above) | 01123 |
| I feel tired. | MFI 20 | (as above) | |
| I am rested. | MFI 20 | (as above) | |
| Physically, I feel only able to do a little. | MFI 20 | (as above) | 32210 |
| Physically, I can take on a lot. | MFI 20 | (as above) | |
| Physically, I feel I am in bad condition. | MFI 20 | (as above) | 21110 |
| I tire easily. | MFI 20 | (as above) | |
| Physically, I feel I am in excellent condition. | MFI 20 | (as above) | |

Instructions of the CES-D: Below is a list of the ways you might have felt or behaved. Please tell me how often you have felt this way during the past week.

| Item | Questionnaire | Response options | Collapsing strategy |
|---|---|---|---|
| I did not feel like eating; my appetite was poor. | CES-D | 0 = RARELY or NONE of the time (less than 1 day); 1 = SOME or a LITTLE of the time (1–2 days); 2 = OCCASIONALLY or a MODERATE amount of the time (3–4 days); 3 = MOST or ALL of the time (5–7 days) | |
| I felt that everything I did was an effort. | CES-D | (as above) | |
| I could not get going. | CES-D | (as above) | |

Instructions for the SF-36: These questions are about how you feel and how things have been with you during the past 4 weeks. For each question, please give the one answer that comes closest to the way you have been feeling. How much of the time during the past 4 weeks (circle one number on each line)

| Item | Questionnaire | Response options | Collapsing strategy |
|---|---|---|---|
| Did you have a lot of energy? | SF-36 | 1 = All of the Time; 2 = Most of the Time; 3 = A Good Bit of the Time; 4 = Some of the Time; 5 = A Little of the Time; 6 = None of the Time | |
| Did you feel worn out? | SF-36 | (as above) | |
| Did you feel tired? | SF-36 | (as above) | 322221 |

Instructions for the RAQoL: Below you will find some statements which have been made by people who have rheumatoid arthritis. Please read each statement carefully. We would like you to tick "yes" if you feel the statement applies to you, and tick "no" if it does not.

| Item | Questionnaire | Response options | Collapsing strategy |
|---|---|---|---|
| I have to go to bed earlier than I would like to. | RAQoL | 0 = No; 1 = Yes | |
| It's too much effort to go out and see people. | RAQoL | (as above) | |
| I have to keep stopping what I am doing, to rest. | RAQoL | (as above) | |
| I feel tired whatever I do. | RAQoL | (as above) | |

Abbreviations: ICF, International Classification of Functioning, Disability and Health; MFI, Multidimensional Fatigue Inventory; CES-D, Center for Epidemiological Studies Depression Scale; SF-36, Short Form-36; RAQoL, Rheumatoid Arthritis Quality of Life.

4. Discussion

We have illustrated how the value on the ICF qualifier can be estimated based on the interval scales developed for specified ICF categories by integrating the items from patient-oriented instruments. The original format of the items used to construct the ICF category interval scale remained unchanged. Thus, it is possible to use the information provided by items within the context of their original instruments and, at the same time, within the context of the ICF. This application can be extremely useful, given the increasing use of the ICF and the ICF qualifier as references when documenting and reporting functioning and disability [40,41].
The development of ICF category interval scales as illustrated for Energy and drive functions can be applied to any ICF category. The construction of ICF category interval scales relies on linking instrument items to ICF categories [13,15]. The accuracy of the linkage procedure can be examined with kappa statistics to assess the reliability between the two raters [24]. Rasch analyses, as illustrated in this article, complement kappa statistics and can contribute to the study of whether the content linkage is an appropriate method to identify items addressing the same ICF category. The fact that 16 of the 19 items identified from the content linkage fitted the Rasch model, that is,
represented one dimension, supports the assumption that all of these 16 items address the content of the selected ICF category of Energy and drive functions. The three misfitting items that were consequently deleted were "I have to go to bed earlier than I would like to" (RAQoL 1), "I did not feel like eating" (CES-D 2), and "Physically, I feel I am in bad condition" (MFI 14). One could argue that these three items refer to an activity or to feelings that are not necessarily directly related to the level of energy and drive. For example, going to bed earlier than one would like can be related to having to go to work very early in the morning. Furthermore, not feeling like eating could be associated more with worries than with the energy level. Regarding the third deleted item, one could also intuitively say that many people know they are in bad physical condition but do not feel low levels of energy and drive because of it.

Fig. 1. The x and the y axes represent the International Classification of Functioning, Disability and Health (ICF) category interval scale of the continuum energy and drive, with values ranging from 0 to 100. Not all values from 0 to 100 are represented on the y axis because of space constraints. The 16 items in order of difficulty from the easiest item (bottom) to the most difficult item (top) are presented on the y axis. The value corresponding to the position of the items is presented next to them. The positions of the thresholds of the response options of the items are represented by the bars in the diagram. The different gray tones represent the different response options for each individual item. The vertical arrows represent the position of each of the response options of the ICF qualifier.
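The kappa statistic mentioned above for checking agreement between two linkers [24] can be illustrated with a minimal sketch. The function name and the linked categories below are hypothetical examples, not data from this study, and the sketch covers only the unweighted point estimate, without confidence intervals.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ICF categories assigned to six items by two linkers.
linker_1 = ["b130", "b130", "b134", "b152", "b130", "b134"]
linker_2 = ["b130", "b130", "b134", "b130", "b130", "b152"]
print(round(cohens_kappa(linker_1, linker_2), 3))
```

In this made-up example the linkers agree on four of six items, but part of that agreement is expected by chance, so kappa is clearly lower than the raw agreement of 0.67.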
Factor analysis could also have been performed to study whether the items address a common, single dimension. However, we opted for Rasch analyses to obtain not only information on unidimensionality but also on additional properties of the items, such as performance of the response options, targeting, and item difficulty.

We evaluated the linkage process by calculating kappa coefficients, which showed satisfactory results for linker agreement. Kappa is an often used indicator of agreement that accounts for chance. One can argue that unsystematic error because of chance appears to be of secondary relevance for the linkage procedure. Therefore, modeling methods, such as many-facet [42], latent-class [43], and latent-trait [44] analyses, would be useful in the future to explain any disagreement between the linkers (e.g., owing to experience or profession).

The overall-fit statistics and the fit of the individual items to the model support the construct validity of the created ICF category interval scale for Energy and drive functions. The high Person Separation Reliability Index of 0.94 shows that high precision of measurement can be achieved with its use. Thus, using the scores obtained with it, persons can be reliably distinguished into at least five separate strata in the dimension Energy and drive functions.

Table 4. Part of the conversion table from raw scores to logit scale, to the ICF category interval scale and to the ICF qualifier

| Raw score (16 items) | Logit scale | ICF category interval scale (0–100) | ICF qualifier |
|---|---|---|---|
| 0 | -6.36 | 0 | 0 |
| 1 | -4.72 | 14 | 1 |
| 2 | -3.81 | 21 | 1 |
| 3 | -3.29 | 26 | 2 |
| 4 | -2.94 | 29 | 2 |
| ... | ... | ... | ... |
| 21 | -0.54 | 49 | 2 |
| 22 | -0.45 | 49 | 2 |
| 23 | -0.35 | 50 | 3 |
| 24 | -0.25 | 51 | 3 |
| ... | ... | ... | ... |
| 49 | 4.1 | 88 | 3 |
| 50 | 4.72 | 93 | 3 |
| 51 | 5.59 | 100 | 4 |

The raw scores are calculated by adding the responses of the RA patients to the 16 items. The complete table can be obtained from the authors on request.

Only four items presented disordered thresholds. Reversal of thresholds occurs when persons' choices of response options are not in accordance with the expectations from their estimated Energy and drive functions level. This might be attributed to the perceived ambiguity of the response options or the narrow range of the response options [33]. Interestingly, three of the four items presenting disordered thresholds were from the MFI, in which the answers are not specified, but are defined by anchors from "yes, that is true" to "no, that is not true." It would be worthwhile to study the psychometric properties of this instrument using Rasch or other IRT models to investigate the extent to which the response options perform as expected. The fourth item showing disordered thresholds was from the SF-36. The psychometric properties of the SF-36 have been studied with Rasch analyses and IRT models in different investigations. The items addressing physical health have received special attention [45–48]. However, to our knowledge, no investigations have studied the functioning of the response options of the items included in the vitality subscale.

The created ICF category interval scale also proved to be well targeted for a large proportion of RA patients presenting moderate to severe problems in Energy and drive functions. However, it also displayed floor effects, as RA patients with a low level of impairment in energy and drive, at the lower end of the measurement continuum, could not be covered by the items included in the scale.

The developed ICF category interval scale may be considered a starting point for the development of a self-reported assessment instrument for Energy and drive functions. However, it is important to emphasize that the purpose of this article is to illustrate the methodology and not to create a new instrument. We are aware that, for the construction of a universally applicable interval scale for Energy and drive functions, a number of issues need to be addressed.
First, the selection of items to be included in the scale should be done before performing the study (and not a posteriori, as in this study), taking into consideration a large number of instruments, including generic-, condition-, and domain-specific instruments [49]. This would facilitate the inclusion of easy items in the lower part of the continuum to appropriately represent the whole spectrum of variation encountered in energy and drive functions. Studies already published on the content comparison of health-status measures with the ICF could be a very valuable source of information to achieve this aim [23,50–52]. Researchers embarking on the construction of ICF category interval scales to measure specified ICF categories may also use the results of the National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) initiative, which develops, validates, and standardizes item banks to measure patient-reported outcomes (PROs) [53,54]. In the future, PROMIS will have the most comprehensive source of candidate items.

Second, a larger sample size would be necessary to perform the analyses, especially when including additional items. Even though there are no standard criteria to define how large the sample should be to obtain usefully stable item calibrations, the estimation by Linacre (2002) [33] provides a frame of reference. According to it, at least 10 observations per response option and item are needed. In addition, the person-item deviation residuals should be examined by principal components analysis (PCA) [55] to ensure that the assumption of local independence holds. The independence of person-item deviation residuals, taken with adequate fit to the Rasch model, supports unidimensionality.

Third, data from different countries would be necessary to evaluate the cross-cultural validity of the measure. As the ICF was developed for international application [56], this consideration is of special relevance.
Fourth, additional analyses should also be performed from the perspective of the Traditional Test Theory (TTT) [57], such as factor analysis and analyses to study the convergent and discriminant validity.

Last, but not least, if this methodology is used to create new interval scales for ICF categories, special attention must be paid to the content and level of difficulty of the items. To avoid redundancy, items with similar content and level of difficulty should not be included in the same interval scale.

The use of the methodology presented in this article as a starting point for the construction of new instruments is of special interest for ICF categories for which no instrument exists that comprehensively and uniquely covers their content. Instruments developed based on the ICF category interval scales have the immediate advantages that the scores obtained are intuitive, for example, from 0 to 100, and that the ICF qualifier can be estimated based on them.

In conclusion, this study demonstrates how items from different patient-oriented instruments can be integrated into a psychometrically sound ICF category interval scale to operationalize single ICF categories. It also illustrates how patients' scores on this scale can be easily transformed to the response options of the ICF qualifier. It represents a step forward for the operationalization and future implementation of the ICF.

Acknowledgments

The authors thank Heinrich Gall, Alicia Garza, Andrea Glässel, and Michaela Kirschneck for their support in conducting this study, and Pieter Lozekoot and Jos Ramaker for collecting the data. This study was partially supported by a grant from the European League Against Rheumatism (EULAR).

References

[1] Zola IK. Towards the necessary universalizing of disability policy. Part 2: disability policy: restoring socioeconomic independence. Milbank Q 1989;67(Suppl 2):401–28.
[2] Bickenbach JE, Chatterji S, Bradley EM, Üstün TB. Models of disablement, universalism and the international classification of impairments, disabilities and handicaps. Soc Sci Med 1999;48:1173–87.
[3] Cieza A, Stucki G. New approaches to understanding the impact of musculoskeletal conditions. Best Pract Res Clin Rheumatol 2004;18:141–54.
[4] Lollar DJ. Public health and disability: emerging opportunities. Public Health Rep 2002;117:131–6.
[5] Stucki G. International Classification of Functioning, Disability, and Health (ICF): a promising framework and classification for rehabilitation medicine. Am J Phys Med Rehabil 2005;84:733–40.
[6] Stucki G, Stier-Jarmer M, Grill E, Melvin J. Rationale and principles of early rehabilitation care after an acute injury or illness. Disabil Rehabil 2005;27:353–9.
[7] Stucki G, Cieza A, Melvin J. The International Classification of Functioning, Disability and Health (ICF): a unifying model for the conceptual description of the rehabilitation strategy. J Rehabil Med 2007;39:279–85.
[8] World Health Organization. International Classification of Functioning, Disability and Health: ICF. Geneva: WHO; 2001.
[9] Stucki G, Ewert T, Cieza A. Value and application of the ICF in rehabilitation medicine. Disabil Rehabil 2002;24:932–8.
[10] Finger ME, Cieza A, Stoll J, Stucki G, Huber EO. Identification of intervention categories for physical therapy, based on the international classification of functioning, disability and health: a Delphi exercise. Phys Ther 2006;86:1203–20.
[11] Rentsch HP, Bucher P, Dommen-Nyffeler I, Wolf C, Hefti H, Fluri E, et al. The implementation of the International Classification of Functioning, Disability and Health (ICF) in daily practice of neurorehabilitation: an interdisciplinary project at the Kantonsspital of Lucerne, Switzerland. Disabil Rehabil 2003;25:411–21.
[12] McDowell I. Measuring health: a guide to rating scales and questionnaires. 3rd edition. New York: Oxford University Press; 2006.
[13] Cieza A, Geyh S, Chatterji S, Kostanjsek N, Ustün B, Stucki G. ICF linking rules: an update based on lessons learned. J Rehabil Med 2005;37:212–8.
[14] Wallerstein SL. Scaling clinical pain and pain relief. In: Bromm B, editor. Pain measurement in man: neurophysiological correlates of pain. New York: Elsevier; 1984.
[15] Cieza A, Brockow T, Ewert T, Amman E, Kollerits B, Chatterji S, et al. Linking health-status measurements to the international classification of functioning, disability and health. J Rehabil Med 2002;34:205–10.
[16] Andrich D. Controversy and the Rasch model: a characteristic of incompatible paradigms? Med Care 2004;42(1 Suppl):I7–16.
[17] Tijhuis GJ, de Jong Z, Zwinderman AH, Zuijderduin WM, Jansen LM, Hazes JM, et al. The validity of the Rheumatoid Arthritis Quality of Life (RAQoL) questionnaire. Rheumatology 2001;40:1112–9.
[18] Fries JF, Spitz P, Kraines RG, Holman HR. Measurement of patient outcome in arthritis. Arthritis Rheum 1980;23:137–45.
[19] Ware JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). A. Conceptual framework and item selection. Med Care 1992;30:473–83.
[20] The Euroqol Group. Euroqol - a facility for the measurement of health-related quality of life. Health Policy 1990;16:199–208.
[21] Smets EM, Garssen B, Bonke B, De Haes JC. The Multidimensional Fatigue Inventory (MFI) psychometric qualities of an instrument to assess fatigue. J Psychosom Res 1995;39:315–25.
[22] Center for Epidemiologic Studies, National Institute of Mental Health. Center for Epidemiologic Studies Depression Scale (CES-D). Rockville, MD: National Institute of Mental Health; 1971.
[23] Cieza A, Stucki G. Content comparison of health-related quality of life (HRQOL) instruments based on the international classification of functioning, disability and health (ICF). Qual Life Res 2005;14:1225–37.
[24] Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.
[25] Efron B. The Jackknife, the bootstrap and other resampling plans. Philadelphia: SIAM; 1982.
[26] Vierkant RA. A SAS macro for calculating bootstrapped confidence intervals about a Kappa coefficient. In: SAS Users Group International Online Proceedings. Available at: http://www2.sas.com/proceedings/sugi22/stats/paper295.pdf. Accessed July 23, 2004.
[27] Andrich D. Application of a psychometric rating model to ordered categories, which are scored with successive integers. Appl Psychol Meas 1978;2:581–94.
[28] Andrich D. Rasch models for measurement. In: Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-068. Newbury Park, CA: Sage; 1988.
[29] Styles I, Andrich D. Linking the standard and advanced forms of the Raven's progressive matrices in both the pencil-and-paper and computer-adaptive-testing formats. Educ Psychol Meas 1993;53:905–25.
[30] Andrich D. An index of person separation in latent trait theory, the traditional KR.20 index, and the Guttman scale response pattern. Educ Res Perspect 1982;9:95–104.
[31] Fisher WP. Reliability statistics. Rasch Meas Trans 1992;6:238.
[32] Wright BD, Masters GN. Rating scale analysis. Chicago: MESA; 1982.
[33] Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas 2002;3:85–106.
[34] Andrich D. The Rasch model explained. In: Alagumalai S, Durtis DD, Hungi N, editors. Applied Rasch measurement: a book of exemplars. Dordrecht, The Netherlands: Springer-Kluwer; 2005. p. 308–28. Chapter 3.
[35] Andrich D, Sheridan BS, Luo G. RUMM2020: Rasch unidimensional models for measurement. Perth, Western Australia: RUMM Laboratory; 2002.
[36] SPSS Inc. SPSS Release 14.0.2. Chicago, Illinois; 2006.
[37] SAS Institute Inc. The SAS System for Windows, Version 8.2. Cary, NC: SAS Institute Inc; 2001.
[38] Stucki G, Liang MH, Stucki S, Brühlmann P, Michel BA. A self-administered rheumatoid arthritis disease activity index (RADAI) for epidemiologic research. Arthritis Rheum 1995;38:795–8.
[39] Sangha O, Stucki G, Liang MH, Fossel AH, Katz JN. The Self-Administered Comorbidity Questionnaire: a new method to assess comorbidity for clinical and health services research. Arthritis Rheum 2003;49:156–63.
[40] Stucki G, Melvin J. The International Classification of Functioning, Disability and Health: a unifying model for the conceptual description of physical and rehabilitation medicine. J Rehabil Med 2007;39:286–92.
[41] Jette AM. Toward a common language for function, disability, and health. Phys Ther 2006;86:726–34.
[42] Linacre JM. Many-Facet Rasch measurement. Chicago: MESA Press; 1992.
[43] Dillon WR, Mulani N. A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behav Res 1984;19:438–58.
[44] Uebersax JS, Grove WM. A latent trait finite mixture model for the analysis of rating agreement. Biometrics 1993;49:823–35.
[45] Stucki G, Daltroy L, Katz JN, Johannesson M, Liang MH. Interpretation of change scores in ordinal clinical scales and health status measures: the whole may not equal the sum of the parts. J Clin Epidemiol 1996;49:711–7.
[46] Haley SM, McHorney CA, Ware JE. Evaluation of the MOS SF-36 physical functioning scale (PF-10): I. Unidimensionality and reproducibility of the Rasch item scale. J Clin Epidemiol 1994;47:671–84.
[47] McHorney CA, Haley SM, Ware JE. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch scoring methods. J Clin Epidemiol 1997;50:451–61.
[48] Taylor WJ, McPherson KM. Using Rasch analysis to compare the psychometric properties of the Short Form 36 physical function score and the Health Assessment Questionnaire disability index in patients with psoriatic arthritis and rheumatoid arthritis. Arthritis Rheum 2007;57:723–9.
[49] Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life. Ann Intern Med 1993;118:622–9.
[50] Geyh S, Cieza A, Kollerits B, Grimby G, Stucki G. Content comparison of health-related quality of life measures used in stroke based on the international classification of functioning, disability and health (ICF): a systematic review. Qual Life Res 2007;16:833–51. Epub February 10, 2007.
[51] Stucki A, Stucki G, Cieza A, Schuurmans MM, Kostanjsek N, Ruof J. Content comparison of health-related quality of life instruments for COPD. Respir Med 2007;101:1113–22. Epub January 9, 2007.
[52] Grill E, Stucki G, Scheuringer M, Melvin J. Validation of International Classification of Functioning, Disability, and Health (ICF) Core Sets for early postacute rehabilitation facilities: comparisons with three other functional measures. Am J Phys Med Rehabil 2006;85:640–9.
[53] Fries JF, Bruce B, Cella D. The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clin Exp Rheumatol 2005;23(5 Suppl 39):S53–7.
[54] Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, et al.; PROMIS Cooperative Group. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care 2007;45(5 Suppl 1):S3–S11.
[55] Smith EVJ. Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. J Appl Meas 2002;3:205–31.
[56] World Health Organization. Towards a common language for functioning, Disability and Health, ICF. Geneva: World Health Organization; 2002.
[57] Nunnally JC, Bernstein I. Psychometric theory. 3rd edition. New York: McGraw Hill; 1994.
Appendix

Item positions on the ICF category interval scale (0–100), from the most difficult item to the easiest item; the boundaries of the ICF qualifier are marked at 4%, 24%, 49%, and 95%:

RAQoL 25: It's too much effort to go out and see people | 67.7
CES-D 20: I could not get going | 62.9
CES-D 07: I felt that everything I did was an effort | 62.1
SF-36 9g: Did you feel worn out? | 60.6
MFI 2: Physically, I feel only able to do a little | 56.1
SF-36 9e: Did you have a lot of energy? | 54.8
MFI 3: I feel very active | 52.6
MFI 5: I feel tired | 52.6
MFI 8: Physically, I can take a lot | 52.0
MFI 1: I feel fit | 50.8
MFI 16: I tire easily | 48.1
MFI 12: I am rested | 48.1
RAQoL 10: I have to keep stopping what I am doing, to rest | 47.9
RAQoL 21: I feel tired whatever I do | 46.4
MFI 20: Physically, I feel I am in an excellent condition | 45.5
SF-36 9i: Did you feel tired? | 43.3

Fig. A1. Person-item threshold distribution.
Table A1. Examples of ICF categories with their corresponding code, title, and definition

| Code^a and title | Definition |
|---|---|
| b130: Energy and drive functions | General mental functions of physiological and psychological mechanisms that cause the individual to move toward satisfying specific needs and general goals in a persistent manner. Inclusions: functions of energy level, motivation, appetite, craving (including craving for substances that can be abused), and impulse control. Exclusions: consciousness functions (b110), temperament and personality functions (b126), sleep functions (b134), psychomotor functions (b147), emotional functions (b152) |
| b280: Sensation of pain | Sensation of unpleasant feeling indicating potential or actual damage to some body structure. Inclusions: sensations of generalized or localized pain, in one or more body part, pain in a dermatome, stabbing pain, burning pain, dull pain, aching pain, impairments, such as myalgia, analgesia, and hyperalgesia |
| s730: Structure of upper extremity | |
| d450: Walking | Moving along a surface on foot, step by step, so that one foot is always on the ground, such as when strolling, sauntering, walking forward, backward, or sideways. Inclusions: walking short or long distances, walking on different surfaces, walking around obstacles. Exclusions: transferring oneself (d420), moving around (d455) |
| d920: Recreation and leisure | Engaging in any form of play, recreational or leisure activity, such as informal or organized play and sports, programs of physical fitness, relaxation, amusement or diversion, going to art galleries, museums, cinemas or theaters; engaging in crafts or hobbies, reading for enjoyment, playing musical instruments; sightseeing, tourism, and traveling for pleasure. Inclusions: play, sports, arts and culture, crafts, hobbies, and socializing. Exclusions: riding animals for transportation (d480), remunerative and nonremunerative work (d850 and d855), religion and spirituality (d930), political life and citizenship (d950) |
| e1101: Drugs | Any natural or human-made object or substance gathered, processed, or manufactured for medicinal purposes, such as allopathic and naturopathic medication |

Abbreviation: ICF, International Classification of Functioning, Disability and Health.
a The letter b refers to body functions; s, body structures; d, activities and participation domains; and e, environmental factors.

Table A2. ICF qualifier with percentage values provided by WHO [8]

| ICF qualifier^a | Percentage of problem |
|---|---|
| 0 - NO problem (none, absent, negligible, ...) | 0–4 |
| 1 - MILD problem (slight, low, ...) | 5–24 |
| 2 - MODERATE problem (medium, fair, ...) | 25–49 |
| 3 - SEVERE problem (high, extreme, ...) | 50–95 |
| 4 - COMPLETE problem (total, ...) | 96–100 |

Abbreviation: WHO, World Health Organization.
a Having a problem may mean an impairment, limitation, restriction or barrier, depending on the construct [8, p. 222], i.e., depending on whether we are classifying body functions and structures (impairments), activity and participation (limitations or restrictions), or environmental factors (barriers or facilitators).
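Table A3. Fit of individual items to the model before and after accounting for disordered thresholds and item misfit

Before

| Item | D | R | SE | FR | DF | χ² | DF | P | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MFI 1: I feel fit. | 0.56 | 7 | 0.11 | 0.74 | 111.33 | 3.49 | 2 | 0.17 | 1.9 | 0.6 | 0.9 | 1.7 | |
| MFI 3: I feel very active. | 0.39 | 10 | 0.10 | 1.19 | 111.33 | 0.37 | 2 | 0.83 | **-0.7** | **-0.9** | **0.6** | **1.0** | |
| MFI 5: I feel tired. | 0.33 | 11 | 0.10 | 1.33 | 111.33 | 0.67 | 2 | 0.71 | 0.7 | 0.6 | 0.2 | 1.4 | |
| MFI 12: I am rested. | 0.84 | 5 | 0.11 | 1.85 | 111.33 | 2.14 | 2 | 0.34 | 1.1 | 0.8 | 0.4 | 1.5 | |
| MFI 2: Physically, I feel only able to do a little. | 0.06 | 13 | 0.10 | 2.58 | 111.33 | 8.78 | 2 | 0.01 | **-1.4** | **0.3** | **0.2** | **1.0** | |
| MFI 8: Physically, I can take on a lot. | 0.42 | 9 | 0.11 | 1.69 | 110.41 | 5.81 | 2 | 0.05 | 1.2 | 0.5 | 0.2 | 1.4 | |
| MFI 14: Physically, I feel I am in bad condition. | 0.47 | 8 | 0.11 | 1.35 | 111.33 | 3.60 | 2 | 0.17 | **-0.9** | **-1.0** | **0.9** | **1.0** | |
| MFI 16: I tire easily. | 0.84 | 6 | 0.10 | 1.04 | 110.41 | 1.32 | 2 | 0.52 | 0.6 | 0.5 | 0.1 | 1.0 | |
| MFI 20: Physically, I feel I am in an excellent condition. | 1.15 | 1 | 0.11 | 0.04 | 111.33 | 1.87 | 2 | 0.39 | 1.0 | 1.0 | 0.6 | 1.4 | |
| CES-D 02: I did not feel like eating; my appetite was poor. | 3.77 | 19 | 0.19 | 0.81 | 110.41 | 5.91 | 2 | 0.05 | 2.4 | 1.4 | 3.8 | | |
| CES-D 07: I felt that everything I did was an effort. | 0.73 | 16 | 0.14 | 0.19 | 110.41 | 6.49 | 2 | 0.04 | 1.9 | 0.3 | 2.2 | | |
| CES-D 20: I could not get going. | 0.82 | 17 | 0.14 | 1.47 | 109.49 | 3.05 | 2 | 0.22 | 1.7 | 0.5 | 2.2 | | |
| SF-36 9e: Did you have a lot of energy? | 0.11 | 12 | 0.11 | 0.34 | 109.49 | 1.97 | 2 | 0.37 | 2.7 | 1.0 | 0.2 | 1.1 | 2.4 |
| SF-36 9g: Did you feel worn out? | 0.56 | 14 | 0.11 | 0.73 | 108.57 | 0.61 | 2 | 0.74 | 2.7 | 1.3 | 0.0 | 1.3 | 2.7 |
| SF-36 9i: Did you feel tired? | 0.92 | 3 | 0.12 | 0.16 | 110.41 | 5.60 | 2 | 0.06 | **-3.1** | **-1.9** | **0.5** | **2.4** | **2.2** |
| RAQoL 1: I have to go to bed earlier than I would like to. | 0.67 | 15 | 0.22 | 1.56 | 109.49 | 8.78 | 2 | 0.01 | 0.0 | | | | |
| RAQoL 25: It's too much effort to go out and see people. | 1.39 | 18 | 0.25 | 0.63 | 107.65 | 3.10 | 2 | 0.21 | 0.0 | | | | |
| RAQoL 10: I have to keep stopping what I am doing, to rest. | 0.90 | 4 | 0.22 | 1.19 | 108.57 | 4.11 | 2 | 0.13 | 0.0 | | | | |
| RAQoL 21: I feel tired whatever I do. | 1.06 | 2 | 0.22 | 1.19 | 110.41 | 1.74 | 2 | 0.42 | 0.0 | | | | |

After

| Item | D | R | SE | FR | DF | χ² | DF | P | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MFI 1: I feel fit. | 0.29 | 7 | 0.12 | 0.33 | 110.27 | 1.92 | 2 | 0.38 | 2.0 | 0.7 | 0.9 | 1.8 | |
| MFI 3: I feel very active. | 0.07 | 10 | 0.13 | 0.25 | 110.27 | 0.98 | 2 | 0.61 | 1.8 | 0.8 | 1.0 | | |
| MFI 5: I feel tired. | 0.08 | 9 | 0.10 | 1.52 | 110.27 | 1.97 | 2 | 0.37 | 0.8 | 0.6 | 0.2 | 1.5 | |
| MFI 12: I am rested. | 0.62 | 5 | 0.11 | 1.47 | 110.27 | 1.33 | 2 | 0.51 | 1.3 | 0.7 | 0.5 | 1.5 | |
| MFI 2: Physically, I feel only able to do a little. | 0.34 | 12 | 0.13 | 1.35 | 110.27 | 4.16 | 2 | 0.12 | 1.4 | 0.3 | 1.8 | | |
| MFI 8: Physically, I can take on a lot. | 0.15 | 8 | 0.11 | 1.73 | 109.36 | 4.71 | 2 | 0.09 | 1.2 | 0.5 | 0.1 | 1.6 | |
| MFI 14: Physically, I feel I am in bad condition. | | | | | | | | | | | | | |
| MFI 16: I tire easily. | 0.61 | 6 | 0.11 | 1.19 | 109.36 | 0.73 | 2 | 0.69 | 0.7 | 0.4 | 0.0 | 1.1 | |
| MFI 20: Physically, I feel I am in an excellent condition. | 0.92 | 2 | 0.12 | 0.39 | 110.27 | 1.18 | 2 | 0.55 | 1.1 | 0.9 | 0.5 | 1.5 | |
| CES-D 02: I did not feel like eating; my appetite was poor. | | | | | | | | | | | | | |
| CES-D 07: I felt that everything I did was an effort. | 1.07 | 14 | 0.14 | 0.47 | 109.36 | 6.22 | 2 | 0.04 | 2.0 | 0.3 | 2.3 | | |
| CES-D 20: I could not get going. | 1.16 | 15 | 0.14 | 2.16 | 108.45 | 0.92 | 2 | 0.63 | 1.8 | 0.5 | 2.3 | | |
| SF-36 9e: Did you have a lot of energy? | 0.19 | 11 | 0.11 | 0.91 | 108.45 | 1.94 | 2 | 0.38 | 2.9 | 1.0 | 0.1 | 1.1 | 2.6 |
| SF-36 9g: Did you feel worn out? | 0.89 | 13 | 0.11 | 1.52 | 107.54 | 2.15 | 2 | 0.34 | 2.8 | 1.4 | 0.1 | 1.4 | 2.9 |
| SF-36 9i: Did you feel tired? | 1.18 | 1 | 0.31 | 0.27 | 109.36 | 2.17 | 2 | 0.34 | 4.2 | 4.2 | | | |
| RAQoL 1: I have to go to bed earlier than I would like to. | | | | | | | | | | | | | |
| RAQoL 25: It's too much effort to go out and see people. | 1.73 | 16 | 0.25 | 0.33 | 106.62 | 2.75 | 2 | 0.25 | 0.0 | | | | |
| RAQoL 10: I have to keep stopping what I am doing, to rest. | 0.64 | 4 | 0.23 | 0.92 | 107.54 | 3.13 | 2 | 0.21 | 0.0 | | | | |
| RAQoL 21: I feel tired whatever I do. | 0.81 | 3 | 0.23 | 0.95 | 109.36 | 2.00 | 2 | 0.37 | 0.0 | | | | |

The items are presented according to the instrument to which they belong, as in Table 3. Items MFI 2, MFI 3, MFI 14, and SF-36 9i presented disordered-response options and were rescaled. The disordered thresholds are presented in bold.
Abbreviations: D, estimate of item difficulty; R, rank order according to item-difficulty estimation; SE, standard error associated with item-difficulty estimation; FR, standardized fit residual z; χ², chi square; DF, degrees of freedom; P, probability associated with χ²; T1, first threshold estimate; T2, second threshold estimate; T3, third threshold estimate; T4, fourth threshold estimate (there is only one threshold when the items are dichotomous).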
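The notion of disordered thresholds used in Table A3 can be illustrated with a short sketch: thresholds are ordered when each estimate is at least as high as the one before it, and any reversal marks a disordered pair. The function name is hypothetical, and the example values are the "before" threshold estimates of MFI 3 and SF-36 9i as we read them from the table, which assumes that the marked entries carry a minus sign.

```python
def disordered_pairs(thresholds):
    """Return 1-based pairs (k, k + 1) where threshold k + 1 lies below threshold k."""
    return [
        (k + 1, k + 2)
        for k in range(len(thresholds) - 1)
        if thresholds[k + 1] < thresholds[k]
    ]


# Assumed readings of the "before" threshold estimates (signs inferred).
mfi_3_before = [-0.7, -0.9, 0.6, 1.0]
sf36_9i_before = [-3.1, -1.9, 0.5, 2.4, 2.2]

print(disordered_pairs(mfi_3_before))    # second threshold below the first
print(disordered_pairs(sf36_9i_before))  # fifth threshold below the fourth
```

A dichotomous item has a single threshold and can therefore never be flagged by this check.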