Unidimensionality and reliability under Mokken scaling of the Dutch language version of the SF-36

Size: px
Start display at page:

Download "Unidimensionality and reliability under Mokken scaling of the Dutch language version of the SF-36"

Transcription

1 Quality of Life Research 12: , Ó 2003 Kluwer Academic Publishers. Printed in the Netherlands. 189 Unidimensionality and reliability under Mokken scaling of the Dutch language version of the SF-36 P.G.M. van der Heijden 1,3, S. van Buuren 1, M. Fekkes 2, J. Radder 1 & E. Verrips 2 1 Department of Statistics, TNO Prevention and Health, Leiden ( pgm.vanderheijden@pg.tno.nl); 2 Child Health Division, TNO Prevention and Health, Leiden; 3 Department of Methodology and Statistics, Utrecht University, The Netherlands Accepted in revised form 5 June 2002 Abstract The sub-scales of the SF-36 in the Dutch National Study are investigated with respect to unidimensionality and reliability. It is argued that these properties deserve separate treatment. For unidimensionality we use a non-parametric model from item response theory, called the Mokken scaling model, and compute the corresponding scalability coefficients. We estimate reliability under the Mokken model, assuming that the items are double homogeneous, and compare it to Cronbach s a. The scalability of the sub-scale general health perceptions is medium (H ¼ 0:46), and for the other sub-scales it is strong (HP0:6). The reliability in terms of a indicates that all sub-scales can be used in basic research (a > 0:70), butthatonly physical functioning can be used for clinical applications of quality of life (a > 0:90). The relative merits of our approach are discussed. Key words: Item response theory, Mokken scaling model, Reliability, Scalability, SF-36 Introduction Appropriate measurement is fundamental to the field of quality of life (QoL). Many measurement instruments exist, and one of the most popular is the SF-36. In this paper we discuss two important properties of the SF-36, unidimensionality and reliability, from a psychometric perspective. Our aim is to highlight the role of both concepts in the construction of QoL-instruments. Why are unidimensionality and reliability important concepts in the construction of QoL-instruments? QoL cannot be measured by devices based on laws of physics, like the sphygmomanometer for measuring blood pressure, but requires the use of one or more questionnaire items. Such items probe (the evaluation of) the health status of the respondent. Using only one item is ideal in the sense that the meaning of the answer will be more or less unambiguous, butatthe same time it is also very crude, that is, it has a large measurementerror. The measurementcan be made more precise (more reliable) by administering more items and by then aggregating the responses to a sum score. The downside of this is that this can only be sensibly done if the items measure the same thing, that is, if they are unidimensional. Reliability without unidimensionality is only of limited value since we do not know what we are measuring. Similarly, unidimensionality without reliability is also of limited value because we may be measuring too crudely. Thus, unidimensionality and reliability are different concepts. Both play an equally important role in the construction of QoL-instruments. Maximising either one separately may go at the expense of the other, and generally there should be a balance between them. We now discuss the concepts in more detail. A set of items is called unidimensional if they measure the same trait or property. This property is fundamental to almost any kind of measurement, and forms the key component of content

2 190 validity. Unidimensionality does not specify the measurement level of the trait. Despite its importance there is no accepted and effective index of unidimensionality. Most authors seem to agree with McDonald that a set of items is unidimensional if and only if the set fits a (generally nonlinear) common factor model with just one common factor [1]. A similar definition was given by Lord and Novick [2], who state that an item set is homogeneous if their true scores can be shown to be a monotonic function of some single variable. Note that this definition uses the term homogeneous. Hattie [3] provides an excellent discussion of the confusion that is being generated by different interpretations of terms like internal consistency and homogeneity. To circumventthese difficulties, we will use the term unidimensionality throughout. The reliability of a test is a value that quantifies the amount of measurement error of the total score of a test. Reliability is defined as the ratio of the true score variance to the observed score variance [4]. Under the assumptions that the items fit a linear common one-factor model and that the measurementerror variance is equal across all true scores, the popular Cronbach s a gives a sound lower bound to the reliability of a total test score [5]. A well-known limitation of Cronbach s a is the fact that its value depends on the distribution of the true scores of the population. Thus, the coefficient of the reference population can be inadequate for the population to which the test is applied. Latent trait models, by contrast, offer a standard error for each possible test score. An important problem is that the interpretation of Cronbach s a measure is sometimes extended beyond the use for which it is intended. Some researchers believe that Cronbach s a is a measure that can be used to assess the unidimensionality of a scale. This is fuelled by poorly defined concepts like homogeneity and internal consistency, and by popular procedure like SPSS RELIABILITY that support methods to select a set of items that maximises a. It is not always realised that this may only be sensibly done after unidimensionality of the candidate items has been established, presumably by some form of factor analysis. Green et al. [6] demonstrated that a can be high even if the underlying items are clearly multidimensional. Furthermore, it is known that a increases if the number of items in a test is made larger by defining parallel items. If a would really be a measure of unidimensionality, then a should notchange by including items that measure the same construct. The existence of some other problems with a leads Hattie to conclude that despite its common usage as an index of dimensionality, a is extremely suspect [3, p. 145]. Other methods than a must be used to study the unidimensionality of a set of items. There are several alternatives that can be classified in three broad groups: linear common factor analysis, parametric item response models and non-parametric item response models. In common factor analysis, one would be interested in selecting a subset of items that fits the linear common one-factor model. Fit statistics that measure the difference between the observed and the expected covariance matrix can be used to diagnose model fit. Alternatively, one could inspect the off-diagonal elements of residual covariance matrix for aberrant extreme values (see Hambleton et al. [7, p. 56] for other methods). Though easy to apply, the linear model is however not particularly suited to create QoL-instruments as it requires item scores to be measured on an interval scale. In addition, the relation between the latent trait and the observed scores are assumed to be linear. Both assumptions are not necessarily true in QoL research, and if the assumptions are not met, then factor analysis can come up with too many factors differentiating between items that in fact measure the same thing [8]. As an alternative, psychometricians developed a large variety of non-linear parametric one-factor models. Such models are known under names as the Rasch model (RM) the two- and three-parameter models (for binary data), the rating scale model, the graded response model, the partial credit model (for polytomous data). Common to these models is that they specify the probability of a particular category by means of logistic or normal ogive functions of the latent trait. The primary advantage is that responses in individual categories are fitted, thus obviating the need for interval measures. For some models, statistical tests have been developed to test the fit to the data [9, 10]. Such tests can be regarded as tests for unidimensionality, butare notwidely used. A practical problem is that those tests are rather stringent. The

3 191 RM tends to be rejected for many real QoL data. For example, Raczek etal. [11] conducted a Rasch analysis on the 10 items of the PF sub-scale (PF- 10), using data from seven countries. They conclude that unidimensionality of the PF scale is upheld (p. 1201), but their Table 3 indicates that both items bending/kneeling and bathing/dressing do not fit the RM for all seven countries, and that item vigorous activities does not fit in five countries. Other evidence [12] suggests that the PF- 10 does notform a perfecthierarchy on a unidimensional scale. If it is so difficult to fit the RM on even the best sub-scale of the SF-36, then more problems can be expected for the other sub-scales. Within the framework of the RM possible solutions to this problem of misfit include: (1) reduce the item set to a more unidimensional subset, or (2) introduce additional flexibility by increasing the number of parameters of the item characteristic curve (ICC), as in the two- and three-parameter models. There is however no guarantee that such solutions work, so the problem may persist. Consequently, applying less stringent models on QoL data could be useful. A third group of methods, the non-parametric item response model, is such a less stringent model that makes minimal assumptions. This group does not require a parametric functional form for the ICC, but allows for any monotonously increasing function. This approach is known as Mokken scaling [13, 14]. Since the shape of the ICC is much more flexible, Mokken scaling can fit data that will fail to fit parametric models. Mokken scaling can thus be used as a diagnostic tool to discover the shape of the ICC that is dictated by the data, thereby suggesting instances where the more parsimonious parametric models will inappropriately smooth out important characteristics of the data. More importantly, the technique may save items that are notconsidered unidimensional under a parametric model, but that do fit the Mokken model. This could help to keep the item pool as large as possible, and thus save valuable work. Since the modelling assumptions of Mokken scaling are minimal, we believe that it exemplifies the purest form of unidimensionality. Both unidimensionality and reliability are important issues for the assessment of usefulness of sub-scales of the SF-36. Therefore the paper has two purposes: (1) to evaluate the unidimensionality of the items in the sub-scales of Dutch language version of the SF-36 using the Mokken scaling model, and (2) to compare the results with the reliability as traditionally estimated by Cronbach s a. Method Data We evaluate the unidimensionality and reliability of the sub-scales of the SF-36 for the so-called Dutch National Study [15]. We chose the SF-36 because this instrument is well established. Our method can be applied to any QoL-instrument, and our results on the SF-36 can act as a benchmark. The National Study is a nation-wide random sample of adults from the Netherlands (n ¼ 1742). The data were collected in Ref. [15] provides descriptive statistics on the item and scale level. Overall the data quality in this study is high if evaluated in terms of percentage of missing data, multi-trait scaling structure and reliability of the sub-scales. Known-group comparisons yielded consistent support for the validity of the SF-36. If individuals have one or more items missing on a sub-scale, they are omitted from this sub-scale (listwise deletion of missing data). The percentage of respondents having one or more items missing was relatively small (ranging between 0 and 4%). Mokken scaling model We apply Mokken scaling to each sub-scale of the SF-36 to order the subjects as well as (the levels of) the items in terms of increasing health. Sijtsma [16] discusses many similarities and differences between the Mokken scaling model and the RM. Like the RM, the Mokken scaling model is a probabilistic version of the deterministic Guttman scale. The Guttman model forms the basis of many fit statistics in both the Mokken and Rasch models. In the sequel, we use the Mokken scaling model for polytomous items [17]. We now discuss the assumptions made in Mokken scaling in more detail. Consider a set of dichotomous health items that all aim to measure some quantitative latent trait, for example perceived health. Assume that each health item has two answers, yes (positive about some particular

4 192 health status) and no (negative). The Mokken model has three assumptions: (1) the latent trait is unidimensional, (2) given the level of the latent trait, the answers on the items are independent, and (3) the ICC of an item is a non-decreasing function. In Figure 1a a non-decreasing line (the ICC) describes the relation between the probability of a positive answer as a function of the latent trait. Figure 1b shows the ICCs of two items. It shows that lines for different items can be different in form and location. Because of the independence between the answers on the two items given the level of the latent trait, the latent trait explains the observed relation between the items. If these three assumptions hold, a set of items is called a monotonously homogeneous (MH) set. If MH holds, the sum score of the items order the individuals on the latent trait (see Ref. [16]). This has an important practical implication: If the MH assumption holds, the latent trait is a monotone function of the sum score. It is possible to add a fourth assumption leading to a stronger Mokken model: (4) the ICCs may not intersect (see Figure 1c). If this assumption holds, the individuals also order the items, and the proportion of correctresponses on a particular sample can be used to estimate the location of the item. We then speak of a doubly monotone (DM) set. The ordering of items is identical across the entire latent scale under the DM model, so the estimated item ordering will be independent of the particular population. The Mokken scaling model for polytomous items Thus far we discussed the Mokken scaling model for dichotomous items. However, many items of sub-scales of the SF-36 are polytomous. For polytomous items the Mokken model has been generalised by Molenaar [17]. Assume that each item has m þ 1 ordered categories. An item step is an imaginary threshold between two adjacent categories of an item. Within an item, item steps are dependent, because if, for example, the step from category 1 to category 2 is not taken, then the step from 2 to 3 cannot be taken. By using the item steps it is possible to construct item step characteristic curves (ISCC s) describing the relation between the latent trait and the probability that an item step is taken. By definition ISCC s of one item cannot intersect. The MH assumption of the polytomous Mokken scaling model is that ISCC s are monotone non-decreasing functions. The DM assumption is that ISCC s of different items are not allowed to intersect. Evaluation tools exist to check whether a set of data has MH and DM properties (see Ref. [17, 19]). All analyses are performed with the computer program MSP 3.0 [19]. Check of model assumptions Figure 1. (a)example of a monotonoulsy increasing ICC describing the relation between the latent trait and the probability of a positive answer. (b) Example of two monotonously increasing ICCs being a MH set. (c) Example of two monotonously increasing ICCs being a DM set. The Mokken scaling model is a non-parametric model. To check whether the model fits the data, i.e. whether the assumptions underlying the Mokken model hold, certain characteristics of the data should be checked. The Mokken scaling model as itis implemented in MSP 3.0 [18] comes with a fairly elaborate set of diagnostic methods to assess the assumptions of the model. The scalability and the single method evaluate the MH properties, while the restscore and pmatrix methods evaluate DM properties. It is outside the

5 193 scope of this paper to describe each of these methods in detail, and here we will only give a global indication of how the model properties are checked. We refer to the literature for a thorough description of each of these [13, 18, 19]. The so-called Guttman scalogram model plays a key role in checking the MH model via the scalability. Figure 2 illustrates the Guttman model. There are two items, where the item on the left is item g and the item on the right is item h. For each item one ICC is plotted, and for each of the items the probability of answering yes is either zero or one. In this way there are only three admissible response patterns to the item pair, namely, from leftto right(no, no), (yes, no) and (yes, yes). An answer (no, yes) is an observed error in terms of the Guttman scalogram model. Thus in the scalogram model some of the response patterns are notallowed. In the Mokken scaling model all response patterns are in principle allowed, but some have a lower probability of occurrence than others. To assess the fit of the Mokken scaling model, we measure the closeness of the solution to the perfect Guttman scale. It is possible to evaluate the scalability of each of the individual items as well as of the whole set of items using a coefficient known as Loevinger s H [20]. If the ICCs of a set of items are non-decreasing, then 0 O H O 1. Itis equal to 1 if the items form a perfect Guttman scale, and it is equal to 0 if and only if all (or all minus one) items are constant functions of the latent trait [11]. Coefficient H can be interpreted as an index for the degree to which subjects can be accurately ordered by means of k items. Mokken [13, 14] speaks of weak scalability of a set of items when 0:3 O H < 0:4, of medium scalability when 0:4 O H < 0:5, and of strong scalability when 0:5 O HO 1:0. A set of items is unscalable if H < 0:3. These guidelines for interpreting H seem coarse, butwe could not find other guidelines that allow finer gradations and interpretations of scalability. The single method is the second method to check MH. The starting point is that, if MH holds, then a person falling in a higher level of an item will in general also fall in a higher level on the latent trait [17, 19]. In order to investigate this, certain aspects of the data can be checked. The total number of checks performed is denoted by the MSP-program as #ap. In each check it may turn out that the data do not follow MH. Violations are counted and referred to as #vi. Each violation separately can be tested for significance. However, such tests should be approached reluctantly and with caution, because, even if in a population the MH model would hold, by chance alone we would expect to find some violations, and if the number of pairs compared is large we even expect to find some of them to be significant. Both methods to check DM compare, each in a different way, data on each of the possible pairs of items with data on the remaining items. Therefore it is only possible to use these methods when the number of items in a sub-scale is larger than two. The so-called rest score method is the first method to check DM. It checks whether two ISCC s of different items do not intersect. Again, in order to investigate this, certain aspects of the data can be checked. The total number of checks performed is denoted by the MSP-program as #rg. In each check it may turn out that the data do not follow DM. Here, violations are also counted and referred to as #vi. Again, each violation separately can be tested for significance, but such tests should be handled with care. The so-called P matrix method is the second method to check DM. It also concerns checking aspects in the data. The number of checks is denoted #ac, and the number of violations by #vi. Again, each violation separately can be tested for significance and interpreted cautiously. Reliability measures Figure 2. Example of two ICCs for items following a Guttman scale. Cronbach s a is computed using the computer program SPSS. For dichotomous items a reliability measure based on Mokken s scaling model has been discussed in Refs. [16, 20] (for a generalization to polytomous items, see Ref. [17]). This reliability measure assumes that the DM assumption

6 194 holds. This reliability measure is also provided by computer program MSP. How large should a be? In psychometrics [21] as well as in medical statistics [22] values of 0.70 or 0.80 are considered to be sufficient for basic research, such as comparing groups and calculating correlations. However, for the clinical application (decisions about individuals) a s as high as 0.90 or 0.95 are required [17, 18]. Results Table 1 summarises the results of the evaluation of the Mokken scaling. We start with a discussion of MH properties, i.e. whether the ISCC s are monotonously increasing. These are investigated by H and the single method. Using the guidelines for the interpretation of H (i.e. 0:4 O H < 0:5 is medium scalability and 0:5 O H < 1:0 is strong scalability), we find in column 1 that the fit of the MH model of all sub-scales is strong, except for General Health Perceptions where the fit of the MH model is medium. Another way to check MH properties is by using the single method, which assesses whether the response proportion per category increases with ability level [17, 19]. In the two columns with the heading single, the one with label #ap gives the number of checks made in the data, and the column with the label #vi gives the number of times this a violation of MH is found. It appears that no scales have violations, so there is strong evidence that all scales are monotonously increasing. This means that for each of the scales the persons can be ordered by the items of the scales. For the DM properties of the data, there are two checks to see whether the ISCC s belonging to different items do not intersect, the P matrix and rest score methods. Since it is only possible to use these methods when the number of items in a sub-scale is larger than two, for the sub-scales Bodily Pain (BP) and Social Functioning (SF) we have no results. In the columns with heading P matrix, the one labelled Ô#acÕ shows the number of checks made in the data, and the next column labelled #vi gives the number of violations of DM found in the data. For Role Limitations due to Physical Health problems (RP) and for Role Limitations due to Emotional Problems (RE) the number of active comparisons made is small, butthey do notshow any violation of DM. For the other scales some violations are found, but considering the large numbers of active comparisons made, the number of violations found is small (for example, for Physical Functioning (PF) 2880 active comparisons are made and only six show a violation of the model; such a small number could be due to chance, even if the model were true). However, we will study them on item level when we discuss Table 2. Table 1. Overview of scales Scale H Single P matrix Rest score Reliability a #ap #vi #ac #vi #rg #vi PF RP BP GH VT SF RE MH Scales are Physical Functioning (PF, n ¼ 1675), Role limitations due to Physical Health problems (RP, n ¼ 1703), Bodily Pain (BP, n ¼ 1735), General Health Perceptions (GH, n ¼ 1664), Vitality (VT, n ¼ 1723), Social Functioning (SF, n ¼ 1738), Role Limitations due to Emotional Problems (RE, n ¼ 1708) and General Mental Health (MHe, n ¼ 1705). MH can be checked by scalability H and single method, DM by P matrix and rest score. In single method, #ap is the number of active pairs of items where MH properties of the data are checked, and #vi is the number of violations of MH found in these active pairs. In P matrix method, #ac is the number of active comparisons carried out to check the DM properties of the data, and #vi is the number of violations of DM found in these comparisons. In rest score method #rg is the number of rest group comparisons carried out to check the DM properties of the data, and #vi is the number of violations of DM found in these comparisons.

7 195 For the rest score method the number of checks made is found in the column labelled #rg, and this column is followed by the column with the number of violations of DM, #vi. Again, RP is doing well. RE is doing a worse, butadditional information in the output shows that this violation is only small: the z-value for the violation between the first and third item is The other sub-scales also show violations, but considering the number of rest group comparisons carried out, the number of violations in the data is small. We get more insight into these violations when we discuss Table 2. Table 2 shows the same results at the item level. When we consider MH, we find again that the Table 2. Mokken scale properties and violations of individual SF-36 items Scale Item H Single P matrix Rest score #ap #vi #ac #vi #rg #vi PF Vigorous activities Moderate activities Lifting/carrying Climbing several Climbing one Bending/kneeling Walking > 1mile Walking blocks Walking block Bathing/dressing RP Amountof time Accomplishes less Kind of work Difficulty BP Had BP Did pain interfere GH General health is Sick easier th. oth As healthy as Expectgetworse Is excellent VT Full of pep Lotof Energy Worn out Feel tired SF Normal social activ Social activities RE Amountof time Accomplished less Notas carefully MHe Nervous person Nothing cheers up Calm and peaceful Downhearted, blue Happy person MH can be checked by scalability H and single method, DM by P matrix and rest score. In single method, #ap is the number of active pairs of items where MH properties of the data are checked, and #vi is the number of violations of MH found in these active pairs. In P matrix method, #ac is the number of active comparisons carried out to check the DM properties of the data, and #vi is the number of violations of DM found in these comparisons. In rest score method #rg is the number of rest group comparisons carried out to check the DM properties of the data, and #vi is the number of violations of DM found in these comparisons.

8 196 sub-scales contain no items that have a much lower H-value than the other items, except perhaps for the removal of item Did you feel full of pep? from the sub-scale Vitality and Have you been a very nervous person? from General Mental Health (MHe). Butremovals are notnecessary since H for the total sub-scale is larger than 0.5, and this is to be interpreted as good MH properties. For DM properties this is a bit more complicated, because violations are always related to item step pairs. It follows that many of the violations would disappear if we remove the following items: item I am as healthy as anybody I know from General Health Perceptions (11 out of the 12 rest score violations are significant), item Did you feel full of pep? from Vitality (9 out of the 10 rest score violations are significant), and item Have you been a very nervous person? from MHe (3 out of the 5 rest score violations are significant). A close inspection of the violations of these three items reveals that, for the item Did you feel full of pep? as well as for the item Have you been a very nervous person? most violations are found for the item steps from category A good bit of the time to Some of the time and from category Some of the time to A little of the time. For the item I am as healthy as anybody I know the violations are found in all item steps. However, it seems better to keep those variables in the sub-scales, because the number of violations is relatively small compared to the very large number of comparisons made. For future research into different populations it deserves attention that these items threaten the DM properties of these sub-scales, and it is of interest to see if the same violations will be found. We now discuss reliability measures for the subscales (see the last two columns of Table 1). Cronbach s a is interpreted as a lower bound for reliability, and it is clear that the lower bounds are quite high, in any case high enough to use each of the scales for basic research purposes in a population comparable to the one we have in this study. The a for PF indicates that this scale can be used for clinical applications, and the a s of the scales RP, BP and MH are close to the range where this usage is allowed. The reliability measure derived under the assumption that the items form a DM set are very close to the estimates of Cronbach s a estimates. Since DM holds, the first reliability measure is preferable because Cronbach s a gives a lower bound and depends on untested linearity assumptions. Note also that the reliability estimates vary independently from the scalability coefficients. Conclusion The results make clear that the reliability of the scales is good enough to use all scales in basic research, being around between 0.78 and 0.95, with the lowest reliability for perceived general health. Only PF can be used in clinical applications in populations similar to the one used in this study (a is larger than 0.90), and some of the other scales are very close to this situation. The scalability of the items is very good: monotone homogeneity is never violated for any of the sub-scales, and H-coefficients are almost always larger than 0.7, with again an exception for Perceived General Health, were H ¼ 0:46, and Vitality (H ¼ 0:60) and MH (H ¼ 0:61). Double monotonicity is also good to very good: there is only a very limited number of violations given the enormous amountof comparisons thatare made, and it is not unlikely that these violations are due to sample fluctuations. We conclude that the scalability of the items of the sub-scales assessed with the Mokken scaling model is good, both in terms of MH and DM, and no important gains will result by omitting any of the items in the sub-scales. We conclude that besides good reliability the SF- 36 sub-scales also show good scalability/unidimensionality. These results will hold in populations with a health distribution similar to the one in the National study, but might differ somewhat from populations with different health distributions. For example, there appear to be some ceiling effects in the data. In a population where health is worse, such ceiling effects will diminish. Since ceiling effects do not allow for violations of the Mokken scale properties, it might be that the Mokken scale properties of the SF-36 are worse in a population with a worse health. This remains a topic of further study. Discussion Unidimensionality and reliability are different concepts. Unidimensionality refers to the question

9 197 whether items measure the same thing, whereas reliability indicates the variation of the test results if it is repeated under comparable circumstances. In scale construction both aspects need to be assessed, but the limitations of the traditional approach based on common factor analysis and Cronbach s a are well known. Several authors have warned against the inappropriate interpretation of Cronbach s a as a measure for unidimensionality [1, 3, 6]. The present paper investigates a particular alternative, the Mokken scaling model, for assessing unidimensionality and reliability. The Mokken scaling model tries to get the most out the set of the weakest possible assumptions. It is the general item response model for ordering persons according to a sum score. The model can be used to discover the shape of the ICC, and may save items that are not unidimensional according to parametric models. Since the modelling assumptions of Mokken scaling are minimal, it is in some sense the purest form of unidimensionality. The Mokken approach determines the unidimensionality of a set of items in terms of their scalability. Mokken [13] suggested a number of cut-off values as criteria for scalability. Though such guidance it useful, the exact choice of these values remains of course somewhatarbitrary. Hattie [3, p. 143] argued that scalability is a limited operationalisation of unidimensionality. A serious objection is that these methods can only achieve their upper bounds if the strong assumption of scalability (i.e. a perfect scale) is made. Another criticism is that there is nothing in the methods that enables a test of just one trait to be distinguished from a test composed of an equally weighted composite [3, p. 143]. This criticism is related to the old-standing debate about general vs. specific ability factors. For these and other reasons, Hattie [3] prefers to use the principle of local stochastic independence from a one-factor model as the best way to define unidimensionality. Local independence is one of the key assumptions in Mokken scaling. Some work has been done to actually test the unidimensionality assumption in non-parametric models [23, 24]. This approach has not(yet) been implemented in MSP, butitcould potentially be useful in removing some of the arbitrariness of the assessment of unidimensionality. Similar remarks can be made aboutthe number of violations and the number of active comparison that are needed for testing the DM assumption. For most of the scales, the number of violation is small relative to the number of active comparisons. There is however no statistical test to accompany this statement, and the conclusion remains therefore somewhatarbitrary. More work in this area could be useful. Within the family of IRT models, the RM is best known. The RM is special in the sense that it possesses statistical properties like sufficiency, separability and specific objectivity [25]. It has been argued that the model can therefore be viewed as a fundamental model of measurement. This implies that if the data fit the model, then the measurements have interval scale properties. Now how does this compare to the non-parametric Mokken model? Both the Mokken and Rasch model are generalisations of the Guttman scaling model, butthe Mokken scale makes weaker assumptions than the RM. The Mokken model orders items and persons, and thus produces a scale that has ordinal scale properties at most. Though the Mokken model is more general and will fit the data more often than the RM, this comes at the price that the property of invariant comparison is lost. This is probably not so much of a problem if the test is primarily used for diagnostic purposes, where a specific subsample beyond the cut-off value is referred for further evaluation. The interpretation of difference scores is however complicated by the fact that intervals between different points of an ordinal scale cannot be formally compared. In such a case, the Mokken model can still be useful though to identify the shape of ICC s that are to be fitted within a parametric framework. Last, we also note that the DM property of the data cannot be investigated in the Mokken model, whereas there is no problem to assess the fit of the RM in this case. If the DM assumption fails, the difficulty order of the items is different for different people. What does that mean for comparing scale scores between people? The most important consequence is that the appropriateness of using the sum score is not theoretically warranted in the polytomous case. If all items are binary, we may validly use the sum score to compare people, as the ranks of the sum score and the latent ability are identical (except for ties). This property holds under both HM and DM models, but it is not true for polytomous items

10 198 [26]. In the latter case, we can use the ability estimates provided by the Mokken model as an alternative, but this is less convenient than computing a sum score. Sijtsma and van der Ark conducted a small simulation study about the adequacy of using the sum score as a proxy for the latent ability order. This was done for the polytomous HM model, that is, the model where the largest discrepancies can be expected. The results tentatively suggest that, in practice, the use of the sum score does notlead to serious errors when ordering respondents on the latent ability [26, p. 309]. So, irrespective of the whether the DM property holds, the advice is to use to sum score for ranking respondents on the latent trait. In conclusion, Cronbach s a is nota measure of unidimensionality, and other methods are needed to assess unidimensionality of the set of items. This seems especially relevantfor a maturing field like QoL, where the concepts are still being defined and refined. We hope that the methods used in this paper will stimulate such work. References 1. McDonald RP. The dimensionality of tests and items. Brit J Math Stat Psychol 1981; 34: Lord FM, Novick MR. Statistical Theories of Mental Test Scores. Reading, Massachusetts: Addison-Wesley, Hattie J. Methodology review: Assessing unidimensionality of tests and items. Appl Psych Meas 1985; 9: Novick MR, Lewis C. Coefficient a and the reliability of composite measurements. Psychometrika 1967; 32: Cronbach LJ. Coefficient a and the internal structure of tests. Psychometrika 1951; 16: Green SB, Lissitz RW, Mulaik SA. Limitations of coefficient a as an index of test unidimensionality. Educ Psychol Meas 1977; 37: Hambleton RK, Swaminathan H, Rogers HJ. Fundamental of Item Response Theory. Newbury Park: Sage, McDonald RP, Ahlawat KS. Difficulty factors in binary data. Brit J Math Stat Psychol 1974; 27: Glas CAW, Verhelst ND. Testing the Rasch model. In: Fischer GH, Molenaar, IW (eds), Rasch Models, Berlin: Springer Verlag, 1995; Glas CAW, Verhelst ND. Tests of fit for polytomous Rasch models. In: Fischer GH, Molenaar IW (eds), Rasch Models, Berlin: Springer Verlag, 1995; Raczek AE, Ware JE, Bjorner JB, etal. Comparison of Rasch and summated rating scales constructed from SF-36 physical functioning items in seven countries: Results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51(11): Jenkinson C, Fitzpatrick R, Garratt A, Peto V, Stewart- Brown S. Can item response theory reduce patient burden when measuring health status in neurological disorders? Results from Rasch analysis of the SF-36 physical functioning scale (PF-10). J Neurol Neurosur Psychiat 2001; 71(2): Mokken RJ. Theory and Procedure of Scale Analysis. Den Haag: Mouton; Berlin: De Gruijter, Mokken RJ. Nonparametric models for dichotomous responses. In: van der Linden WJ, Hambleton RK (eds), Handbook of Modern Item Response Theory, New York: Springer; 1997; Aaronson NK, Muller M, Cohen PDA, etal. Translation, validation and norming of the Dutch language version of the SF-36 health survey in community and Chronic disease populations. J Clin Epidemiol 1998; 51: Sijtsma K. Methodology review: Nonparametric IRT approaches to the analysis of dichotomous item scores. Appl Psychol Meas 1998; 22: Molenaar IW. Nonparametric models for polytomous responses. In: van der Linden WJ, Hambleton RK (eds), Handbook of Modern Item Response Theory, New York: Springer, 1997; Mokken RJ, Lewis C. A nonparametric approach to the analysis of dichotomous item responses. Appl Psychol Meas 1982; 6: Molenaar IW, Debets P, Sijtsma K, Hemker BT. MSP: A Program for Mokken Scale Analysis for Polytomous Items. Version 3.0. Groningen: iecprogam-ma, Meijer RR, Sijtsma K, Molenaar IW. Reliability estimation for single dichotomous tems based on Mokken s IRT model. Appl Psychol Meas 1995; 14: Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York: McGraw Hill, Bland JM, Altman DG. Cronbach s a (Statistics notes). BritMed J 1997; 314: Stout W. A nonparametric approach for assessing latent trait unidimensionality. Psychometrika 1987; 52: Stout W, Goodwin Froelich A, Gao F. Using resampling methods to produce an improved DIMTEST procedure. In: Boomsma A, Van Duijn MAJ, Snijders TAB (eds), Essays of Item Response Theory, Berlin: Springer Verlag, 2001; RostJ. The growing family of Rasch models. In: Boomsma A, Van Duijn MAJ, Snijders TAB (eds), Essays of Item Response Theory, Berlin: Springer Verlag, 2001; Sijtsma K, van der Ark LA. Progress in NIRT analysis of polytomous item scores: Dilemmas and practical solutions. In: Boomsma A, Van Duijn MAJ, Snijders TAB (eds), Essays of Item Response Theory, Berlin: Springer Verlag, 2001; Address for correspondence: P.G.M. van der Heijden, Department of Statistics, TNO Prevention and Health, Postbus 2215, 2301 CE Leiden, The Netherlands Phone: ; Fax: pgm.vanderheijden@pg.tno.nl

Practical Mokken scale analysis 1. Running head: Practical Mokken scale analysis

Practical Mokken scale analysis 1. Running head: Practical Mokken scale analysis Practical Mokken scale analysis 1 Running head: Practical Mokken scale analysis Mokken Scale Analysis of Mental Health and Well-being Questionnaire item responses: a Non-Parametric IRT Method in Empirical

More information

The RAND 36-Item Health Survey

The RAND 36-Item Health Survey The RAND 36-Item Health Survey Introduction The RAND 36-Item Health Survey (Version 1.0) laps eight concepts: physical functioning, bodily pain, role limitations due to physical health problems, role limitations

More information

Medical Outcomes Study Questionnaire Short Form 36 Health Survey (SF-36)

Medical Outcomes Study Questionnaire Short Form 36 Health Survey (SF-36) Medical Outcomes Study Questionnaire Short Form 36 Health Survey (SF-36) bout: The SF-36 is an indicator overall health status. Items: 10 Reliability: Most se studies that examined reliability SF_36 have

More information

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

CRITÉRIOS DE AVALIAÇÃO DE QUALIDADE DE VIDA

CRITÉRIOS DE AVALIAÇÃO DE QUALIDADE DE VIDA CRITÉRIOS DE AVALIAÇÃO DE QUALIDADE DE VIDA S1 SF36 General Health Survey Patient Name Patient ID Study Name Study Number Date / / Side Right Left Filled in by: Operating Dr. Other MD Research Assistant

More information

Goodness of fit assessment of item response theory models

Goodness of fit assessment of item response theory models Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing

More information

Internal Consistency: Do We Really Know What It Is and How to Assess It?

Internal Consistency: Do We Really Know What It Is and How to Assess It? Journal of Psychology and Behavioral Science June 2014, Vol. 2, No. 2, pp. 205-220 ISSN: 2374-2380 (Print) 2374-2399 (Online) Copyright The Author(s). 2014. All Rights Reserved. Published by American Research

More information

Richard E. Zinbarg northwestern university, the family institute at northwestern university. William Revelle northwestern university

Richard E. Zinbarg northwestern university, the family institute at northwestern university. William Revelle northwestern university psychometrika vol. 70, no., 23 33 march 2005 DOI: 0.007/s336-003-0974-7 CRONBACH S α, REVELLE S β, AND MCDONALD S ω H : THEIR RELATIONS WITH EACH OTHER AND TWO ALTERNATIVE CONCEPTUALIZATIONS OF RELIABILITY

More information

Overview of Factor Analysis

Overview of Factor Analysis Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

More information

Application of a Psychometric Rating Model to

Application of a Psychometric Rating Model to Application of a Psychometric Rating Model to Ordered Categories Which Are Scored with Successive Integers David Andrich The University of Western Australia A latent trait measurement model in which ordered

More information

ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores

ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores Wayne J. Camara, Dongmei Li, Deborah J. Harris, Benjamin Andrews, Qing Yi, and Yong He ACT Research Explains

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Item Imputation Without Specifying Scale Structure

Item Imputation Without Specifying Scale Structure Original Article Item Imputation Without Specifying Scale Structure Stef van Buuren TNO Quality of Life, Leiden, The Netherlands University of Utrecht, The Netherlands Abstract. Imputation of incomplete

More information

Chapter 3 Local Marketing in Practice

Chapter 3 Local Marketing in Practice Chapter 3 Local Marketing in Practice 3.1 Introduction In this chapter, we examine how local marketing is applied in Dutch supermarkets. We describe the research design in Section 3.1 and present the results

More information

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

More information

A Basic Guide to Analyzing Individual Scores Data with SPSS

A Basic Guide to Analyzing Individual Scores Data with SPSS A Basic Guide to Analyzing Individual Scores Data with SPSS Step 1. Clean the data file Open the Excel file with your data. You may get the following message: If you get this message, click yes. Delete

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

PRELIMINARY ITEM STATISTICS USING POINT-BISERIAL CORRELATION AND P-VALUES

PRELIMINARY ITEM STATISTICS USING POINT-BISERIAL CORRELATION AND P-VALUES PRELIMINARY ITEM STATISTICS USING POINT-BISERIAL CORRELATION AND P-VALUES BY SEEMA VARMA, PH.D. EDUCATIONAL DATA SYSTEMS, INC. 15850 CONCORD CIRCLE, SUITE A MORGAN HILL CA 95037 WWW.EDDATA.COM Overview

More information

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

More information

Test Reliability Indicates More than Just Consistency

Test Reliability Indicates More than Just Consistency Assessment Brief 015.03 Test Indicates More than Just Consistency by Dr. Timothy Vansickle April 015 Introduction is the extent to which an experiment, test, or measuring procedure yields the same results

More information

Measurement and Metrics Fundamentals. SE 350 Software Process & Product Quality

Measurement and Metrics Fundamentals. SE 350 Software Process & Product Quality Measurement and Metrics Fundamentals Lecture Objectives Provide some basic concepts of metrics Quality attribute metrics and measurements Reliability, validity, error Correlation and causation Discuss

More information

Application of discriminant analysis to predict the class of degree for graduating students in a university system

Application of discriminant analysis to predict the class of degree for graduating students in a university system International Journal of Physical Sciences Vol. 4 (), pp. 06-0, January, 009 Available online at http://www.academicjournals.org/ijps ISSN 99-950 009 Academic Journals Full Length Research Paper Application

More information

College Readiness LINKING STUDY

College Readiness LINKING STUDY College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)

More information

How to report the percentage of explained common variance in exploratory factor analysis

How to report the percentage of explained common variance in exploratory factor analysis UNIVERSITAT ROVIRA I VIRGILI How to report the percentage of explained common variance in exploratory factor analysis Tarragona 2013 Please reference this document as: Lorenzo-Seva, U. (2013). How to report

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Research Report 2013-3. Rating Quality Studies Using Rasch Measurement Theory. By George Engelhard, Jr. and Stefanie A. Wind.

Research Report 2013-3. Rating Quality Studies Using Rasch Measurement Theory. By George Engelhard, Jr. and Stefanie A. Wind. Research Report 2013-3 Rating Quality Studies Using Rasch Measurement Theory By George Engelhard, Jr. and Stefanie A. Wind easurement George Engelhard Jr. is a professor of educational measurement and

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

More information

Drawing Inferences about Instructors: The Inter-Class Reliability of Student Ratings of Instruction

Drawing Inferences about Instructors: The Inter-Class Reliability of Student Ratings of Instruction OEA Report 00-02 Drawing Inferences about Instructors: The Inter-Class Reliability of Student Ratings of Instruction Gerald M. Gillmore February, 2000 OVERVIEW The question addressed in this report is

More information

Using SAS PROC MCMC to Estimate and Evaluate Item Response Theory Models

Using SAS PROC MCMC to Estimate and Evaluate Item Response Theory Models Using SAS PROC MCMC to Estimate and Evaluate Item Response Theory Models Clement A Stone Abstract Interest in estimating item response theory (IRT) models using Bayesian methods has grown tremendously

More information

A C T R esearcli R e p o rt S eries 2 0 0 5. Using ACT Assessment Scores to Set Benchmarks for College Readiness. IJeff Allen.

A C T R esearcli R e p o rt S eries 2 0 0 5. Using ACT Assessment Scores to Set Benchmarks for College Readiness. IJeff Allen. A C T R esearcli R e p o rt S eries 2 0 0 5 Using ACT Assessment Scores to Set Benchmarks for College Readiness IJeff Allen Jim Sconing ACT August 2005 For additional copies write: ACT Research Report

More information

Summary chapter 1 chapter 2 147

Summary chapter 1 chapter 2 147 146 There is an increasing need for questionnaires to assess the health-related quality of life of Turkish and Moroccan cancer patients in the Netherlands (chapter 1). The first generation of Turkish and

More information

Exploring Graduates Perceptions of the Quality of Higher Education

Exploring Graduates Perceptions of the Quality of Higher Education Exploring Graduates Perceptions of the Quality of Higher Education Adee Athiyainan and Bernie O Donnell Abstract Over the last decade, higher education institutions in Australia have become increasingly

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

FACTOR ANALYSIS NASC

FACTOR ANALYSIS NASC FACTOR ANALYSIS NASC Factor Analysis A data reduction technique designed to represent a wide range of attributes on a smaller number of dimensions. Aim is to identify groups of variables which are relatively

More information

Item Response Theory: What It Is and How You Can Use the IRT Procedure to Apply It

Item Response Theory: What It Is and How You Can Use the IRT Procedure to Apply It Paper SAS364-2014 Item Response Theory: What It Is and How You Can Use the IRT Procedure to Apply It Xinming An and Yiu-Fai Yung, SAS Institute Inc. ABSTRACT Item response theory (IRT) is concerned with

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88)

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Introduction The National Educational Longitudinal Survey (NELS:88) followed students from 8 th grade in 1988 to 10 th grade in

More information

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

More information

Burnout and work engagement: Independent factors or opposite poles?

Burnout and work engagement: Independent factors or opposite poles? Journal of Vocational Behavior 68 (2006) 165 174 www.elsevier.com/locate/jvb Burnout and work engagement: Independent factors or opposite poles? Vicente González-Romá a, *, Wilmar B. Schaufeli b, Arnold

More information

Quality of Life. Questionnaire 3. 4 weeks after randomisation. Graag in laten vullen door geincludeerde patiënt METEX studie

Quality of Life. Questionnaire 3. 4 weeks after randomisation. Graag in laten vullen door geincludeerde patiënt METEX studie Patient registration label Quality of Life Questionnaire 3 4 weeks after randomisation Graag in laten vullen door geincludeerde patiënt Patient Identification Number Datum van invullen 1 SF-36 HEALTH SURVEY

More information

Terminating Sequential Delphi Survey Data Collection

Terminating Sequential Delphi Survey Data Collection A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to

More information

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1. Relationships between two numerical variables Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

More information

9. Sampling Distributions

9. Sampling Distributions 9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 13 and Accuracy Under the Compound Multinomial Model Won-Chan Lee November 2005 Revised April 2007 Revised April 2008

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Comparison of Test Administration Procedures for Placement Decisions in a Mathematics Course

Comparison of Test Administration Procedures for Placement Decisions in a Mathematics Course Measurement and Research Department Reports 97-4 Comparison of Test Administration Procedures for Placement Decisions in a Mathematics Course G.J.J.M. Straetmans T.J.H.M. Eggen Measurement and Research

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service THE SELECTION OF RETURNS FOR AUDIT BY THE IRS John P. Hiniker, Internal Revenue Service BACKGROUND The Internal Revenue Service, hereafter referred to as the IRS, is responsible for administering the Internal

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

Impact / Performance Matrix A Strategic Planning Tool

Impact / Performance Matrix A Strategic Planning Tool Impact / Performance Matrix A Strategic Planning Tool Larry J. Seibert, Ph.D. When Board members and staff convene for strategic planning sessions, there are a number of questions that typically need to

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

Moderation. Moderation

Moderation. Moderation Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Organizing Your Approach to a Data Analysis

Organizing Your Approach to a Data Analysis Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

More information

A credibility method for profitable cross-selling of insurance products

A credibility method for profitable cross-selling of insurance products Submitted to Annals of Actuarial Science manuscript 2 A credibility method for profitable cross-selling of insurance products Fredrik Thuring Faculty of Actuarial Science and Insurance, Cass Business School,

More information

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA PROC FACTOR: How to Interpret the Output of a Real-World Example Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA ABSTRACT THE METHOD This paper summarizes a real-world example of a factor

More information

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship)

Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship) 1 Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship) I. Authors should report effect sizes in the manuscript and tables when reporting

More information

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Introduction In the summer of 2002, a research study commissioned by the Center for Student

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

Latent Class Regression Part II

Latent Class Regression Part II This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

How To Study The Academic Performance Of An Mba

How To Study The Academic Performance Of An Mba Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001 WORK EXPERIENCE: DETERMINANT OF MBA ACADEMIC SUCCESS? Andrew Braunstein, Iona College Hagan School of Business,

More information

10. Analysis of Longitudinal Studies Repeat-measures analysis

10. Analysis of Longitudinal Studies Repeat-measures analysis Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

More information

Impact of adhesive capsulitis on quality of life in elderly subjects with diabetes: A cross sectional study

Impact of adhesive capsulitis on quality of life in elderly subjects with diabetes: A cross sectional study Original Article Impact of adhesive capsulitis on quality of life in elderly subjects with diabetes: A cross sectional study Saumen Gupta, Kavitha Raja, Manikandan N Department of Physical Therapy, Manipal

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

How To Rate A Subordinate

How To Rate A Subordinate INTERNATIONAL JOURNAL OF SCHOLARLY ACADEMIC INTELLECTUAL DIVERSITY VOLUME 14, NUMBER 1, 2012 Performance Appraisal: Methods and Rating Errors Fred C. Lunenburg Sam Houston State University ABSTRACT Performance

More information

Lies, damned lies and latent classes: Can factor mixture models allow us to identify the latent structure of common mental disorders?

Lies, damned lies and latent classes: Can factor mixture models allow us to identify the latent structure of common mental disorders? Lies, damned lies and latent classes: Can factor mixture models allow us to identify the latent structure of common mental disorders? Rachel McCrea Mental Health Sciences Unit, University College London

More information

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. 277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

More information

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

More information

Monotonicity Hints. Abstract

Monotonicity Hints. Abstract Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology

More information

T-test & factor analysis

T-test & factor analysis Parametric tests T-test & factor analysis Better than non parametric tests Stringent assumptions More strings attached Assumes population distribution of sample is normal Major problem Alternatives Continue

More information

Cross-Validation of Item Selection and Scoring for the SF-12 Health Survey in Nine Countries: Results from the IQOLA Project

Cross-Validation of Item Selection and Scoring for the SF-12 Health Survey in Nine Countries: Results from the IQOLA Project J Clin Epidemiol Vol. 51, No. 11, pp. 1171 1178, 1998 Copyright 1998 Elsevier Science Inc. All rights reserved. 0895-4356/98 $ see front matter PII S0895-4356(98)00109-7 Cross-Validation of Item Selection

More information

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

Likert Scales. are the meaning of life: Dane Bertram

Likert Scales. are the meaning of life: Dane Bertram are the meaning of life: Note: A glossary is included near the end of this handout defining many of the terms used throughout this report. Likert Scale \lick urt\, n. Definition: Variations: A psychometric

More information

Applying Statistics Recommended by Regulatory Documents

Applying Statistics Recommended by Regulatory Documents Applying Statistics Recommended by Regulatory Documents Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 301-325 325-31293129 About the Speaker Mr. Steven

More information

Measures of diagnostic accuracy: basic definitions

Measures of diagnostic accuracy: basic definitions Measures of diagnostic accuracy: basic definitions Ana-Maria Šimundić Department of Molecular Diagnostics University Department of Chemistry, Sestre milosrdnice University Hospital, Zagreb, Croatia E-mail

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information

Computerized Adaptive Testing in Industrial and Organizational Psychology. Guido Makransky

Computerized Adaptive Testing in Industrial and Organizational Psychology. Guido Makransky Computerized Adaptive Testing in Industrial and Organizational Psychology Guido Makransky Promotiecommissie Promotor Prof. dr. Cees A. W. Glas Assistent Promotor Prof. dr. Svend Kreiner Overige leden Prof.

More information

Discovering process models from empirical data

Discovering process models from empirical data Discovering process models from empirical data Laura Măruşter (l.maruster@tm.tue.nl), Ton Weijters (a.j.m.m.weijters@tm.tue.nl) and Wil van der Aalst (w.m.p.aalst@tm.tue.nl) Eindhoven University of Technology,

More information

ACTIVITY LEFT ARM RIGHT ARM

ACTIVITY LEFT ARM RIGHT ARM SHOULDER ASSESSMENT FORM AMERICAN SHOULDER AND ELBOW SURGEONS Side: R L Device: RSP TSA Hemi DOS: Side: R L Device: RSP TSA Hemi DOS: Circle the number in the box that indicates your ability to do the

More information

ESA/STAT/AC.81/3-2 24 May 2001. J.L.A. van Rijckevorsel: Cross population comparison of surveys: A review of new technologies.

ESA/STAT/AC.81/3-2 24 May 2001. J.L.A. van Rijckevorsel: Cross population comparison of surveys: A review of new technologies. United Nations Statistics Division United Nations Children s Fund Statistical Office of the European Communities Centres for Disease Control and Prevention of the United States of America ESA/STAT/AC.81/3-2

More information

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

More information

PSYCHOLOGY Vol. II - The Construction and Use of Psychological Tests and Measures - Bruno D. Zumbo, Michaela N. Gelin, Anita M.

PSYCHOLOGY Vol. II - The Construction and Use of Psychological Tests and Measures - Bruno D. Zumbo, Michaela N. Gelin, Anita M. THE CONSTRUCTION AND USE OF PSYCHOLOGICAL TESTS AND MEASURES Bruno D. Zumbo, Michaela N. Gelin and Measurement, Evaluation, and Research Methodology Program, Department of Educational and Counselling Psychology,

More information

PRE-SERVICE SCIENCE AND PRIMARY SCHOOL TEACHERS PERCEPTIONS OF SCIENCE LABORATORY ENVIRONMENT

PRE-SERVICE SCIENCE AND PRIMARY SCHOOL TEACHERS PERCEPTIONS OF SCIENCE LABORATORY ENVIRONMENT PRE-SERVICE SCIENCE AND PRIMARY SCHOOL TEACHERS PERCEPTIONS OF SCIENCE LABORATORY ENVIRONMENT Gamze Çetinkaya 1 and Jale Çakıroğlu 1 1 Middle East Technical University Abstract: The associations between

More information