Categorization and measurement quality. The choice between Pearson and Polychoric correlations


Chapter 7
Categorization and measurement quality. The choice between Pearson and Polychoric correlations

Germà Coenders and Willem E. Saris*

ABSTRACT

This chapter first reviews the major consequences of ordinal measurement in Confirmatory Factor Analysis (CFA) models and introduces the reader to the Polychoric correlation coefficient, a measure of association designed to avoid these consequences. Next, it presents the results of a Monte Carlo experiment in which the performance of the estimates of a CFA model based on Pearson and Polychoric correlations is compared under different distributional settings. The chapter concludes that, in general, it does not matter much which measure of association one uses, as long as one is aware that factor loadings must be interpreted differently depending on whether Pearson or Polychoric correlations are analysed.

Introduction

In Chapter 6 it was pointed out that categorization errors deriving from crude ordinal measurement can distort the estimates of Multitrait-Multimethod (MTMM) models, which are particular cases of Confirmatory Factor Analysis (CFA) models. These categorization errors are likely to be of some magnitude when the data are collected with methods offering only a few response options, such as the 5-point response scales used in some studies reported in this book. The population study in Chapter 6 suggests that Polychoric correlations are not affected by categorization errors and may be preferred to Pearson correlations. That simulation, though, considered only normally distributed latent factors, whereas Polychoric correlations are known to be biased under non-normality (O'Brien & Homer, 1987; Quiroga, 1992). This fact has not prevented other studies in the literature, also based on normally distributed artificial data, from recommending the use of Polychoric correlations whenever the data are categorized, which has led to widespread use of this measure of association.

This chapter reports a Monte Carlo experiment in which both normal and non-normal underlying factors are considered. In order to interpret the outcome of the experiment, the different types of categorization errors and their consequences must first be reviewed, together with the assumptions and performance of the Polychoric correlation coefficient.

The categorization problem

Linear statistical models are a convenient and parsimonious tool for analysing relationships among variables. In particular, Structural Equation Models (SEM) with latent variables (see Jöreskog & Sörbom, 1989; or Bollen, 1989) allow the researcher to fit simultaneous regression equations while taking measurement errors into account, and are becoming increasingly popular. They have been widely applied to survey and other types of data in many different research fields. The True Score MTMM model used in this book belongs to the family of CFA models, which are particular cases of SEM with latent variables.

If SEM are estimated from covariances or Pearson correlations, as is usual, it is assumed that the data are continuous and have an interval level of measurement. While the social sciences are often interested in measuring and relating variables that are conceptually continuous, the measurement instruments used in these disciplines often fail to yield continuous, interval-level measures. Many categorical response scales frequently used in questionnaires can at most be assumed to be ordinal. This leads to many problems, which arise from different sorts of categorization errors.

* The work of the first author was partly supported by grant CIRIT BE94/I-151 from the Catalan Autonomous Government.
When modelling ordinally measured continuous variables, it is usually assumed that the range of the continuous variable is divided into as many regions as the ordinal measurement has scale points, and that the ordinal measurement y is related to the continuous underlying variable y* through the non-linear step function:

y = k  iff  τ_{k-1} < y* ≤ τ_k,  for k = 1, 2, ..., m   (1)

where the τ_k are the thresholds or cutting points of the above-mentioned regions, such that τ_0 = -∞, τ_{k-1} < τ_k, and τ_m = +∞. The assignment of the consecutive integer values 1, 2, ..., m to the categories of y is arbitrary but common practice, and will be assumed throughout this chapter. An example of such a step function can be seen in Figure 1a. Figure 1b shows a sample histogram of a continuous y* variable, and Figure 1c the sample bar chart of the 5-category y variable that results from the step function in Figure 1a. Figure 1a also shows the linear regression line of y on y* which is implied in linear statistical models. The vertical distance between the step and linear functions can be interpreted as categorization error.

This discrepancy between the linear and step functions involves two types of categorization errors, which are discussed in Johnson and Creech (1983). Grouping errors derive from discrete measurement, i.e. from collapsing several values of y* into the same y value. Transformation errors arise from non-interval measurement: the arbitrary values 1, ..., m may not be linearly related to the within-category expectation of y*, especially if the thresholds τ_k are not equally spaced. In that case there are transformation errors (O'Brien, 1985) and the shapes of the distributions of y and y* differ strongly. Figure 2a shows a step function leading to large transformation errors; note that the thresholds are far from equally spaced. Figure 2c shows a sample bar chart of y after applying the step function in Figure 2a to the y* variable in Figure 2b. Note the differences in the distributional shapes of y* (symmetric) and y (skewed). Transformation errors can also be high when the mean of y* is very high or very low, which can lead to a high frequency of an extreme category even if the thresholds are equally spaced.

A skewed y distribution does not by itself imply the existence of transformation errors. Note that the same skewed observed y distribution in Figure 2c could have been obtained from a skewed underlying distribution through a low-transformation categorization.
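The step function in Equation (1) is straightforward to implement directly. A minimal Python sketch (NumPy assumed; the equally spaced thresholds below are illustrative, not the ones used later in the chapter):

```python
import numpy as np

# Illustrative thresholds tau_1..tau_4 for a 5-point scale;
# tau_0 = -inf and tau_5 = +inf are implicit in np.digitize.
tau = np.array([-1.5, -0.5, 0.5, 1.5])

def categorize(y_star, thresholds=tau):
    """Apply the step function y = k iff tau_{k-1} < y* <= tau_k,
    returning integer categories 1..m."""
    return np.digitize(y_star, thresholds, right=True) + 1

rng = np.random.default_rng(0)
y_star = rng.standard_normal(10_000)   # continuous underlying variable
y = categorize(y_star)

print(sorted(set(y.tolist())))         # all five categories 1..5 occur
```

With `right=True`, `np.digitize` uses exactly the half-open intervals of Equation (1), so a score falling on a threshold goes to the lower category.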
An example of such a situation can be seen in Figures 3a to 3c.

The literature refers to three major consequences of grouping and transformation errors. The first is bias in the covariances and correlations themselves. In absolute value, the Pearson correlation between two observed y variables is usually lower than the correlation between the continuous underlying y* variables (Bollen & Barb, 1981; Neumann, 1982; O'Brien & Homer, 1987; Quiroga, 1992). The difference can be large and can easily exceed 0.1 if the number of categories is small.

The remaining two consequences are better understood in the context of CFA models and SEM with latent variables. Equation (1) is also applicable in this framework, but the underlying y*_i variable is assumed to be an indicator of a latent factor η_k (the notation of Jöreskog & Sörbom, 1989, and Bollen, 1989, will be used throughout):

y*_i = λ_ik η_k + ε_i   (2)

where λ_ik is the factor loading of y*_i on η_k and ε_i is the random error term associated with y*_i. When squared and standardised, λ_ik can be interpreted as the measurement quality of y*_i. In this framework, the two additional consequences of categorization are the reduction of measurement quality and the emergence of correlated measurement errors.

The reduction of measurement quality can be understood as follows: the correlation between y_i and η_k is usually lower in absolute value than the correlation between y*_i and η_k. This reduction is related to both transformation and grouping errors. Quite intuitively, the measurement quality decreases as the number of scale points decreases (O'Brien, 1985). To some extent, then, categorization errors have the same consequences as random errors: they attenuate the covariances and correlations among variables and they reduce the measurement quality.

However, categorization errors deviate somewhat from random behaviour. In Figure 1a it can be seen that the categorization error depends on the score on the underlying continuous variable. Given the value of y*, the categorization error (the vertical distance from the step function to the regression line) is fixed. For instance, if y* = 3 the categorization error is always negative; if y* = -0.25, always positive; and so on. It can also be seen that the categorization errors corresponding to two y_i variables can be correlated (Johnson & Creech, 1983), which constitutes the third major consequence of categorization. The two vertical broken lines in the graph in Figure 1a represent the scores of one case on two closely related y*_i variables, so that both values are similar. Note that both categorization errors are also similar. This would not be the case if the y*_i variables had been independent.
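The attenuation of Pearson correlations described above can be reproduced with a few lines of simulation. A sketch under illustrative assumptions (bivariate normal y* variables with correlation 0.75 and equally spaced thresholds, so that mainly grouping errors are at work):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 100_000, 0.75

# Bivariate normal underlying variables with correlation rho
cov = np.array([[1.0, rho], [rho, 1.0]])
y_star = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Equally spaced thresholds: low transformation error, grouping error only
tau = np.array([-1.5, -0.5, 0.5, 1.5])
y = np.digitize(y_star, tau, right=True) + 1   # 5-point ordinal versions

r_star = np.corrcoef(y_star[:, 0], y_star[:, 1])[0, 1]
r_obs = np.corrcoef(y[:, 0], y[:, 1])[0, 1]

print(f"continuous r = {r_star:.3f}, categorized r = {r_obs:.3f}")
# the categorized r is noticeably below 0.75 (grouping attenuation)
```

The size of the drop depends on the number of categories and the threshold placement, in line with the references cited above.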

Figure 1a. Step and linear functions: low transformation errors. Broken lines show the scores and the categorization errors of an individual on two related variables.
Figure 1b. Histogram of a symmetric continuous variable.
Figure 1c. Bar chart of the variable in Figure 1b, categorized with the step function in Figure 1a.

Figure 2a. Step and linear functions: high transformation errors.
Figure 2b. Histogram of a symmetric continuous variable.
Figure 2c. Bar chart of the variable in Figure 2b, categorized with the step function in Figure 2a.

Figure 3a. Step and linear functions: low transformation errors.
Figure 3b. Histogram of a skewed continuous variable.
Figure 3c. Bar chart of the variable in Figure 3b, categorized with the step function in Figure 3a.

Reasonably reliable indicators of the same factor, or of different but correlated factors, are correlated; so are their associated categorization errors. These correlated errors constitute a misspecification in most SEM. From Chapter 6 it is clear that this misspecification can lead to bias in the estimates (for instance in the method loadings, which play precisely the role of accounting for correlated errors) and to a bad model fit. Chapter 6 also suggests that correlated errors are of considerable magnitude only if transformation errors are present.

The Polychoric correlation coefficient

Because of all the problems referred to in the section above, Polychoric correlations have been suggested as an alternative to covariances and Pearson correlations (Olsson, 1979a; Jöreskog, 1990). Assuming that y_i and y_j are the ordinal measurements of the continuous underlying variables y*_i and y*_j according to Equation (1), the Polychoric correlation coefficient r(y*_i, y*_j) is the correlation between y*_i and y*_j estimated from the scores of y_i and y_j. The form of the distribution of the y*_i variables has to be assumed in order to specify the likelihood function of the Polychoric correlation; the bivariate normality assumption is usually made (see Olsson, 1979a).

The programs PRELIS and PRELIS2 (Jöreskog & Sörbom, 1988, 1993a) estimate the Polychoric correlation coefficient by limited-information maximum likelihood, following a two-step procedure. The thresholds are first estimated from the cumulative relative frequency distributions of y_i and y_j and the inverse of the N(0,1) distribution function. The Polychoric correlation coefficient is then estimated by maximum likelihood conditional on the threshold values. In recovering the relationship among the y* variables, the Polychoric correlation coefficient corrects both grouping and transformation errors. However, it is no panacea: its reliance on the normality assumption may lead to bias when that assumption is not fulfilled.

Previous research on the choice between Pearson and Polychoric correlations

There is a consensus that covariances and Pearson correlations are seriously biased by categorization errors (Quiroga, 1992; O'Brien & Homer, 1987; and many others). Polychoric correlations, on the contrary, are unbiased if the underlying variables are normally distributed (Jöreskog & Sörbom, 1988), but biased otherwise. Skewness seems to be the most serious type of departure from normality, especially when the degree or direction of the skewness differs from variable to variable. Even when normality does not hold, the bias is usually higher for covariances and Pearson correlations than for Polychoric correlations (Quiroga, 1992). It seems well established, then, that Polychoric correlations are always preferable when the researcher is interested in relationships between pairs of variables, or when the researcher uses SEM without latent variables.

The robustness to categorization of SEM fitted to Pearson correlations has often been studied in the literature under the view that the model estimates should recover the correlation structure among the y*_i variables (Babakus et al., 1987; Olsson, 1979b). Since the correlations differ depending on whether the y*_i or the y_i variables are considered, studies carried out from this point of view report large biases in the parameter estimates and recommend the use of Polychoric correlations. In particular, the factor loadings are reported to be too low with respect to the measurement quality of the y*_i variables. In our opinion, when covariances or Pearson correlations are used, factor loadings should instead be related to the measurement quality of the observed y_i variables, because the use of such measures of association implicitly ignores the intervening y*_i variables and concentrates on the observed variables and the latent factors alone. Considered from this point of view, the biases due to categorization are much lower.
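PRELIS itself is not needed to see how the two-step estimator works. The following simplified Python illustration (SciPy assumed; a sketch of the general two-step idea, not the PRELIS implementation) estimates the thresholds from the marginal cumulative relative frequencies via the inverse N(0,1) distribution function, and then maximizes the bivariate normal likelihood of the contingency table over the correlation alone:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

def thresholds(y, m):
    """Step 1: tau_k = Phi^{-1}(cumulative relative frequency of categories 1..k).
    +/-8 stands in for +/-infinity, beyond which Phi is numerically 0 or 1."""
    cum = np.cumsum([np.mean(y == k) for k in range(1, m)])
    return np.concatenate([[-8.0], norm.ppf(np.clip(cum, 1e-12, 1 - 1e-12)), [8.0]])

def polychoric(y1, y2, m=5):
    """Step 2: maximize the bivariate-normal likelihood of the m x m table
    over rho, conditional on the estimated thresholds."""
    t1, t2 = thresholds(y1, m), thresholds(y2, m)
    counts = np.zeros((m, m))
    np.add.at(counts, (y1 - 1, y2 - 1), 1)

    def neg_loglik(rho):
        mvn = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])
        ll = 0.0
        for i in range(m):
            for j in range(m):
                if counts[i, j] == 0:
                    continue
                # probability of the rectangle (t1[i], t1[i+1]] x (t2[j], t2[j+1]]
                p = (mvn.cdf([t1[i + 1], t2[j + 1]]) - mvn.cdf([t1[i], t2[j + 1]])
                     - mvn.cdf([t1[i + 1], t2[j]]) + mvn.cdf([t1[i], t2[j]]))
                ll += counts[i, j] * np.log(max(p, 1e-300))
        return -ll

    return minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x

# Check on simulated data with true underlying correlation 0.75
rng = np.random.default_rng(2)
z = rng.multivariate_normal([0, 0], [[1, 0.75], [0.75, 1]], size=2000)
tau = np.array([-1.5, -0.5, 0.5, 1.5])
y1, y2 = np.digitize(z[:, 0], tau) + 1, np.digitize(z[:, 1], tau) + 1
print(round(polychoric(y1, y2), 3))   # close to 0.75, unlike the attenuated Pearson r
```

On data like these the estimate recovers the underlying correlation, while the Pearson correlation of the categorized scores stays visibly below it.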
SEM with latent variables can seemingly interpret the lower covariances or Pearson correlations as lower measurement quality, and can take it into account so as to yield approximately correct estimates of the factor correlations (Johnson & Creech, 1983; Homer & O'Brien, 1988). From this point of view, it is precisely the lower loadings that make the models work properly.

In our opinion, when it comes to SEM with latent variables the choice of measure of association is not straightforward. Authors dealing with normally distributed underlying y* variables (Rigdon & Ferguson, 1991; Jöreskog & Sörbom, 1988; Babakus et al., 1987; Olsson, 1979b) recommend the use of Polychoric correlations. Homer and O'Brien (1988) consider both normal and non-normal underlying variables and show that, in the context of latent-variable models, the choice between Pearson and Polychoric correlations is not as clearly balanced in favour of the latter as when only bivariate correlations are concerned. They conclude that in many cases it does not matter very much which measure of association one uses, but they do not state under which circumstances which measure should be preferred. Johnson and Creech (1983) point at correlated categorization errors as the reason why latent-variable models may fail to perform completely well when Pearson correlations are used, thus implying that transformation errors constitute the main source of distortion for these measures of association. In the following sections we carry out a Monte Carlo experiment to shed further light on this point.

Monte Carlo comparison of covariances and Polychoric correlations in CFA models

The previous section pointed at transformation errors and non-normality as the key issues for the performance of covariances and Pearson correlations on the one side and Polychoric correlations on the other. In this section, a Monte Carlo experiment is designed to explore separately the effects of these two main sources of distortion on covariances and Polychoric correlations. A Confirmatory Factor Analysis (CFA) model, a particular case of SEM with latent variables which is very relevant to the studies carried out in this book, is considered. Emphasis is placed on the point estimates of the types of parameters the book is concerned with, namely factor correlations and measurement quality. The experiment is thus complementary to the one carried out in Chapter 6, which explored only several patterns of transformation errors, but not non-normality, and which concentrated on measurement quality alone.
The experiment consists of the simulation of four continuous variables y*_1 to y*_4, which follow a standardised CFA model with two factors (see the path diagram in Figure 4); their categorization into 5 categories (a number of scale points frequently used in this book); and the analysis of the categorized data with the same CFA model used to simulate the continuous data. The simulated data are generated through the following equations:

y*_1 = λ_11 η_1 + ε_1
y*_2 = λ_21 η_1 + ε_2   (3)
y*_3 = λ_32 η_2 + ε_3
y*_4 = λ_42 η_2 + ε_4

y_i = k  iff  τ_{i,k-1} < y*_i ≤ τ_ik,  for k = 1, 2, ..., 5   (4)
τ_i0 = -∞, τ_i1 < τ_i2 < τ_i3 < τ_i4, τ_i5 = +∞

with the following population parameters: factor correlation ρ(η_1, η_2) = 0.83; var(ε_i) = 0.25 and λ_ik = 0.87 for all i, k; var(η_k) = 1; cov(ε_i, η_k) = 0 for all i, k; and cov(ε_i, ε_j) = 0 for all i ≠ j. The implied correlation matrix among the y*_i variables is shown in Table 1.

Five Monte Carlo experiments are run following the model above under 5 different distributional conditions (DCs), which are summarised in Table 2. The DCs are determined by the shape of the distribution of η_1 and η_2 and by the thresholds τ_ik leading to the distributions of the y variables. The variables ε_1 to ε_4 are simulated to be normally distributed; we think that normality is a reasonable assumption for measurement errors.

Figure 4. Path diagram of the simulated CFA model

Table 1. Implied correlation matrix for the y*_i variables

        y*_1    y*_2    y*_3    y*_4
y*_1   1.000
y*_2   0.750   1.000
y*_3   0.625   0.625   1.000
y*_4   0.625   0.625   0.750   1.000

The design considers three alternative distributions for the η_k factors: a bivariate normal distribution, a distribution where both η_k factors are skewed in the same direction, and a distribution in which the η_k factors are skewed in opposite directions.
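Under DC 1 (normal factors, equally spaced thresholds), the data-generating side of Equations (3) and (4) can be sketched as follows (NumPy assumed; sample 1% and 99% quantiles stand in for the central 98% probability range of each y*_i used in the design):

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam, rho_eta = 1000, 0.87, 0.83

# Two standardised factors with correlation 0.83
eta = rng.multivariate_normal([0, 0], [[1, rho_eta], [rho_eta, 1]], size=n)

# Equations (3): two indicators per factor, normal errors with variance 0.25
eps = rng.normal(0.0, 0.5, size=(n, 4))
y_star = np.column_stack([lam * eta[:, 0] + eps[:, 0],
                          lam * eta[:, 0] + eps[:, 1],
                          lam * eta[:, 1] + eps[:, 2],
                          lam * eta[:, 1] + eps[:, 3]])

# Equation (4): 4 equally spaced thresholds over the central 98% of each y*_i
y = np.empty_like(y_star, dtype=int)
for i in range(4):
    lo, hi = np.quantile(y_star[:, i], [0.01, 0.99])
    tau = np.linspace(lo, hi, 6)[1:-1]             # 4 interior thresholds
    y[:, i] = np.digitize(y_star[:, i], tau) + 1   # ordinal 1..5

# The sample correlations of y* should approximate Table 1
print(np.round(np.corrcoef(y_star, rowvar=False), 3))
```

Within-factor correlations come out near 0.75 and between-factor correlations near 0.625, matching the implied matrix in Table 1 up to sampling error.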

The two latter distributions are obtained from weighted sums of three independent χ² variables.¹ The design also considers three alternative marginal frequency distributions for the y_i variables: a symmetric distribution, a distribution where all y_i variables are skewed in the same direction, and a distribution in which the indicators of different η_k factors are skewed in opposite directions. These three y_i distributions can be obtained from the three possible distributions of the η_k factors with equally spaced thresholds (obtained by dividing the range containing the central 98% of probability of y*_i into 5 equal-length intervals). In this case, transformation errors are low (DCs 1 to 3). Unlike in Chapter 6, the design thus allows for skewed observed data and low transformation errors simultaneously. The two types of skewed y_i distributions can also be obtained from normal η_k factors if unequally spaced thresholds are used. In this case, transformation errors are high (DCs 4 and 5). Table 2 also shows the population frequency distribution of the y variables and their measurement quality (the percentage of their variance explained by η).

The number of replications was 200 for all DCs. The sample size was chosen to be high (n = 1000) so as to avoid mixing the problems of interest with those derived from small sample sizes. The simulation was made with the program MINITAB8 (Minitab Inc., 1991). For each DC, the program PRELIS 2.02 (Jöreskog & Sörbom, 1993a) was used to compute 200 covariance matrices, and the program PRELIS 1.10 (Jöreskog & Sörbom, 1988) to compute 200 Polychoric correlation matrices, from the raw data for the variables y_1 to y_4. Asymptotic sampling variances and covariances of the elements of all matrices were also computed. The use of the older version of the program is due to some anomalies found in the asymptotic variances/covariances of the Polychoric correlations computed by PRELIS 2.02.

¹ The weights allow for different signs and degrees of skewness and are shown below:

η_1 = 0.58 x_1 + 0.81 x_2
η_2 = 0.87 x_1 + 0.40 x_2 + 0.29 x_3

where x_1 to x_3 are standardised variables distributed as χ² with 1 degree of freedom. A reversal in the sign of x_2 leads to negative skewness in η_1 and positive skewness in η_2.
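The footnote's weighted-χ² construction is easy to verify numerically. A sketch (NumPy assumed) checking that the weights yield approximately standardised, positively skewed factors with correlation close to 0.83:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# x_1..x_3: independent chi-square(1) variables, standardised to mean 0, variance 1
x = (rng.chisquare(1, size=(n, 3)) - 1.0) / np.sqrt(2.0)

# Weights from the footnote; reversing the sign of x_2 would skew
# eta_1 and eta_2 in opposite directions (as in DC 3)
eta1 = 0.58 * x[:, 0] + 0.81 * x[:, 1]
eta2 = 0.87 * x[:, 0] + 0.40 * x[:, 1] + 0.29 * x[:, 2]

print(round(eta1.var(), 2), round(eta2.var(), 2))   # both near 1
print(round(np.corrcoef(eta1, eta2)[0, 1], 2))      # near 0.83
```

The squared weights sum to approximately 1 for each factor (0.58² + 0.81² ≈ 0.99; 0.87² + 0.40² + 0.29² ≈ 1.00), and the shared weights give 0.58·0.87 + 0.81·0.40 ≈ 0.83, which the simulation reproduces.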

Table 2. Description of the 5 distributional conditions (DCs)

    description                          characteristics of the y variables
DC  distrib.   distrib.  transfor-  vars.       freqs. (%)          % variance
    η factors  y vars    mation                1   2   3   4   5    expl. by η
1   normal     symmetric low        y_1, y_2    8  24  36  24   8      0.69
                                    y_3, y_4    8  24  36  24   8      0.69
2   skewsame   skewsame  low        y_1, y_2   26  46  19   6   3      0.66
                                    y_3, y_4   23  46  21   7   3      0.65
3   skewdiff   skewdiff  low        y_1, y_2    4  11  44  35   6      0.61
                                    y_3, y_4   12  47  30   8   3      0.62
4   normal     skewsame  high       y_1, y_2   26  46  19   6   3      0.64
                                    y_3, y_4   23  46  21   7   3      0.64
5   normal     skewdiff  high       y_1, y_2    4  11  44  35   6      0.66
                                    y_3, y_4   12  47  30   8   3      0.65

skewsame: skewed in the same direction; skewdiff: skewed in opposite directions.

The program LISREL 8.02 (Jöreskog & Sörbom, 1989, 1993b) was used to fit the model in Figure 4 by Weighted Least Squares (WLS) to all matrices. WLS is the best available large-sample estimation procedure for both measures of association, and the only method claimed to be correct for Polychoric correlations (Jöreskog, 1990; Jöreskog & Sörbom, 1993a). As regards covariances, WLS is equivalent (Jöreskog & Sörbom, 1988; Jöreskog, 1990) to the Asymptotic Distribution-Free (ADF) procedure (Browne, 1984), which is asymptotically optimal even if the data are non-normal. Muthén and Kaplan (1985, 1989) report good large-sample behaviour of ADF estimates for a variety of distributions, at least for small models. This ability to deal with non-normality is currently available for covariances only, and has to be taken into account when deciding which measure of association to adopt.

The specification of the model in LISREL left all loadings, factor variances and covariances, and error variances in Figure 4 free, except λ_11 and λ_32, which were constrained to be equal to 1 so as to fix the scale of the η factors. All (100.0%) of the replications converged to an empirically identified and admissible LISREL solution for both measures of association and all DCs. Graphical analyses of the Monte Carlo results revealed no anomalous estimates.

Estimates of the correlation coefficients

In this section the performance of the measures of association themselves, i.e. their ability to reflect the relationships between pairs of variables correctly, is evaluated. In order to make the results comparable, the covariance matrices were rescaled to correlations. The results are presented in the first columns of Table 3. The rows of the table are the elements of the correlation matrix for a given DC, and the columns contain criteria for the quality of the estimates for a given measure of association: bias (Monte Carlo average of the correlation estimates minus the value in Table 1), SD (Monte Carlo standard deviation of the correlation estimates) and MSE (mean squared error: Monte Carlo average of the squared deviation of the correlation estimates from the values in Table 1). Bias and SD are multiplied by 100 (for instance, a bias of 2 means that the average correlation is 0.02 higher than the true value, and an SD of 2 means a standard deviation of 0.02), and for coherence MSE is multiplied by 10,000. Because of the symmetry of the model (y_1 is generated in the same way as y_2, and y_3 in the same way as y_4), the results for some of the correlations are identical except for sampling fluctuations, which allows us to show only 3 of them. Bias is omitted when it is not significantly different from zero.

For all DCs we find that covariances have the largest bias (around 0.10 in absolute value), which is of negative sign (Quiroga, 1992). Note that the bias is larger for DCs 4 and 5 than for DC 1, although the distribution of the y* variables is the same; this is due to the transformation errors. For DCs 1, 4 and 5 the assumptions of Polychoric correlations are all met, and they show no significant bias. For the remaining DCs, Polychoric correlations are biased, but less so than covariances (biases stay around 0.05 in absolute value).
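The three quality criteria just defined are simple functions of the replicate estimates. A small Python helper with the chapter's scaling (bias and SD ×100, MSE ×10,000), applied to hypothetical replicates (the 0.69/0.02 values below are illustrative, not taken from the experiment):

```python
import numpy as np

def mc_criteria(estimates, true_value):
    """Bias (x100), SD (x100) and MSE (x10,000) of Monte Carlo estimates."""
    e = np.asarray(estimates, dtype=float)
    bias = 100 * (e.mean() - true_value)
    sd = 100 * e.std(ddof=1)
    mse = 10_000 * np.mean((e - true_value) ** 2)
    return bias, sd, mse

# Hypothetical 200 replicate correlations, centred 0.06 below a true value
# of 0.75 with SD 0.02, mimicking a covariance-based estimate
rng = np.random.default_rng(5)
reps = rng.normal(0.69, 0.02, size=200)
bias, sd, mse = mc_criteria(reps, 0.75)
print(round(bias), round(sd), round(mse))   # roughly -6, 2, 40
```

On this scale MSE is approximately bias² + SD², e.g. (-6)² + 2² = 40, which is a useful sanity check when reading Table 3.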
The results in the literature are confirmed: Polychoric correlations reflect the relationships between pairs of variables better than covariances or Pearson correlations.

Estimates of factor correlations and measurement quality

The results in this section reflect the situation in which several indicators per factor are available and the researcher is mainly interested in the relationships among the factors and in the measurement quality. In this situation, getting wrong correlations among the observed variables is fairly irrelevant as long as the measurement-quality and factor-correlation estimates are unbiased.

Table 3. Results of the Monte Carlo experiment

                  bivariate correlations                    CFA model estimates
              covariances      Polychoric                covariances      Polychoric
DC  vars.    bias  SD  MSE   bias  SD  MSE   par.       bias  SD  MSE   bias  SD  MSE
1   y2-y1     -6    2   37   n.s.   2    3   ρ(η1,η2)    -0    2    4   n.s.   2    3
    y3-y1     -5    2   30   n.s.   2    5   λ²_11        1    3    9   n.s.   3    9
    y4-y3     -6    2   37   n.s.   2    3   λ²_32        1    3    7   n.s.   3    7
2   y2-y1     -8    2   65    -4    2   24   ρ(η1,η2)     1    2    5     1    2    6
    y3-y1     -6    3   45    -3    3   18   λ²_11        2    3   11    -4    3   28
    y4-y3     -8    2   75    -5    2   30   λ²_32        2    3   13    -5    4   38
3   y2-y1    -12    2  139    -5    2   33   ρ(η1,η2)    -2    2    7     2    2    9
    y3-y1    -10    2  109    -3    2   17   λ²_11        3    3   17    -5    3   39
    y4-y3    -10    2  110    -6    2   41   λ²_32        3    3   16    -6    3   45
4   y2-y1     -8    2   62   n.s.   2    3   ρ(η1,η2)    -1    2    6   n.s.   2    4
    y3-y1     -7    2   55   n.s.   2    5   λ²_11        4    3   21   n.s.   3    8
    y4-y3     -8    2   66   n.s.   2    3   λ²_32        3    3   18   n.s.   3    8
5   y2-y1     -9    2   76   n.s.   2    3   ρ(η1,η2)    -2    2    8   n.s.   2    4
    y3-y1     -8    2   74   n.s.   2    5   λ²_11        1    3    8   n.s.   3    8
    y4-y3     -8    2   70   n.s.   2    3   λ²_32        2    3   11   n.s.   3    9

Covariances have been rescaled to correlations. bias: Monte Carlo average minus population value (×100); SD: Monte Carlo standard deviation (×100); MSE: Monte Carlo mean squared deviation from the population value (×10,000); n.s.: bias not significantly different from zero. λ²_i: squared standardised loadings; their bias and MSE are computed with respect to a true value of 0.75 for Polychoric correlations and with respect to the true values in Table 2 for covariances.

The results are presented in the last columns of Table 3 and concern the factor-correlation and the squared standardised λ estimates (because of the symmetry of the model, it is enough to show the results for λ_11 and λ_32). First of all, it must be borne in mind that the squared standardised λ's mean something different depending on whether one uses Polychoric correlations or covariances. In both cases, the squared standardised λ's can be interpreted as

measurement quality. In particular, given that the model in Figure 4 is not a True Score model, the squared standardised λ's can be interpreted as the product of reliability and validity. If Polychoric correlations are used, the focus is on the underlying yᵢ* variables, so that the squared standardised λᵢₖ is an estimate of the percentage of variance of yᵢ* explained by ηₖ, i.e. of the measurement quality of the hypothetical underlying continuous variable yᵢ*. Therefore, it must be close to 0.87² = 0.75, as can be seen from the parameter values of the simulated model. If covariances are used, the focus is on the ordinal observed yᵢ variables, so that the squared standardised λᵢₖ is an estimate of the percentage of variance of yᵢ explained by ηₖ, i.e. of the measurement quality of the ordinal yᵢ variable. The yᵢ variables are contaminated by grouping and transformation errors, so that their measurement quality is reduced and the population percentage of variance of yᵢ explained by ηₖ is less than 0.75. The true percentages are not known, but they can be closely approximated by the average of the 200 squared correlations between the yᵢ and ηₖ simulated data. These values are shown in the last column of Table 2 and were used instead of 0.75 as a reference to compute the bias and MSE of the squared standardised λ estimates based on covariances. For DC 1 all approaches perform very well. There is no bias for the Polychoric approach, and biases in the estimates based on covariances are very slight (around 0.01 in absolute value), which contrasts with the large bias in the correlations themselves. For DCs 2 and 3 (non-normality) the bias in the estimates based on covariances remains low, while it gets larger for Polychoric correlations. This is especially so for DC 3, in which skewness in opposite directions is involved and biases can exceed 0.05 in absolute value.
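The effect of grouping error on measurement quality can be illustrated with a small simulation. The sketch below is not the authors' PRELIS/LISREL Monte Carlo setup; it is a minimal Python illustration that only borrows the population values of the simulated model (standardised loading 0.87, m = 5), with equally-spaced thresholds chosen for the example. The squared correlation between the ordinal y and the factor η falls below the 0.75 that holds for the underlying y*.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Common factor eta and underlying continuous indicator y* with a
# standardised loading of 0.87, so that eta explains 0.87^2 ≈ 0.75
# of the variance of y*.
lam = 0.87
eta = rng.standard_normal(n)
y_star = lam * eta + np.sqrt(1 - lam**2) * rng.standard_normal(n)

# Grouping error: categorize y* into m = 5 ordinal categories with
# equally-spaced thresholds (illustrative values).
y = np.digitize(y_star, [-1.5, -0.5, 0.5, 1.5])  # values 0..4

r2_star = np.corrcoef(y_star, eta)[0, 1] ** 2  # measurement quality of y*, ≈ 0.75
r2_obs = np.corrcoef(y, eta)[0, 1] ** 2        # measurement quality of y, below 0.75
print(round(r2_star, 2), round(r2_obs, 2))
```

With equally-spaced thresholds and m = 5 the loss of explained variance is modest, which is consistent with the small covariance-based biases reported for DC 1.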
We can now come to a conclusion which the experimental design in Chapter 6 did not allow us to reach: covariances and Pearson correlations can be applied to skewed data, as long as the latent factors are also skewed. For DC's 4 and 5 (transformation errors), the result is reversed. Polychoric correlations lead to unbiased estimates whereas covariances lead to some bias (sometimes close to 0.05 in absolute value), which confirms the results in Chapter 6. It is then confirmed that Polychoric correlations rely on the normality assumption. It is also confirmed that the use of covariances correctly results in lower measurement

quality estimates, which partially makes up for the biased covariance estimates, so that parameters linking factors with each other are not as biased as might be expected from the bias in the covariances themselves. Covariances seem to be fairly robust unless large transformation errors are present. Polychoric correlations, on the contrary, are correct regardless of the transformation errors as long as the underlying variables are normal. However, the biases which can be expected are altogether of reasonable magnitude (usually below 0.05 in absolute value for standardised parameters), so that the consequences of a wrong choice of measure of association will often not be dramatic.

Determinants of the choice

The choice between covariances or Pearson correlations on the one hand and Polychoric correlations on the other should be based upon the interest of the researcher and upon the presence or absence of large transformation errors and of deviations from normality of the yᵢ* variables. If the interest of the researcher centres on the y* variables and their measurement quality, Polychoric correlations should be used. If it centres on the y variables, covariances or Pearson correlations should be used. If the researcher is interested in measurement errors which are not related to categorization, then Polychoric correlations should be used. If the researcher is interested in measurement quality as a whole (including the effects of categorization), or in assessing the effects of categorization on measurement quality, then Pearson correlations should be used. It has been shown that the measurement quality estimates mean something different depending on whether Pearson or Polychoric correlations are used, because the measurement quality of y and y* differ. These measurement quality estimates are not interchangeable.
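For reference, Polychoric correlations of the kind discussed here can be estimated with the two-step approach of Olsson (1979a): thresholds are first fixed from the ordinal marginals, and the latent correlation is then chosen to maximise the bivariate-normal likelihood of the contingency table. The Python sketch below is a minimal illustration of that idea, not the PRELIS implementation; the sample size, thresholds and true correlation of 0.75 are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n, rho_true, m = 20_000, 0.75, 5

# Bivariate normal underlying variables y1*, y2* with correlation 0.75.
y_star = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)

# Categorize both with unequally-spaced thresholds (a transformation error).
cuts = [-0.3, 0.4, 1.0, 1.6]
y1 = np.digitize(y_star[:, 0], cuts)
y2 = np.digitize(y_star[:, 1], cuts)

def bvn_cdf(x1, x2, rho):
    """P(Z1 <= x1, Z2 <= x2) for a standard bivariate normal; allows +/-inf."""
    if x1 == -np.inf or x2 == -np.inf:
        return 0.0
    if x1 == np.inf and x2 == np.inf:
        return 1.0
    if x1 == np.inf:
        return float(norm.cdf(x2))
    if x2 == np.inf:
        return float(norm.cdf(x1))
    return float(multivariate_normal.cdf([x1, x2], cov=[[1.0, rho], [rho, 1.0]]))

def polychoric(u, v, m):
    # Step 1: thresholds from the ordinal marginals.
    t1 = np.r_[-np.inf, norm.ppf([np.mean(u < k) for k in range(1, m)]), np.inf]
    t2 = np.r_[-np.inf, norm.ppf([np.mean(v < k) for k in range(1, m)]), np.inf]
    counts = np.zeros((m, m))
    np.add.at(counts, (u, v), 1)
    # Step 2: choose rho to maximise the bivariate-normal likelihood
    # of the m x m contingency table.
    def nll(rho):
        ll = 0.0
        for i in range(m):
            for j in range(m):
                if counts[i, j] == 0:
                    continue
                p = (bvn_cdf(t1[i + 1], t2[j + 1], rho)
                     - bvn_cdf(t1[i], t2[j + 1], rho)
                     - bvn_cdf(t1[i + 1], t2[j], rho)
                     + bvn_cdf(t1[i], t2[j], rho))
                ll += counts[i, j] * np.log(max(p, 1e-12))
        return -ll
    return minimize_scalar(nll, bounds=(-0.99, 0.99), method="bounded").x

r_pearson = np.corrcoef(y1, y2)[0, 1]  # attenuated by the categorization
r_poly = polychoric(y1, y2, m)         # close to the latent 0.75
print(round(r_pearson, 3), round(r_poly, 3))
```

The Pearson correlation of the categorized variables is biased towards zero, while the Polychoric estimate recovers the latent correlation, in line with the pattern for DCs 4 and 5 in Table 3.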
In this book it is suggested that correlation matrices be corrected for attenuation and method effects by using appropriate measurement quality estimates. Measurement quality estimates obtained from covariances or Pearson correlations can only be used to correct covariance or Pearson correlation matrices for measurement error. Measurement quality estimates obtained from Polychoric correlations can only be used to correct Polychoric correlation matrices. So, the results of the meta-analyses done in Chapters 12 and 13 can only be compared to the results of other measurement quality studies in which the same type

of measure of association has been used. As regards deviations from normality and transformation errors, the researcher should not worry too much. The results in Table 3 suggest that if m = 5, the distortions are usually not very large. The bias in the relevant estimates of CFA models remains below 0.05 in most cases, even though the skewness and transformation errors present in some DCs were quite high compared to what can be expected in applied research. Coenders and Saris (1995) carried out another simulation study using m = 3 and obtained similar results. Therefore, in line with Homer and O'Brien (1988), we suggest that the choice of measure of association has no critical consequences for the substantive conclusions which may be drawn from the estimates of a CFA model, apart from what has been said about the different interpretation of the factor loadings. In spite of what has been said in the previous paragraph, a researcher interested in a fine-tuned choice of measure of association can take the following into account. If normal underlying variables are categorized with about equally-spaced thresholds, then Polychoric correlations should be preferred, but covariances and Pearson correlations will also yield approximately correct results. If normal underlying variables are subject to large transformation errors, then Polychoric correlations should be preferred. If non-normal underlying variables are categorized with about equally-spaced thresholds, then covariances or Pearson correlations should be preferred. The researcher is in a more difficult position than suggested by Chapter 6, as both Pearson and Polychoric correlations can lead to wrong results on skewed y data, when y* is symmetric (Figure 2) and when y* is skewed (Figure 3) respectively.
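The difficulty of diagnosing the source of skewness from the data alone can be illustrated directly: a skewed underlying variable cut at equally-spaced thresholds and a normal underlying variable cut at suitably unequal thresholds produce the same ordinal marginal distribution. In the minimal Python sketch below, the lognormal shape, thresholds and sample size are illustrative assumptions, not values taken from the chapter.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 200_000

# Mechanism A: skewed underlying variable, equally-spaced thresholds.
y_star_skew = rng.lognormal(mean=0.0, sigma=0.8, size=n)
y_star_skew = (y_star_skew - y_star_skew.mean()) / y_star_skew.std()
cuts_equal = [-0.75, -0.25, 0.25, 0.75]
y_a = np.digitize(y_star_skew, cuts_equal)
p_a = np.bincount(y_a, minlength=5) / n

# Mechanism B: normal underlying variable, unequally-spaced thresholds
# chosen to reproduce the same category proportions.
cuts_unequal = norm.ppf(np.cumsum(p_a)[:-1])
y_b = np.digitize(rng.standard_normal(n), cuts_unequal)
p_b = np.bincount(y_b, minlength=5) / n

# Up to sampling error, the two ordinal marginals are indistinguishable.
print(np.round(p_a, 3), np.round(p_b, 3))
```

Only the joint distribution of several such items, not the marginal of any single one, carries information about which mechanism generated the data.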
The problem is that sometimes, when seeing a skewed ordinal variable, it is impossible to know whether it comes from a normal underlying variable categorized with a set of unequally-spaced thresholds or from a skewed underlying variable categorized with a set of equally-spaced thresholds, because the conventional normality tests lack power. Quiroga (1992) reports powers which are mostly below 50%, and often about as low as the type I risk, even for large sample sizes and for deviations from normality capable of causing a bias as large as 0.1 in the Polychoric correlations. Given this lack of power, exploratory data analysis may not help decide which measure of association to use. Sometimes, theory or experience helps to determine the shape of y*. For instance, it is known that data regarding the frequency of behaviour are often positively skewed, whereas attitudinal variables (such as the

ones mostly dealt with in the book) can be closer to normality. Theory and experience can also help to determine the kind of categorization generated by a particular questionnaire response scale. Figure 5 shows two possible 5-point response scales for the measurement of satisfaction. Large transformation errors seem more likely to arise from the second scale, with asymmetric category labels, than from the first one, at least if the mean of the latent factor is not extremely high.

Figure 5. Two 5-point scales for the measurement of satisfaction leading to different degrees of transformation

What is your degree of satisfaction concerning ...?

Scale 1:                                   Scale 2:
1) Completely dissatisfied                 1) Dissatisfied
2) Dissatisfied                            2) Neutral
3) Neither satisfied nor dissatisfied      3) Fairly satisfied
4) Satisfied                               4) Very satisfied
5) Completely satisfied                    5) Completely satisfied

References

Babakus, E., Ferguson, C. E., & Jöreskog, K. G. (1987). The Sensitivity of Confirmatory Maximum Likelihood Factor Analysis to Violations of Measurement Scale and Distributional Assumptions. Journal of Marketing Research, 24, 222-229.
Bollen, K. A. (1989). Structural Equations with Latent Variables. New York: Wiley.
Bollen, K. A., & Barb, K. H. (1981). Pearson's r and Coarsely Categorized Data. American Sociological Review, 46, 232-239.
Browne, M. W. (1984). Asymptotically Distribution-Free Methods for the Analysis of Covariance Structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Coenders, G., & Saris, W. E. (1995). Alternative Approaches to Structural Modeling of Ordinal Data: A Monte Carlo Study. Paper presented at the 1995 European Meeting of the Psychometric Society, Leiden.
Homer, P., & O'Brien, R. M. (1988). Using LISREL Models with Crude Rank Category Measures. Quality and Quantity, 22, 191-201.
Johnson, D. R., & Creech, J. C. (1983). Ordinal Measures in Multiple Indicator Models: A Simulation Study of Categorization Error. American Sociological Review, 48, 398-407.

Jöreskog, K. G. (1990). New Developments in LISREL: Analysis of Ordinal Variables using Polychoric Correlations and Weighted Least Squares. Quality and Quantity, 24, 387-404.
Jöreskog, K. G., & Sörbom, D. (1988). PRELIS: A Program for Multivariate Data Screening and Data Summarization. A Preprocessor for LISREL. Mooresville: Scientific Software, Inc.
Jöreskog, K. G., & Sörbom, D. (1989). LISREL 7: A Guide to the Program and Applications. SPSS Publications.
Jöreskog, K. G., & Sörbom, D. (1993a). New Features in PRELIS 2. Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (1993b). New Features in LISREL 8. Scientific Software International.
Minitab Inc. (1991). MINITAB Reference Manual, Release 8, PC Version. State College: Minitab Inc.
Muthén, B., & Kaplan, D. (1985). A Comparison of Some Methodologies for the Factor Analysis of Non-Normal Likert Variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
Muthén, B., & Kaplan, D. (1989). A Comparison of Some Methodologies for the Factor Analysis of Non-Normal Likert Variables: A Note on the Size of the Model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
Neumann, L. (1982). Effects of Categorization on the Correlation Coefficient. Quality and Quantity, 16, 527-538.
O'Brien, R. M. (1985). The Relationship Between Ordinal Measures and their Underlying Values: Why all the Disagreement? Quality and Quantity, 19, 265-277.
O'Brien, R. M., & Homer, P. (1987). Correction for Coarsely Categorized Measures: LISREL's Polyserial and Polychoric Correlations. Quality and Quantity, 21, 349-360.
Olsson, U. (1979a). Maximum Likelihood Estimation of the Polychoric Correlation Coefficient. Psychometrika, 44, 443-460.
Olsson, U. (1979b). On the Robustness of Factor Analysis against Crude Categorization of Observations. Multivariate Behavioral Research, 14, 485-500.
Quiroga, A. M. (1992). Studies of the Polychoric Correlation and Other Correlation Measures for Ordinal Variables. Doctoral dissertation, University of Uppsala.
Rigdon, E. E., & Ferguson, C. E., Jr. (1991). The Performance of the Polychoric Correlation Coefficient and Selected Fitting Functions in Confirmatory Factor Analysis with Ordinal Data. Journal of Marketing Research, 28, 491-497.