Categorization and measurement quality. The choice between Pearson and Polychoric correlations

Chapter 7

Germà Coenders and Willem E. Saris*

ABSTRACT
This chapter first reviews the major consequences of ordinal measurement in Confirmatory Factor Analysis (CFA) models and introduces the reader to the Polychoric correlation coefficient, a measure of association designed to avoid these consequences. Next, it presents the results of a Monte Carlo experiment in which the performance of the estimates of a CFA model based on Pearson and Polychoric correlations is compared under different distributional settings. The chapter concludes that, in general, it does not matter too much which measure of association one uses, as long as one is aware that factor loadings must be interpreted differently depending on whether Pearson or Polychoric correlations are analysed.

* The work of the first author was partly supported by the grant CIRIT BE94/I-151 from the Catalan Autonomous Government.

Introduction
In Chapter 6 it was pointed out that categorization errors deriving from crude ordinal measurement can distort the estimates of Multitrait-Multimethod (MTMM) models, which are particular cases of Confirmatory Factor Analysis (CFA) models. These categorization errors are likely to be sizeable if the data are collected using methods offering only a few response options, like the 5-point response scales used in some studies reported in this book. The population study in Chapter 6 suggests that Polychoric correlations are not affected by categorization errors and may be preferred to Pearson correlations. That simulation, though, only considered normally distributed latent factors, whereas Polychoric correlations are known to be biased under non-normality (O'Brien & Homer, 1987; Quiroga, 1992). This has not prevented other studies in the literature, also using normally distributed artificial data, from recommending the use of Polychoric correlations whenever the data are categorized, which has led to widespread use of this measure of association.

This chapter reports a Monte Carlo experiment in which both normal and non-normal underlying factors are considered. In order to interpret the outcome of the experiment, the different types of categorization errors and their consequences must first be reviewed, together with the assumptions and performance of the Polychoric correlation coefficient.

The categorization problem
Linear statistical models are a convenient and parsimonious tool for analysing relationships among variables. In particular, Structural Equation Models (SEM) with latent variables (see Jöreskog & Sörbom, 1989; or Bollen, 1989) allow the researcher to fit simultaneous regression equations while taking measurement errors into account, and they are becoming increasingly popular. They have been widely applied to survey and other types of data in many research fields. The True Score MTMM model used in this book belongs to the family of CFA models, which are particular cases of SEM with latent variables.

If SEM are estimated from covariances or Pearson correlations, as is usual, they assume that the data are continuous and measured at the interval level. While the social sciences are often interested in measuring and relating variables that are conceptually continuous, the measurement instruments used in these disciplines often fail to yield continuous, interval-level measures. Many categorical response scales frequently used in questionnaires can at most be assumed to be ordinal. This leads to a number of problems, which arise from different sorts of categorization errors.
When modeling ordinally measured continuous variables, it is usually assumed that the range of each continuous variable is divided into as many regions as the ordinal measurement has scale points, and that the ordinal measurement $y$ is related to the continuous underlying variable $y^*$ through the non-linear step function

$$y = k \iff \tau_{k-1} < y^* \le \tau_k, \qquad k = 1, 2, \dots, m \tag{1}$$

where the $\tau_k$ are the thresholds or cutting points of the above-mentioned regions, such that $\tau_0 = -\infty$, $\tau_{k-1} < \tau_k$ and $\tau_m = +\infty$. Assigning the consecutive integer values $1, 2, \dots, m$ to the categories of $y$ is arbitrary but common practice, and is assumed throughout this chapter.

An example of such a step function can be seen in Figure 1a. Figure 1b shows a sample histogram of a continuous $y^*$ variable, and Figure 1c the sample bar chart of the 5-category $y$ variable that results from applying the step function in Figure 1a. Figure 1a also shows the linear regression line of $y$ on $y^*$ which is implied in linear statistical models. The vertical distance between the step and linear functions can be interpreted as categorization error.

This discrepancy between the linear and step functions involves two types of categorization errors, which are discussed in Johnson and Creech (1983). Grouping errors derive from discrete measurement, i.e. from collapsing several values of $y^*$ into the same $y$ value. Transformation errors arise from non-interval measurement: the arbitrary values $1, \dots, m$ may not be linearly related to the within-category expectation of $y^*$, especially if the thresholds $\tau_k$ are not equally spaced. In this case there are transformation errors (O'Brien, 1985), and the shapes of the distributions of $y$ and $y^*$ differ strongly. Figure 2a shows a step function leading to large transformation errors; note that the thresholds are far from equally spaced. Figure 2c shows a sample bar chart of $y$ after applying the step function in Figure 2a to the $y^*$ variable in Figure 2b. Note the difference between the distributional shapes of $y^*$ (symmetric) and $y$ (skewed). Transformation errors can also be high when the mean of $y^*$ is very high or very low, which can produce a high frequency in an extreme category even if the thresholds are equally spaced.

A skewed $y$ distribution does not by itself imply the existence of transformation errors. The same skewed observed distribution of $y$ as in Figure 2c could have been obtained from a skewed underlying distribution through a categorization with low transformation errors.
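As a minimal sketch of Equation (1), assuming a standard normal $y^*$, the Python snippet below categorizes values, computes the population category probabilities, and computes the within-category expectations $E(y^* \mid y = k)$ for two threshold sets. The equally spaced thresholds follow the central-98% rule used in the experiment later in this chapter; the unequal set is hypothetical, chosen only to mimic the skewed pattern of Figure 2. Roughly equal gaps between consecutive within-category means suggest low transformation errors; clearly unequal gaps suggest high ones.

```python
import bisect
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumption of this sketch: y* ~ N(0, 1)


def categorize(y_star, cuts):
    """Equation (1): return k such that tau_{k-1} < y_star <= tau_k.

    `cuts` holds the finite thresholds tau_1 < ... < tau_{m-1};
    tau_0 = -inf and tau_m = +inf are implicit.
    """
    return bisect.bisect_left(cuts, y_star) + 1


def category_probs(cuts):
    """Population probability of each category when y* ~ N(0, 1)."""
    cdf = [0.0] + [STD_NORMAL.cdf(t) for t in cuts] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) + 1)]


def within_category_means(cuts):
    """E(y* | y = k): the mean of a standard normal truncated to each category."""
    pdf = [0.0] + [STD_NORMAL.pdf(t) for t in cuts] + [0.0]
    probs = category_probs(cuts)
    return [(pdf[k] - pdf[k + 1]) / probs[k] for k in range(len(probs))]


# Equally spaced thresholds over the central 98% of y* (low transformation errors).
lo, hi = STD_NORMAL.inv_cdf(0.01), STD_NORMAL.inv_cdf(0.99)
EQUAL = [lo + (hi - lo) * k / 5 for k in range(1, 5)]
# Hypothetical unequally spaced thresholds (high transformation errors).
UNEQUAL = [-2.0, -1.5, -1.0, -0.3]

for name, cuts in (("equal", EQUAL), ("unequal", UNEQUAL)):
    means = within_category_means(cuts)
    gaps = [b - a for a, b in zip(means, means[1:])]
    print(name, "probs:", [round(p, 3) for p in category_probs(cuts)],
          "gaps between category means:", [round(g, 2) for g in gaps])
```

With the equal thresholds the bar chart is symmetric and the gaps are close to one another; the hypothetical unequal thresholds put most of the mass in the top category and give very uneven gaps, which is the transformation error described above.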
Figures 3a to 3c show such a situation: a skewed underlying variable categorized with low transformation errors.

The literature refers to three major consequences of grouping and transformation errors. The first one is bias in the covariances and correlations themselves. In absolute value, the Pearson correlation between two observed $y$ variables is usually lower than the correlation between the continuous underlying $y^*$ variables (Bollen & Barb, 1981; Neumann, 1982; O'Brien & Homer, 1987; Quiroga, 1992). The difference can be large and easily exceed 0.1 if the number of categories is small.

The remaining two consequences are better understood in the context of CFA models and SEM with latent variables. Equation (1) also applies in this framework, but the underlying variable $y^*_i$ is assumed to be an indicator of a latent factor $\eta_k$ (the notation of Jöreskog & Sörbom, 1989, and Bollen, 1989, is used throughout):

$$y^*_i = \lambda_{ik}\,\eta_k + \varepsilon_i \tag{2}$$

where $\lambda_{ik}$ is the factor loading of $y^*_i$ on $\eta_k$ and $\varepsilon_i$ is the random error term associated with $y^*_i$. When standardised and squared, $\lambda_{ik}$ can be interpreted as the measurement quality of $y^*_i$. In this framework, the two additional consequences of categorization are the reduction of measurement quality and the emergence of correlated measurement errors.

The reduction of measurement quality can be understood as follows: the correlation between $y_i$ and $\eta_k$ is usually lower in absolute value than the correlation between $y^*_i$ and $\eta_k$. This reduction is related to both transformation and grouping errors. Quite intuitively, measurement quality decreases as the number of scale points decreases (O'Brien, 1985). To some extent, then, categorization errors have the same consequences as random errors: they attenuate the covariances and correlations among variables, and they reduce measurement quality.

However, categorization errors deviate somewhat from random behaviour. Figure 1a shows that the categorization error depends on the score on the underlying continuous variable: given the value of $y^*$, the categorization error (the vertical distance between the step function and the regression line) is fixed. For instance, if $y^* = 3$ the categorization error is always negative, if $y^* = -0.25$ always positive, and so on. It can also be seen that the categorization errors of two $y_i$ variables can be correlated (Johnson & Creech, 1983), which is the third major consequence of categorization.
The two vertical broken lines in Figure 1a represent the scores of one case on two closely related $y^*_i$ variables, so that both values are similar. Note that the two categorization errors are also similar. This would not be the case if the $y^*_i$ variables were independent.
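Two of these consequences can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the chapter's own code: it generates two indicators of one standard normal factor following Equation (2) with $\lambda = 0.87$ and $\operatorname{var}(\varepsilon) = 0.25$ (the values used in the experiment later in this chapter), categorizes them into 5 points, and compares the correlation of the underlying and observed variables. It also computes the categorization errors as residuals of the linear regression of $y$ on $y^*$ and correlates them across the two indicators, once with equally spaced thresholds and once with a hypothetical unequally spaced set.

```python
import math
import random
from statistics import NormalDist

random.seed(2024)
LOADING, ERROR_SD, N = 0.87, 0.5, 20_000          # lambda and sqrt(var(eps)) from the experiment
SD_STAR = math.sqrt(LOADING ** 2 + ERROR_SD ** 2)  # standard deviation of each y*


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return num / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def categorize(v, cuts):
    """Equation (1): the category is 1 + the number of thresholds below v."""
    return 1 + sum(v > c for c in cuts)


def categorization_errors(y, y_star):
    """Residuals of the linear regression of y on y* (the vertical distances in Figure 1a)."""
    n = len(y)
    mx, my = sum(y_star) / n, sum(y) / n
    slope = (sum((x - mx) * (v - my) for x, v in zip(y_star, y))
             / sum((x - mx) ** 2 for x in y_star))
    return [v - my - slope * (x - mx) for x, v in zip(y_star, y)]


# Two indicators of the same standard normal factor, following Equation (2).
eta = [random.gauss(0.0, 1.0) for _ in range(N)]
y1_star = [LOADING * e + random.gauss(0.0, ERROR_SD) for e in eta]
y2_star = [LOADING * e + random.gauss(0.0, ERROR_SD) for e in eta]

z = NormalDist()
lo, hi = z.inv_cdf(0.01) * SD_STAR, z.inv_cdf(0.99) * SD_STAR
THRESHOLDS = {
    "equal": [lo + (hi - lo) * k / 5 for k in range(1, 5)],        # low transformation errors
    "unequal": [t * SD_STAR for t in (-2.0, -1.5, -1.0, -0.3)],    # hypothetical, high ones
}

results = {}
for name, cuts in THRESHOLDS.items():
    y1 = [categorize(v, cuts) for v in y1_star]
    y2 = [categorize(v, cuts) for v in y2_star]
    results[name] = {
        "r_star": pearson(y1_star, y2_star),   # underlying continuous variables
        "r_obs": pearson(y1, y2),              # observed 5-point codes
        "r_err": pearson(categorization_errors(y1, y1_star),
                         categorization_errors(y2, y2_star)),
    }
    print(name, {key: round(val, 3) for key, val in results[name].items()})
```

In a run of this kind the observed correlation should be clearly lower than the underlying one, and the categorization errors should be only weakly correlated with the equal thresholds but substantially correlated with the unequal ones, in line with the Chapter 6 result that correlated errors are large mainly when transformation errors are present.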

Figure 1a. Step and linear functions. Low transformation errors. Broken lines show the scores and the categorization errors of an individual on two related variables.
Figure 1b. Histogram of a symmetric continuous variable.
Figure 1c. Bar chart of the variable in Figure 1b categorized with the step function in Figure 1a.

Figure 2a. Step and linear functions. High transformation errors.
Figure 2b. Histogram of a symmetric continuous variable.
Figure 2c. Bar chart of the variable in Figure 2b categorized with the step function in Figure 2a.

Figure 3a. Step and linear functions. Low transformation errors.
Figure 3b. Histogram of a skewed continuous variable.
Figure 3c. Bar chart of the variable in Figure 3b categorized with the step function in Figure 3a.

Reasonably reliable indicators of the same factor, or of different but correlated factors, are correlated, and so will be their associated categorization errors. These correlated errors constitute a misspecification in most SEM. From Chapter 6 it is clear that this misspecification can lead to biased estimates (for instance of the method loadings, which precisely play the role of accounting for correlated errors) and to poor model fit. Chapter 6 also suggests that correlated errors are of considerable magnitude only if transformation errors are present.

The Polychoric correlation coefficient
Because of all the problems referred to in the section above, Polychoric correlations have been suggested as an alternative to covariances and Pearson correlations (Olsson, 1979a; Jöreskog, 1990). Assuming that $y_i$ and $y_j$ are ordinal measurements of the continuous underlying variables $y^*_i$ and $y^*_j$ according to Equation (1), the Polychoric correlation coefficient $r(y^*_i, y^*_j)$ is the correlation coefficient between $y^*_i$ and $y^*_j$ estimated from the scores of $y_i$ and $y_j$. The form of the distribution of the $y^*_i$ variables has to be assumed in order to specify the likelihood function of the Polychoric correlation; bivariate normality is usually assumed (see Olsson, 1979a). The programs PRELIS and PRELIS2 (Jöreskog & Sörbom, 1988, 1993a) estimate the Polychoric correlation coefficient by Limited Information Maximum Likelihood, following a two-step procedure. The thresholds are first estimated from the cumulative relative frequency distributions of $y_i$ and $y_j$ and the inverse of the N(0,1) distribution function. The Polychoric correlation coefficient is then estimated by Restricted Maximum Likelihood conditional on the threshold values. In recovering the relationship among the $y^*$ variables, the Polychoric correlation coefficient corrects both grouping and transformation errors.
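The two-step procedure can be sketched in plain Python. This is a minimal illustration in the spirit of Olsson (1979a), not the PRELIS implementation: the thresholds come from the inverse normal of the cumulative marginal proportions, and $\rho$ is then chosen to maximize the bivariate-normal multinomial log-likelihood of the contingency table with those thresholds held fixed. The Simpson-rule integration for the bivariate normal distribution function, the search interval and the golden-section search are choices of this sketch, not of the chapter.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def bvn_cdf(h, k, rho, steps=400):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal with correlation rho.

    Simpson's rule (steps must be even) on the integral of
    phi(x) * Phi((k - rho * x) / sqrt(1 - rho**2)) from -8 (standing in for -inf) to h.
    """
    if h == -math.inf or k == -math.inf:
        return 0.0
    if h == math.inf:
        return STD.cdf(k)
    if k == math.inf:
        return STD.cdf(h)
    lower = -8.0
    if h <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return STD.pdf(x) * STD.cdf((k - rho * x) / s)

    width = (h - lower) / steps
    total = f(lower) + f(h)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lower + i * width)
    return total * width / 3


def thresholds(codes, m):
    """Step 1: tau_k = Phi^{-1}(cumulative proportion of categories 1..k).
    Raises an error if a category is empty (cumulative proportion 0 or 1)."""
    cum, taus = 0, []
    for k in range(1, m):
        cum += codes.count(k)
        taus.append(STD.inv_cdf(cum / len(codes)))
    return taus


def polychoric(y1, y2, m):
    """Step 2: maximize the multinomial log-likelihood in rho, thresholds held fixed."""
    cuts1 = [-math.inf] + thresholds(y1, m) + [math.inf]
    cuts2 = [-math.inf] + thresholds(y2, m) + [math.inf]
    counts = [[0] * m for _ in range(m)]
    for a, b in zip(y1, y2):
        counts[a - 1][b - 1] += 1

    def loglik(rho):
        F = [[bvn_cdf(h, k, rho) for k in cuts2] for h in cuts1]
        total = 0.0
        for a in range(m):
            for b in range(m):
                p = F[a + 1][b + 1] - F[a][b + 1] - F[a + 1][b] + F[a][b]
                total += counts[a][b] * math.log(max(p, 1e-300))
        return total

    lo, hi, g = -0.95, 0.95, (math.sqrt(5) - 1) / 2   # golden-section search for the maximum
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = loglik(c), loglik(d)
    while hi - lo > 1e-4:
        if fc > fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = loglik(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = loglik(d)
    return (lo + hi) / 2


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return num / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Usage: bivariate normal data with rho = 0.6, cut into 5 categories.
random.seed(11)
RHO, N = 0.6, 2000
CUTS = [-1.3958, -0.4653, 0.4653, 1.3958]   # equal spacing over the central 98% of N(0, 1)
z1 = [random.gauss(0.0, 1.0) for _ in range(N)]
z2 = [RHO * a + math.sqrt(1 - RHO ** 2) * random.gauss(0.0, 1.0) for a in z1]
y1 = [1 + sum(v > c for c in CUTS) for v in z1]
y2 = [1 + sum(v > c for c in CUTS) for v in z2]
r_poly, r_pearson = polychoric(y1, y2, 5), pearson(y1, y2)
print(f"Polychoric: {r_poly:.3f}   Pearson on codes: {r_pearson:.3f}")
```

With normal underlying data, the Polychoric estimate should land near the generating value of 0.6, while the Pearson correlation of the 5-point codes is attenuated.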
The Polychoric correlation coefficient is, however, no panacea: it relies on the normality assumption, which may lead to bias if this assumption is not fulfilled.

Previous research on the choice between Pearson and Polychoric correlations
There is a consensus that covariances and Pearson correlations are seriously biased by categorization errors (Quiroga, 1992; O'Brien & Homer, 1987; and many others). Polychoric correlations, on the contrary, are unbiased if the underlying variables are normally distributed (Jöreskog & Sörbom, 1988) but biased otherwise. Skewness seems to be the most serious type of departure from normality, especially when the degree or direction of skewness differs from variable to variable. Even if normality does not hold, the bias is usually larger for covariances and Pearson correlations than for Polychoric correlations (Quiroga, 1992). It thus seems well established that Polychoric correlations are always preferable if the researcher is interested in relationships between pairs of variables, or uses SEM without latent variables.

The robustness to categorization of SEM estimated from Pearson correlations has often been studied in the literature under the premise that the model estimates should account for the correlation structure among the $y^*_i$ variables (Babakus et al., 1987; Olsson, 1979b). Since the correlations differ depending on whether the $y^*_i$ or the $y_i$ variables are considered, studies carried out from this point of view report large biases in the parameter estimates and recommend the use of Polychoric correlations. In particular, the factor loadings are reported to be too low with respect to the measurement quality of the $y^*_i$ variables. In our opinion, when covariances or Pearson correlations are used, factor loadings should instead be related to the measurement quality of the observed $y_i$ variables, because such measures of association implicitly ignore the intervening $y^*_i$ variables and concentrate on the observed variables and the latent factors alone. Considered from this point of view, the biases due to categorization are much lower.
SEM with latent variables can apparently interpret the lower covariances or Pearson correlations as a lower measurement quality, and take it into account so as to yield approximately correct estimates of the factor correlations (Johnson & Creech, 1983; Homer & O'Brien, 1988). From this point of view, it is precisely the lower loadings which make the models work properly.

In our opinion, when it comes to SEM with latent variables, the choice of measure of association is not straightforward. Authors dealing with normally distributed underlying $y^*$ variables (Rigdon & Ferguson, 1991; Jöreskog & Sörbom, 1988; Babakus et al., 1987; Olsson, 1979b) recommend the use of Polychoric correlations. Homer and O'Brien (1988) consider both normal and non-normal underlying variables and show that, in the context of latent-variable models, the balance between Pearson and Polychoric correlations does not tip as clearly in favour of the latter as when only bivariate correlations are concerned. The authors conclude that in many cases it does not matter very much which measure of association one uses. However, they do not state under which circumstances each measure should be preferred. Johnson and Creech (1983) point to correlated categorization errors as the reason why latent-variable models may not perform entirely well when Pearson correlations are used, implying that transformation errors should be the main source of distortion for these measures of association. In the following sections we carry out a Monte Carlo experiment to shed further light on this point.

Monte Carlo comparison of covariances and Polychoric correlations in CFA models
The previous section has pointed to transformation errors and non-normality as the key issues regarding the performance of covariances and Pearson correlations on the one hand, and of Polychoric correlations on the other. In this section, a Monte Carlo experiment is designed to explore separately the effects of these two main sources of distortion on covariances and Polychoric correlations. A Confirmatory Factor Analysis (CFA) model, a particular case of SEM with latent variables that is very relevant to the studies carried out in this book, is considered. The emphasis is on the point estimates of the types of parameters this book is concerned with, namely factor correlations and measurement quality. The experiment thus complements the one carried out in Chapter 6, which explores only several patterns of transformation errors but not non-normality, and which concentrates on measurement quality alone.
The experiment consists of simulating four continuous variables $y^*_1$ to $y^*_4$ which follow a standardised CFA model with two factors (see the path diagram in Figure 4), categorizing them into 5 categories (a number of scale points frequently used in this book), and analysing the categorized data with the same CFA model that was used to simulate the continuous data. The simulated data are generated through the following equations:

$$\begin{aligned} y^*_1 &= \lambda_{11}\eta_1 + \varepsilon_1 \\ y^*_2 &= \lambda_{21}\eta_1 + \varepsilon_2 \\ y^*_3 &= \lambda_{32}\eta_2 + \varepsilon_3 \\ y^*_4 &= \lambda_{42}\eta_2 + \varepsilon_4 \end{aligned} \tag{3}$$

$$y_i = k \iff \tau_{i,k-1} < y^*_i \le \tau_{ik}, \qquad k = 1, 2, \dots, 5 \tag{4}$$

with $\tau_{i0} = -\infty < \tau_{i1} < \tau_{i2} < \tau_{i3} < \tau_{i4} < \tau_{i5} = +\infty$ and the following population parameters: factor correlation $\rho(\eta_1, \eta_2) = 0.83$; $\operatorname{var}(\varepsilon_i) = 0.25$ for all $i$; $\operatorname{var}(\eta_k) = 1$ for all $k$; $\lambda_{ik} = 0.87$ for the four non-zero loadings; $\operatorname{cov}(\varepsilon_i, \eta_k) = 0$ for all $i, k$; and $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \ne j$. The implied correlation matrix among the $y^*_i$ variables is given in Table 1.

Figure 4. Path diagram of the simulated CFA model

Table 1. Implied covariance matrix for the $y^*_i$ variables (rows and columns $y^*_1$ to $y^*_4$)

Five Monte Carlo experiments were carried out with the model above under 5 different distributional conditions (DCs), which are summarised in Table 2. The DCs are determined by the shape of the distribution of $\eta_1$ and $\eta_2$, and by the thresholds $\tau_{ik}$, which lead to the distributions of the $y$ variables. The error variables $\varepsilon_1$ to $\varepsilon_4$ are simulated to be normally distributed; we think normality is a reasonable assumption for measurement errors. The design considers three alternative distributions for the $\eta_k$ factors: a bivariate normal distribution, a distribution in which both $\eta_k$ factors are skewed in the same direction, and a distribution in which the $\eta_k$ factors are skewed in opposite directions.

The two skewed factor distributions are obtained from weighted sums of three independent $\chi^2$ variables (note 1). The design also considers three alternative marginal frequency distributions for the $y_i$ variables: a symmetric distribution, a distribution in which all $y_i$ variables are skewed in the same direction, and a distribution in which the indicators of different $\eta_k$ factors are skewed in opposite directions. These three $y_i$ distributions can be obtained from the three possible distributions of the $\eta_k$ factors with equally spaced thresholds (obtained by dividing the range containing the central 98% of the probability of $y^*_i$ into 5 equal-length intervals). In this case transformation errors are low (DCs 1 to 3). Unlike the design in Chapter 6, this design allows for skewed observed data and low transformation errors simultaneously. The two types of skewed $y_i$ distributions can also be obtained from normal $\eta_k$ factors if unequally spaced thresholds are used; in this case transformation errors are high (DCs 4 and 5). Table 2 also shows the population frequency distribution of the $y$ variables and their measurement quality (the percentage of their variance explained by $\eta$).

The number of replications was 200 for all DCs. The sample size was chosen to be large (n = 1000) to avoid mixing the problems of interest with those derived from small samples. The simulation was carried out with the program MINITAB8 (Minitab Inc., 1991). For each DC, the program PRELIS 2.02 (Jöreskog & Sörbom, 1993a) was used to compute 200 covariance matrices, and the program PRELIS 1.10 (Jöreskog & Sörbom, 1988) to compute 200 Polychoric correlation matrices, from the raw data for the variables $y_1$ to $y_4$. Asymptotic sampling variances and covariances of the elements of all matrices were also computed.
Note. The use of old versions of the program is due to some anomalies found in the asymptotic variances/covariances of the Polychoric correlations computed by PRELIS.

Note 1. The weights allow for different signs and degrees of skewness: $\eta_1 = 0.58x_1 \ldots x_2$ and $\eta_2 = 0.87x_1 \ldots x_2 \ldots x_3$, where $x_1$ to $x_3$ are standardised variables distributed as $\chi^2$ with 1 degree of freedom. Reversing the sign of $x_2$ leads to negative skewness in $\eta_1$ and positive skewness in $\eta_2$.
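The construction of skewed factors from weighted sums of standardised $\chi^2_1$ variables, followed by equally spaced thresholds over the central 98% of $y^*$, can be sketched as follows. The weights here are hypothetical and are not the ones used in the chapter: each factor is a shared standardised $\chi^2_1$ component plus a unique one, with weights chosen so that $\operatorname{var}(\eta_k) = 1$ and $\rho(\eta_1, \eta_2) = 0.83$. Both factors are therefore positively skewed (the 'skewsame' case); opposite skewness would require sign reversals that this sketch does not attempt.

```python
import math
import random

random.seed(5)
N = 20_000


def std_chi2_1():
    """A chi-square(1) draw standardised to mean 0 and variance 1."""
    z = random.gauss(0.0, 1.0)
    return (z * z - 1.0) / math.sqrt(2.0)


def skewness(xs):
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / n / m2 ** 1.5


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return num / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def equal_thresholds(sample, m=5, central=0.98):
    """Split the range holding the central 98% of `sample` into m equal-length intervals."""
    s = sorted(sample)
    tail = (1.0 - central) / 2
    lo, hi = s[int(tail * len(s))], s[int((1.0 - tail) * len(s)) - 1]
    return [lo + (hi - lo) * k / m for k in range(1, m)]


# Hypothetical weights (not the chapter's): shared x1 plus a unique component,
# so that var(eta_k) = 1 and corr(eta_1, eta_2) = A**2 = 0.83.
A, B = math.sqrt(0.83), math.sqrt(0.17)
x1 = [std_chi2_1() for _ in range(N)]
x2 = [std_chi2_1() for _ in range(N)]
x3 = [std_chi2_1() for _ in range(N)]
eta1 = [A * a + B * b for a, b in zip(x1, x2)]
eta2 = [A * a + B * c for a, c in zip(x1, x3)]

# Indicator of eta_1 as in Equation (3), then equally spaced thresholds (low transformation).
y1_star = [0.87 * e + random.gauss(0.0, 0.5) for e in eta1]
cuts = equal_thresholds(y1_star)
counts = [0] * 5
for v in y1_star:
    counts[sum(v > c for c in cuts)] += 1

print("skewness of eta_1:", round(skewness(eta1), 2))
print("corr(eta_1, eta_2):", round(pearson(eta1, eta2), 3))
print("bar chart of y_1 (counts per category):", counts)
```

The resulting bar chart is skewed even though the thresholds are equally spaced, which is the situation of DCs 2 and 3 (and of Figure 3).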

Table 2. Description of the 5 distributional conditions (DC)

DC   Distribution of η factors   Distribution of y variables   Transformation errors
1    normal                      symmetric                     low
2    skewsame                    skewsame                      low
3    skewdiff                    skewdiff                      low
4    normal                      skewsame                      high
5    normal                      skewdiff                      high

skewsame: skewed in the same direction; skewdiff: skewed in opposite directions. For each DC, the table also gives, separately for $y_1$, $y_2$ and for $y_3$, $y_4$, the population frequencies (%) of the five categories and the percentage of variance explained by $\eta$.

The program LISREL 8.02 (Jöreskog & Sörbom, 1989, 1993b) was used to fit the model in Figure 4 by Weighted Least Squares (WLS) on all matrices. WLS is the best available large-sample estimation procedure for both measures of association. It is the only method claimed to be correct for Polychoric correlations (Jöreskog, 1990; Jöreskog & Sörbom, 1993a). As regards covariances, WLS is equivalent (Jöreskog & Sörbom, 1988; Jöreskog, 1990) to the Asymptotic Distribution-Free (ADF) procedure (Browne, 1984), which is asymptotically optimal even if the data are non-normal. Muthén and Kaplan (1985, 1989) report good large-sample behaviour of ADF estimates for a variety of distributions, at least for small models. This possibility of dealing with non-normality is currently available for covariances only, and has to be taken into account when deciding which measure of association to adopt.

The LISREL specification left free all loadings, factor variances and covariances, and error variances in Figure 4, except $\lambda_{11}$ and $\lambda_{32}$, which were constrained to equal 1 in order to fix the scale of the $\eta$ factors. In … % of the replications, LISREL converged to an empirically identified and admissible solution for both measures of association and all DCs. Graphical analyses of the Monte Carlo results revealed no anomalous estimates.
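The implied matrix of Table 1, against which the estimates are judged below, follows from the population parameters as $\Sigma = \Lambda\Phi\Lambda' + \Theta$. The minimal sketch below uses only the parameter values stated above. With $\lambda = 0.87$ and $\operatorname{var}(\varepsilon_i) = 0.25$ the variances of the $y^*_i$ come out at 1.0069 rather than exactly 1, so the model is standardised only approximately, and the squared standardised loading is about 0.752.

```python
import math

# Population parameters from the text (rows: y*_1..y*_4, columns: eta_1, eta_2).
LAMBDA = [[0.87, 0.0], [0.87, 0.0], [0.0, 0.87], [0.0, 0.87]]
PHI = [[1.0, 0.83], [0.83, 1.0]]     # var(eta_k) = 1, rho(eta_1, eta_2) = 0.83
THETA = [0.25, 0.25, 0.25, 0.25]     # var(eps_i); errors uncorrelated


def implied_cov(lam, phi, theta):
    """Sigma = Lambda Phi Lambda' + Theta, with Theta diagonal."""
    p, q = len(lam), len(phi)
    sigma = [[sum(lam[i][a] * phi[a][b] * lam[j][b] for a in range(q) for b in range(q))
              for j in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] += theta[i]
    return sigma


SIGMA = implied_cov(LAMBDA, PHI, THETA)
sd = [math.sqrt(SIGMA[i][i]) for i in range(4)]
CORR = [[SIGMA[i][j] / (sd[i] * sd[j]) for j in range(4)] for i in range(4)]
# Squared standardised loading = share of var(y*_i) explained by its factor.
QUALITY = [LAMBDA[i][i // 2] ** 2 / SIGMA[i][i] for i in range(4)]

for row in SIGMA:
    print(" ".join(f"{v:.4f}" for v in row))
print("squared standardised loadings:", [round(v, 4) for v in QUALITY])
```

Dividing any cross-factor correlation by the squared standardised loading returns the factor correlation of 0.83, which is the relation the CFA estimates are expected to recover.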

Estimates of the correlation coefficients
In this section the performance of the measures of association themselves, i.e. their ability to reflect the relationships between pairs of variables correctly, is evaluated. To make the results comparable, the covariance matrices were rescaled to correlations. The results are presented in the first columns of Table 3. The rows of the table are the elements of the correlation matrix for a given DC, and the columns contain criteria for the quality of the estimates for a given measure of association: bias (the Monte Carlo average of the correlation estimates minus the value in Table 1), SD (the Monte Carlo standard deviation of the correlation estimates) and MSE (the Mean Squared Error, i.e. the Monte Carlo average of the squared deviation of the correlation estimates from the values in Table 1). Bias and SD are multiplied by 100 (for instance, a bias of 2 means that the average correlation is 0.02 higher than the true value, and an SD of 2 means a standard deviation of 0.02) and, for consistency, MSE is multiplied by 10,000. Because of the symmetry of the model ($y_1$ is generated in the same way as $y_2$, and $y_3$ in the same way as $y_4$), the results for some of the correlations are identical except for sampling fluctuations, which allows us to show only 3 of them. Bias is omitted if it is not significantly different from zero.

For all DCs, covariances have the largest bias (around 0.10 in absolute value), which is negative (Quiroga, 1992). Note that the bias is larger for DCs 4 and 5 than for DC 1, although the distribution of the $y^*$ variables is the same; this is due to the transformation errors. For DCs 1, 4 and 5 the assumptions of Polychoric correlations are all met, and they show no significant bias. For the remaining DCs, Polychoric correlations are biased, but less than covariances (biases stay around 0.05 in absolute value).
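The three criteria can be written down directly. The sketch below is a minimal version under two stated assumptions: the SD uses the divisor R (the number of replications), so that MSE = bias² + SD² holds exactly, since the chapter does not say which divisor was used; and the helper `bias_is_significant` is a hypothetical z-test standing in for the unspecified significance test for bias.

```python
import math


def mc_criteria(estimates, true_value):
    """Bias (x100), SD (x100) and MSE (x10,000) of Monte Carlo estimates, as in Table 3.

    SD uses divisor R so that MSE = bias**2 + SD**2 exactly (an assumption of this sketch).
    """
    r = len(estimates)
    mean = sum(estimates) / r
    bias = mean - true_value
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / r)
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return 100 * bias, 100 * sd, 10_000 * mse


def bias_is_significant(estimates, true_value, z_crit=1.96):
    """Hypothetical z-test of H0: bias = 0, using the standard error SD / sqrt(R)."""
    bias, sd, _ = mc_criteria(estimates, true_value)
    return abs(bias) > z_crit * sd / math.sqrt(len(estimates))


# Four made-up correlation estimates against a true value of 0.75.
print(mc_criteria([0.72, 0.74, 0.76, 0.70], 0.75))  # roughly (-2.0, 2.24, 9.0)
```

Reporting all three criteria is redundant only in appearance: the decomposition shows whether a large MSE comes from systematic bias or from sampling variability.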
These results confirm the literature: Polychoric correlations reflect the relationships between pairs of variables better than covariances or Pearson correlations.

Estimates of factor correlations and measurement quality
The results in this section reflect the situation in which several indicators per factor are available and the researcher is mainly interested in the relationships among the factors and in measurement quality. In this situation, getting wrong correlations among the observed variables is fairly irrelevant as long as the measurement quality and factor correlation estimates are unbiased.

Table 3. Results of the Monte Carlo experiment: bias, SD and MSE by DC, for covariances and for Polychoric correlations, of three bivariate correlations and of the CFA model estimates $\rho(\eta_1, \eta_2)$, $\lambda^2_{11}$ and $\lambda^2_{32}$.
Notes: Covariances have been rescaled to correlations. bias: Monte Carlo average minus population value (×100). SD: Monte Carlo standard deviation (×100). MSE: Monte Carlo mean squared deviation from the population value (×10,000). $\lambda^2_i$: squared standardised loadings. Bias and MSE are computed with respect to a true value of 0.75 for Polychoric correlations and with respect to the true values in Table 2 for covariances.

The results are presented in the last columns of Table 3 and concern the factor correlation and the squared standardised $\lambda$ estimates (because of the symmetry of the model, it is enough to show the results for $\lambda_{11}$ and $\lambda_{32}$). First of all, the squared standardised $\lambda$s mean something different depending on whether Polychoric correlations or covariances are used. In both cases they can be interpreted as measurement quality. In particular, since the model in Figure 4 is not a True Score model, the squared standardised $\lambda$s can be interpreted as the product of reliability and validity. If Polychoric correlations are used, the focus is on the underlying $y^*_i$ variables, so that the squared standardised $\lambda_{ik}$ estimates the percentage of variance of $y^*_i$ explained by $\eta_k$, i.e. the measurement quality of the hypothetical underlying continuous variable $y^*_i$. It should therefore be close to 0.75, as follows from the parameter values of the simulated model. If covariances are used, the focus is on the ordinal observed $y_i$ variables, so that the squared standardised $\lambda_{ik}$ estimates the percentage of variance of $y_i$ explained by $\eta_k$, i.e. the measurement quality of the ordinal variable $y_i$. The $y_i$ variables are contaminated by grouping and transformation errors, so their measurement quality is reduced and the population percentage of variance of $y_i$ explained by $\eta_k$ is less than 0.75. The true percentages are not known, but they can be closely approximated by the average of the 200 squared correlations between the simulated $y_i$ and $\eta_k$ data. These values are shown in the last column of Table 2 and were used instead of 0.75 as the reference for computing the bias and MSE of the squared standardised $\lambda$ estimates based on covariances.

For DC 1 all approaches perform very well. There is no bias for the Polychoric approach, and the biases of the estimates based on covariances are very slight (around 0.01 in absolute value), which contrasts with the large bias in the correlations themselves. For DCs 2 and 3 (non-normality), the bias of the estimates based on covariances remains low while it grows for Polychoric correlations, especially for DC 3, which involves skewness in opposite directions and where biases can exceed 0.05 in absolute value.
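The reference values for the covariance-based estimates, i.e. the average squared correlation between the simulated $y_i$ and $\eta_k$, can be approximated with a short simulation. This sketch assumes a normal factor, the chapter's $\lambda = 0.87$ and $\operatorname{var}(\varepsilon) = 0.25$, equally spaced thresholds over the central 98% (as in DC 1) and, for contrast, a hypothetical unequally spaced set; it uses 50 replications instead of the chapter's 200 to keep it fast.

```python
import math
import random
from statistics import NormalDist

random.seed(3)
LOADING, ERROR_SD = 0.87, 0.5            # lambda and sqrt(var(eps)) from the experiment
N, REPLICATIONS = 1000, 50               # the chapter used 200 replications
SD_STAR = math.sqrt(LOADING ** 2 + ERROR_SD ** 2)


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return num / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def categorize(v, cuts):
    """Equation (4): the category is 1 + the number of thresholds below v."""
    return 1 + sum(v > c for c in cuts)


def reference_quality(cuts):
    """Average over replications of r(y*, eta)^2 and r(y, eta)^2 for one indicator."""
    q_star = q_obs = 0.0
    for _ in range(REPLICATIONS):
        eta = [random.gauss(0.0, 1.0) for _ in range(N)]
        y_star = [LOADING * e + random.gauss(0.0, ERROR_SD) for e in eta]
        y = [categorize(v, cuts) for v in y_star]
        q_star += pearson(y_star, eta) ** 2
        q_obs += pearson(y, eta) ** 2
    return q_star / REPLICATIONS, q_obs / REPLICATIONS


z = NormalDist()
lo, hi = z.inv_cdf(0.01) * SD_STAR, z.inv_cdf(0.99) * SD_STAR
THRESHOLDS = {
    "equal": [lo + (hi - lo) * k / 5 for k in range(1, 5)],       # as in DC 1
    "unequal": [t * SD_STAR for t in (-2.0, -1.5, -1.0, -0.3)],   # hypothetical
}
results = {name: reference_quality(cuts) for name, cuts in THRESHOLDS.items()}
for name, (q_star, q_obs) in results.items():
    print(f"{name}: quality of y* = {q_star:.3f}, quality of ordinal y = {q_obs:.3f}")
```

The quality of $y^*$ should stay near 0.75 in both cases, while the quality of the ordinal $y$ is lower, and lower still when transformation errors are large, which is why a DC-specific reference value is needed for the covariance-based estimates.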
We can now reach a conclusion that the experimental design in Chapter 6 did not allow: covariances and Pearson correlations can be applied to skewed data, as long as the latent factors are also skewed. For DCs 4 and 5 (transformation errors) the result is reversed: Polychoric correlations lead to unbiased estimates, whereas covariances lead to some bias (sometimes close to 0.05 in absolute value), which confirms the results in Chapter 6. This confirms that Polychoric correlations rely on the normality assumption. It also confirms that the use of covariances correctly results in lower measurement quality estimates, which partially compensates for the biased covariance estimates, so that the parameters linking factors with each other are not as biased as the bias in the covariances themselves would lead one to expect. Covariances seem to be fairly robust unless large transformation errors are present. Polychoric correlations, on the contrary, are correct regardless of the transformation errors as long as the underlying variables are normal. However, the biases that can be expected are on the whole of reasonable magnitude (usually below 0.05 in absolute value for standardised parameters), so the consequences of a wrong choice of measure of association will often not be dramatic.

Determinants of the choice
The choice between covariances or Pearson correlations and Polychoric correlations should be based on the interest of the researcher and on whether large transformation errors and deviations from normality of the $y^*_i$ variables are present. If the researcher's interest is concentrated on the $y^*$ variables and their measurement quality, Polychoric correlations should be used; if it is concentrated on the $y$ variables, covariances or Pearson correlations should be used. If the researcher is interested in measurement errors which are not related to categorization, Polychoric correlations should be used. If the researcher is interested in measurement quality as a whole (including the effects of categorization), or in assessing the effects of categorization on measurement quality, Pearson correlations should be used. It has been shown that the measurement quality estimates mean something different depending on whether Pearson or Polychoric correlations are used, because the measurement qualities of $y$ and $y^*$ differ. These measurement quality estimates are not interchangeable.
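Because the two kinds of quality estimates are not interchangeable, a correction for attenuation has to pair a correlation with quality estimates obtained from the same measure of association. The sketch below shows only the basic single-indicator correction $r_{ij} / \sqrt{q_i q_j}$ (method effects are ignored), using population values on the $y^*$ scale derived from the simulated model; the mismatched quality value of 0.69 is purely illustrative.

```python
import math


def disattenuate(r_ij, q_i, q_j):
    """Correct a correlation for measurement error: r_ij / sqrt(q_i * q_j),
    where q is a measurement quality (squared standardised loading)."""
    return r_ij / math.sqrt(q_i * q_j)


# Population values on the y* scale implied by the simulated model
# (lambda = 0.87, var(eps) = 0.25, rho(eta_1, eta_2) = 0.83).
var_star = 0.87 ** 2 + 0.25
r13_star = 0.87 ** 2 * 0.83 / var_star   # correlation between y*_1 and y*_3
q_star = 0.87 ** 2 / var_star            # measurement quality of each y*_i

matched = disattenuate(r13_star, q_star, q_star)   # y*-scale r with y*-scale quality
mismatched = disattenuate(r13_star, 0.69, 0.69)    # hypothetical ordinal-scale quality
print(round(matched, 3), round(mismatched, 3))     # → 0.83 0.904
```

The matched correction recovers the factor correlation exactly, while dividing a $y^*$-scale correlation by a lower, ordinal-scale quality overcorrects it.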
In this book it is suggested that correlation matrices be corrected for attenuation and method effects by using appropriate measurement quality estimates. Measurement quality estimates obtained from covariances or Pearson correlations can only be used to correct covariance or Pearson correlation matrices for measurement error. Measurement quality estimates obtained from Polychoric correlations can only be used to correct Polychoric correlation matrices. Consequently, the results of the meta-analyses in Chapters 12 and 13 can only be compared to the results of other measurement quality studies in which the same type of measure of association has been used.

As regards deviations from normality and transformation errors, the researcher should not worry too much. The results in Table 3 suggest that if m = 5 the distortions are usually not very large. The bias in the relevant estimates of CFA models stays below 0.05 in most cases, even though the skewness and transformation errors present in some DCs were quite high compared to what can be expected in applied research. Coenders and Saris (1995) carried out another simulation study using m = 3 and obtained similar results. Therefore, in line with Homer and O'Brien (1988), we suggest that the choice of measure of association has no critical consequences for the substantive conclusions which may be drawn from the estimates of a CFA model, apart from what has been said about the different interpretation of the factor loadings.

In spite of what has been said in the previous paragraph, a researcher interested in a fine-tuned choice of measure of association can take the following into account. If normal underlying variables are categorized with approximately equally spaced thresholds, Polychoric correlations should be preferred, but covariances and Pearson correlations will also yield approximately correct results. If normal underlying variables are subject to large transformation errors, Polychoric correlations should be preferred. If non-normal underlying variables are categorized with approximately equally spaced thresholds, covariances or Pearson correlations should be preferred. The researcher is in a more difficult position than Chapter 6 suggests, as both Pearson and Polychoric correlations can lead to wrong results on skewed $y$ data: Pearson correlations when $y^*$ is symmetric (Figure 2), and Polychoric correlations when $y^*$ is skewed (Figure 3).
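The ambiguity is easy to reproduce for a single variable: whatever bar chart is observed, a normal $y^*$ with thresholds $\tau_k = \Phi^{-1}(\text{cumulative proportion of categories } 1, \dots, k)$ reproduces it exactly. In the sketch below, a hypothetical skewed $y^*$ (a standardised $\chi^2_1$ variable plus normal noise) is categorized with equally spaced thresholds, as in Figure 3, and normal thresholds are then derived that give identical category proportions but are unequally spaced, as in Figure 2.

```python
import math
import random
from statistics import NormalDist

random.seed(9)
STD = NormalDist()
N = 5000

# Hypothetical skewed y*: a standardised chi-square(1) variable plus normal noise.
y_star = [(random.gauss(0.0, 1.0) ** 2 - 1.0) / math.sqrt(2.0) + random.gauss(0.0, 0.5)
          for _ in range(N)]

# Equally spaced thresholds over the central 98% of y* (as in Figure 3).
s = sorted(y_star)
lo, hi = s[int(0.01 * N)], s[int(0.99 * N) - 1]
cuts_skewed = [lo + (hi - lo) * k / 5 for k in range(1, 5)]
y = [1 + sum(v > c for c in cuts_skewed) for v in y_star]
props = [y.count(k) / N for k in range(1, 6)]

# Thresholds under which a *normal* y* gives exactly the same bar chart:
# tau_k = Phi^{-1}(cumulative proportion of categories 1..k).
cuts_normal, cum = [], 0.0
for p in props[:-1]:
    cum += p
    cuts_normal.append(STD.inv_cdf(cum))

cdf = [0.0] + [STD.cdf(t) for t in cuts_normal] + [1.0]
normal_props = [cdf[k + 1] - cdf[k] for k in range(5)]
gaps = [b - a for a, b in zip(cuts_normal, cuts_normal[1:])]

print("observed proportions:      ", [round(p, 3) for p in props])
print("normal y* reproduces them: ", [round(p, 3) for p in normal_props])
print("normal thresholds' spacing:", [round(g, 2) for g in gaps])
```

A single variable's margins therefore carry no information about which mechanism produced them; only the joint distribution of several variables can, and that is where the low power of normality tests becomes the limiting factor.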
The problem is that, when faced with a skewed ordinal variable, it is sometimes impossible to know whether it comes from a normal underlying variable categorized with a set of unequally spaced thresholds or from a skewed underlying variable categorized with a set of equally spaced thresholds, because the conventional normality tests lack power. Quiroga (1992) reports powers that are mostly below 50% and often about as low as the type I risk, even for large sample sizes and for deviations from normality capable of causing a bias as large as 0.1 in the Polychoric correlations. Given this lack of power, exploratory data analysis may not help in deciding which measure of association to use. Sometimes, theory or experience helps to determine the shape of y*. For instance, it is known that data on the frequency of behaviour are often positively skewed, whereas attitudinal variables (such as the ones mostly dealt with in this book) can be closer to normality.

Theory and experience can also help to determine the kind of categorization generated by a particular questionnaire response scale. Figure 5 shows two possible 5-point response scales for the measurement of satisfaction. Large transformation errors seem more likely to arise from the second scale, with its asymmetric category labels, than from the first one, at least if the mean of the latent factor is not extremely high.

Figure 5. Two 5-point scales for the measurement of satisfaction leading to different degrees of transformation

What is your degree of satisfaction concerning ...?

Scale 1                                  Scale 2
1) Completely dissatisfied               1) Dissatisfied
2) Dissatisfied                          2) Neutral
3) Neither satisfied nor dissatisfied    3) Fairly satisfied
4) Satisfied                             4) Very satisfied
5) Completely satisfied                  5) Completely satisfied

References

Babakus, E., Ferguson, C. E., & Jöreskog, K. G. (1987). The Sensitivity of Confirmatory Maximum Likelihood Factor Analysis to Violations of Measurement Scale and Distributional Assumptions. Journal of Marketing Research, 24,
Bollen, K. A. (1989). Structural Equations with Latent Variables. New York: Wiley.
Bollen, K. A., & Barb, K. H. (1981). Pearson's r and Coarsely Categorized Data. American Sociological Review, 46,
Browne, M. W. (1984). Asymptotically Distribution-Free Methods for the Analysis of Covariance Structures. British Journal of Mathematical and Statistical Psychology, 37,
Coenders, G., & Saris, W. E. (1995). Alternative Approaches to Structural Modeling of Ordinal Data. A Monte Carlo Study. Paper presented at the 1995 European Meeting of the Psychometric Society, Leiden.
Homer, P., & O'Brien, R. M. (1988). Using LISREL Models with Crude Rank Category Measures. Quality and Quantity, 22,
Johnson, D. R., & Creech, J. C. (1983). Ordinal Measures in Multiple Indicator Models: A Simulation Study of Categorization Error. American Sociological Review, 48,
Jöreskog, K. G. (1990). New Developments in LISREL. Analysis of Ordinal Variables using Polychoric Correlations and Weighted Least Squares. Quality and Quantity, 24,
Jöreskog, K. G., & Sörbom, D. (1988). PRELIS: A Program for Multivariate Data Screening and Data Summarization. A Preprocessor for LISREL. Mooresville: Scientific Software, Inc.
Jöreskog, K. G., & Sörbom, D. (1989). LISREL 7: A Guide to the Program and Applications. SPSS Publications.
Jöreskog, K. G., & Sörbom, D. (1993a). New Features in PRELIS 2. Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (1993b). New Features in LISREL 8. Scientific Software International.
Minitab Inc. (1991). MINITAB Reference Manual. Release 8. PC Version. State College: Minitab Inc.
Muthén, B., & Kaplan, D. (1985). A Comparison of Some Methodologies for the Factor Analysis of Non-Normal Likert Variables. British Journal of Mathematical and Statistical Psychology, 38,
Muthén, B., & Kaplan, D. (1989). A Comparison of Some Methodologies for the Factor Analysis of Non-Normal Likert Variables. A Note on the Size of the Model. British Journal of Mathematical and Statistical Psychology, 45,
Neumann, L. (1982). Effects of Categorization on the Correlation Coefficient. Quality and Quantity, 16,
O'Brien, R. M. (1985). The Relationship Between Ordinal Measures and their Underlying Values: Why all the Disagreement? Quality and Quantity, 19,
O'Brien, R. M., & Homer, P. (1987). Correction for Coarsely Categorized Measures. LISREL's Polyserial and Polychoric Correlations. Quality and Quantity, 21,
Olsson, U. (1979a). Maximum Likelihood Estimation of the Polychoric Correlation Coefficient. Psychometrika, 44,
Olsson, U. (1979b). On the Robustness of Factor Analysis against Crude Categorization of Observations. Multivariate Behavioral Research, 14,
Quiroga, A. M. (1992). Studies of the Polychoric Correlation and Other Correlation Measures for Ordinal Variables. Doctoral dissertation, University of Uppsala.
Rigdon, E., & Ferguson, C. E., Jr. (1991). The Performance of the Polychoric Correlation Coefficient and Selected Fitting Functions in Confirmatory Factor Analysis with Ordinal Data. Journal of Marketing Research, 28,


More information

Imputing Missing Data using SAS

Imputing Missing Data using SAS ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

More information

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

More information


RESEARCH METHODS IN I/O PSYCHOLOGY RESEARCH METHODS IN I/O PSYCHOLOGY Objectives Understand Empirical Research Cycle Knowledge of Research Methods Conceptual Understanding of Basic Statistics PSYC 353 11A rsch methods 09/01/11 [Arthur]

More information

Chapter 9 Descriptive Statistics for Bivariate Data

Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to eplore the distribution of the values of a single variable)

More information

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds Isosceles Triangle Congruent Leg Side Expression Equation Polynomial Monomial Radical Square Root Check Times Itself Function Relation One Domain Range Area Volume Surface Space Length Width Quantitative

More information

Factor Analysis. Chapter 420. Introduction

Factor Analysis. Chapter 420. Introduction Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

More information



More information