IBM SPSS Categories 24 IBM

Transcription

1 IBM SPSS Categories 24 IBM

2 Note Before using this information and the product it supports, read the information in Notices on page 55. Product Information This edition applies to ersion 24, release 0, modification 0 of IBM SPSS Statistics and to all subsequent releases and modifications until otherwise indicated in new editions.

3 Contents Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data. 1 What Is Optimal Scaling? Why Use Optimal Scaling? Optimal Scaling Leel and Measurement Leel... 2 Selecting the Optimal Scaling Leel Transformation Plots Category Codes Which Procedure Is Best for Your Application?... 5 Categorical Regression Categorical Principal Components Analysis... 6 Nonlinear Canonical Correlation Analysis... 6 Correspondence Analysis Multiple Correspondence Analysis Multidimensional Scaling Multidimensional Unfolding Aspect Ratio in Optimal Scaling Charts Chapter 2. Categorical Regression (CATREG) Define Scale in Categorical Regression Categorical Regression Discretization Categorical Regression Missing Values Categorical Regression Options Categorical Regression Regularization Categorical Regression Output Categorical Regression Sae Categorical Regression Transformation Plots CATREG Command Additional Features Chapter 3. Categorical Principal Components Analysis (CATPCA) Define Scale and Weight in CATPCA Categorical Principal Components Analysis Discretization Categorical Principal Components Analysis Missing Values Categorical Principal Components Analysis Options 19 Categorical Principal Components Analysis Output 21 Categorical Principal Components Analysis Sae.. 21 Categorical Principal Components Analysis Object Plots Categorical Principal Components Analysis Category Plots Categorical Principal Components Analysis Loading Plots Categorical Principal Components Analysis Bootstrap CATPCA Command Additional Features Chapter 4. Nonlinear Canonical Correlation Analysis (OVERALS) Define Range and Scale Define Range Nonlinear Canonical Correlation Analysis Options 27 OVERALS Command Additional Features Chapter 5. Correspondence Analysis 29 Define Row Range in Correspondence Analysis.. 30 Define Column Range in Correspondence Analysis 30 Correspondence Analysis Model Correspondence Analysis Statistics Correspondence Analysis Plots CORRESPONDENCE Command Additional Features Chapter 6. Multiple Correspondence Analysis Define Variable Weight in Multiple Correspondence Analysis Multiple Correspondence Analysis Discretization.. 36 Multiple Correspondence Analysis Missing Values 36 Multiple Correspondence Analysis Options Multiple Correspondence Analysis Output Multiple Correspondence Analysis Sae Multiple Correspondence Analysis Object Plots.. 38 Multiple Correspondence Analysis Variable Plots.. 39 MULTIPLE CORRESPONDENCE Command Additional Features Chapter 7. Multidimensional Scaling (PROXSCAL) Proximities in Matrices across Columns Proximities in Columns Proximities in One Column Create Proximities from Data Create Measure from Data Define a Multidimensional Scaling Model Multidimensional Scaling Restrictions Multidimensional Scaling Options Multidimensional Scaling Plots, Version Multidimensional Scaling Plots, Version Multidimensional Scaling Output PROXSCAL Command Additional Features Chapter 8. Multidimensional Unfolding (PREFSCAL) Define a Multidimensional Unfolding Model Multidimensional Unfolding Restrictions Multidimensional Unfolding Options Multidimensional Unfolding Plots Multidimensional Unfolding Output PREFSCAL Command Additional Features Notices Trademarks Index iii

4 i IBM SPSS Categories 24

5 Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data Categories procedures use optimal scaling to analyze data that are difficult or impossible for standard statistical procedures to analyze. This chapter describes what each procedure does, the situations in which each procedure is most appropriate, the relationships between the procedures, and the relationships of these procedures to their standard statistical counterparts. Note: These procedures and their implementation in IBM SPSS Statistics were deeloped by the Data Theory Scaling System Group (DTSS), consisting of members of the departments of Education and Psychology, Faculty of Social and Behaioral Sciences, Leiden Uniersity. What Is Optimal Scaling? The idea behind optimal scaling is to assign numerical quantifications to the categories of each ariable, thus allowing standard procedures to be used to obtain a solution on the quantified ariables. The optimal scale alues are assigned to categories of each ariable based on the optimizing criterion of the procedure in use. Unlike the original labels of the nominal or ordinal ariables in the analysis, these scale alues hae metric properties. In most Categories procedures, the optimal quantification for each scaled ariable is obtained through an iteratie method called alternating least squares in which, after the current quantifications are used to find a solution, the quantifications are updated using that solution. The updated quantifications are then used to find a new solution, which is used to update the quantifications, and so on, until some criterion is reached that signals the process to stop. Why Use Optimal Scaling? Categorical data are often found in marketing research, surey research, and research in the social and behaioral sciences. In fact, many researchers deal almost exclusiely with categorical data. While adaptations of most standard models exist specifically to analyze categorical data, they often do not perform well for datasets that feature: Too few obserations Too many ariables Too many alues per ariable By quantifying categories, optimal scaling techniques aoid problems in these situations. Moreoer, they are useful een when specialized techniques are appropriate. Rather than interpreting parameter estimates, the interpretation of optimal scaling output is often based on graphical displays. Optimal scaling techniques offer excellent exploratory analyses, which complement other IBM SPSS Statistics models well. By narrowing the focus of your inestigation, isualizing your data through optimal scaling can form the basis of an analysis that centers on interpretation of model parameters. Copyright IBM Corporation 1989,

6 Optimal Scaling Leel and Measurement Leel This can be a ery confusing concept when you first use Categories procedures. When specifying the leel, you specify not the leel at which ariables are measured but the leel at which they are scaled. The idea is that the ariables to be quantified may hae nonlinear relations regardless of how they are measured. For Categories purposes, there are three basic leels of measurement: The nominal leel implies that a ariable's alues represent unordered categories. Examples of ariables that might be nominal are region, zip code area, religious affiliation, and multiple choice categories. The ordinal leel implies that a ariable's alues represent ordered categories. Examples include attitude scales representing degree of satisfaction or confidence and preference rating scores. The numerical leel implies that a ariable's alues represent ordered categories with a meaningful metric so that distance comparisons between categories are appropriate. Examples include age in years and income in thousands of dollars. For example, suppose that the ariables region, job, and age are coded as shown in the following table. Table 1. Coding scheme for region, job, and age Region code Region alue Job code Job alue Age 1 North 1 intern 20 2 South 2 sales rep 22 3 East 3 manager 25 4 West 27 The alues shown represent the categories of each ariable. Region would be a nominal ariable. There are four categories of region, with no intrinsic ordering. Values 1 through 4 simply represent the four categories; the coding scheme is completely arbitrary. Job, on the other hand, could be assumed to be an ordinal ariable. The original categories form a progression from intern to manager. Larger codes represent a job higher on the corporate ladder. Howeer, only the order information is known--nothing can be said about the distance between adjacent categories. In contrast, age could be assumed to be a numerical ariable. In the case of age, the distances between the alues are intrinsically meaningful. The distance between 20 and 22 is the same as the distance between 25 and 27, while the distance between 22 and 25 is greater than either of these. Selecting the Optimal Scaling Leel It is important to understand that there are no intrinsic properties of a ariable that automatically predefine what optimal scaling leel you should specify for it. You can explore your data in any way that makes sense and makes interpretation easier. By analyzing a numerical-leel ariable at the ordinal leel, for example, the use of a nonlinear transformation may allow a solution in fewer dimensions. The following two examples illustrate how the "obious" leel of measurement might not be the best optimal scaling leel. Suppose that a ariable sorts objects into age groups. Although age can be scaled as a numerical ariable, it may be true that for people younger than 25 safety has a positie relation with age, whereas for people older than 60 safety has a negatie relation with age. In this case, it might be better to treat age as a nominal ariable. As another example, a ariable that sorts persons by political preference appears to be essentially nominal. Howeer, if you order the parties from political left to political right, you might want the quantification of parties to respect this order by using an ordinal leel of analysis. 2 IBM SPSS Categories 24

7 Een though there are no predefined properties of a ariable that make it exclusiely one leel or another, there are some general guidelines to help the noice user. With single-nominal quantification, you don't usually know the order of the categories but you want the analysis to impose one. If the order of the categories is known, you should try ordinal quantification. If the categories are unorderable, you might try multiple-nominal quantification. Transformation Plots The different leels at which each ariable can be scaled impose different restrictions on the quantifications. Transformation plots illustrate the relationship between the quantifications and the original categories resulting from the selected optimal scaling leel. For example, a linear transformation plot results when a ariable is treated as numerical. Variables treated as ordinal result in a nondecreasing transformation plot. Transformation plots for ariables treated nominally that are U-shaped (or the inerse) display a quadratic relationship. Nominal ariables could also yield transformation plots without apparent trends by changing the order of the categories completely. The following figure displays a sample transformation plot. Transformation plots are particularly suited to determining how well the selected optimal scaling leel performs. If seeral categories receie similar quantifications, collapsing these categories into one category may be warranted. Alternatiely, if a ariable treated as nominal receies quantifications that display an increasing trend, an ordinal transformation may result in a similar fit. If that trend is linear, numerical treatment may be appropriate. Howeer, if collapsing categories or changing scaling leels is warranted, the analysis will not change significantly. Category Codes Some care should be taken when coding categorical ariables because some coding schemes may yield unwanted output or incomplete analyses. Possible coding schemes for job are displayed in the following table. Table 2. Alternatie coding schemes for job Category A B C D intern sales rep manager Some Categories procedures require that the range of eery ariable used be defined. Any alue outside this range is treated as a missing alue. The minimum category alue is always 1. The maximum category alue is supplied by the user. This alue is not the number of categories for a ariable it is the largest category alue. For example, in the table, scheme A has a maximum category alue of 3 and scheme B has a maximum category alue of 7, yet both schemes code the same three categories. The ariable range determines which categories will be omitted from the analysis. Any categories with codes outside the defined range are omitted from the analysis. This is a simple method for omitting categories but can result in unwanted analyses. An incorrectly defined maximum category can omit alid categories from the analysis. For example, for scheme B, defining the maximum category alue to be 3 indicates that job has categories coded from 1 to 3; the manager category is treated as missing. Because no category has actually been coded 3, the third category in the analysis contains no cases. If you wanted to omit all manager categories, this analysis would be appropriate. Howeer, if managers are to be included, the maximum category must be defined as 7, and missing alues must be coded with alues aboe 7 or below 1. For ariables treated as nominal or ordinal, the range of the categories does not affect the results. For nominal ariables, only the label and not the alue associated with that label is important. For ordinal ariables, the order of the categories is presered in the quantifications; the category alues themseles Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data 3

8 are not important. All coding schemes resulting in the same category ordering will hae identical results. For example, the first three schemes in the table are functionally equialent if job is analyzed at an ordinal leel. The order of the categories is identical in these schemes. Scheme D, on the other hand, inerts the second and third categories and will yield different results than the other schemes. Although many coding schemes for a ariable are functionally equialent, schemes with small differences between codes are preferred because the codes hae an impact on the amount of output produced by a procedure. All categories coded with alues between 1 and the user-defined maximum are alid. If any of these categories are empty, the corresponding quantifications will be either system-missing or 0, depending on the procedure. Although neither of these assignments affect the analyses, output is produced for these categories. Thus, for scheme B, job has four categories that receie system-missing alues. For scheme C, there are also four categories receiing system-missing indicators. In contrast, for scheme A there are no system-missing quantifications. Using consecutie integers as codes for ariables treated as nominal or ordinal results in much less output without affecting the results. Coding schemes for ariables treated as numerical are more restricted than the ordinal case. For these ariables, the differences between consecutie categories are important. The following table displays three coding schemes for age. Table 3. Alternatie coding schemes for age Category A B C Any recoding of numerical ariables must presere the differences between the categories. Using the original alues is one method for ensuring preseration of differences. Howeer, this can result in many categories haing system-missing indicators. For example, scheme A employs the original obsered alues. For all Categories procedures except for Correspondence Analysis, the maximum category alue is 27 and the minimum category alue is set to 1. The first 19 categories are empty and receie system-missing indicators. The output can quickly become rather cumbersome if the maximum category is much greater than 1 and there are many empty categories between 1 and the maximum. To reduce the amount of output, recoding can be done. Howeer, in the numerical case, the Automatic Recode facility should not be used. Coding to consecutie integers results in differences of 1 between all consecutie categories, and, as a result, all quantifications will be equally spaced. The metric characteristics deemed important when treating a ariable as numerical are destroyed by recoding to consecutie integers. For example, scheme C in the table corresponds to automatically recoding age. The difference between categories 22 and 25 has changed from three to one, and the quantifications will reflect the latter difference. An alternatie recoding scheme that preseres the differences between categories is to subtract the smallest category alue from eery category and add 1 to each difference. Scheme B results from this transformation. The smallest category alue, 20, has been subtracted from each category, and 1 was added to each result. The transformed codes hae a minimum of 1, and all differences are identical to the original data. The maximum category alue is now 8, and the zero quantifications before the first nonzero quantification are all eliminated. Yet, the nonzero quantifications corresponding to each category resulting from scheme B are identical to the quantifications from scheme A. 4 IBM SPSS Categories 24

9 Which Procedure Is Best for Your Application? The techniques embodied in four of these procedures (Correspondence Analysis, Multiple Correspondence Analysis, Categorical Principal Components Analysis, and Nonlinear Canonical Correlation Analysis) fall into the general area of multiariate data analysis known as dimension reduction. That is, relationships between ariables are represented in a few dimensions say two or three as often as possible. This enables you to describe structures or patterns in the relationships that would be too difficult to fathom in their original richness and complexity. In market research applications, these techniques can be a form of perceptual mapping. A major adantage of these procedures is that they accommodate data with different leels of optimal scaling. Categorical Regression describes the relationship between a categorical response ariable and a combination of categorical predictor ariables. The influence of each predictor ariable on the response ariable is described by the corresponding regression weight. As in the other procedures, data can be analyzed with different leels of optimal scaling. Multidimensional Scaling and Multidimensional Unfolding describe relationships between objects in a low-dimensional space, using the proximities between the objects. Following are brief guidelines for each of the procedures: Use Categorical Regression to predict the alues of a categorical dependent ariable from a combination of categorical independent ariables. Use Categorical Principal Components Analysis to account for patterns of ariation in a single set of ariables of mixed optimal scaling leels. Use Nonlinear Canonical Correlation Analysis to assess the extent to which two or more sets of ariables of mixed optimal scaling leels are correlated. Use Correspondence Analysis to analyze two-way contingency tables or data that can be expressed as a two-way table, such as brand preference or sociometric choice data. Use Multiple Correspondence Analysis to analyze a categorical multiariate data matrix when you are willing to make no stronger assumption that all ariables are analyzed at the nominal leel. Use Multidimensional Scaling to analyze proximity data to find a least-squares representation of a single set of objects in a low-dimensional space. Use Multidimensional Unfolding to analyze proximity data to find a least-squares representation of two sets of objects in a low-dimensional space. Categorical Regression The use of Categorical Regression is most appropriate when the goal of your analysis is to predict a dependent (response) ariable from a set of independent (predictor) ariables. As with all optimal scaling procedures, scale alues are assigned to each category of eery ariable such that these alues are optimal with respect to the regression. The solution of a categorical regression maximizes the squared correlation between the transformed response and the weighted combination of transformed predictors. Relation to other Categories procedures. Categorical regression with optimal scaling is comparable to optimal scaling canonical correlation analysis with two sets, one of which contains only the dependent ariable. In the latter technique, similarity of sets is deried by comparing each set to an unknown ariable that lies somewhere between all of the sets. In categorical regression, similarity of the transformed response and the linear combination of transformed predictors is assessed directly. Relation to standard techniques. In standard linear regression, categorical ariables can either be recoded as indicator ariables or be treated in the same fashion as interal leel ariables. In the first approach, the model contains a separate intercept and slope for each combination of the leels of the categorical ariables. This results in a large number of parameters to interpret. In the second approach, only one parameter is estimated for each ariable. Howeer, the arbitrary nature of the category codings makes generalizations impossible. Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data 5

10 If some of the ariables are not continuous, alternatie analyses are aailable. If the response is continuous and the predictors are categorical, analysis of ariance is often employed. If the response is categorical and the predictors are continuous, logistic regression or discriminant analysis may be appropriate. If the response and the predictors are both categorical, loglinear models are often used. Regression with optimal scaling offers three scaling leels for each ariable. Combinations of these leels can account for a wide range of nonlinear relationships for which any single "standard" method is ill-suited. Consequently, optimal scaling offers greater flexibility than the standard approaches with minimal added complexity. In addition, nonlinear transformations of the predictors usually reduce the dependencies among the predictors. If you compare the eigenalues of the correlation matrix for the predictors with the eigenalues of the correlation matrix for the optimally scaled predictors, the latter set will usually be less ariable than the former. In other words, in categorical regression, optimal scaling makes the larger eigenalues of the predictor correlation matrix smaller and the smaller eigenalues larger. Categorical Principal Components Analysis The use of Categorical Principal Components Analysis is most appropriate when you want to account for patterns of ariation in a single set of ariables of mixed optimal scaling leels. This technique attempts to reduce the dimensionality of a set of ariables while accounting for as much of the ariation as possible. Scale alues are assigned to each category of eery ariable so that these alues are optimal with respect to the principal components solution. Objects in the analysis receie component scores based on the quantified data. Plots of the component scores reeal patterns among the objects in the analysis and can reeal unusual objects in the data. The solution of a categorical principal components analysis maximizes the correlations of the object scores with each of the quantified ariables for the number of components (dimensions) specified. An important application of categorical principal components is to examine preference data, in which respondents rank or rate a number of items with respect to preference. In the usual IBM SPSS Statistics data configuration, rows are indiiduals, columns are measurements for the items, and the scores across rows are preference scores (on a 0 to 10 scale, for example), making the data row-conditional. For preference data, you may want to treat the indiiduals as ariables. Using the Transpose procedure, you can transpose the data. The raters become the ariables, and all ariables are declared ordinal. There is no objection to using more ariables than objects in CATPCA. Relation to other Categories procedures. If all ariables are declared multiple nominal, categorical principal components analysis produces an analysis equialent to a multiple correspondence analysis run on the same ariables. Thus, categorical principal components analysis can be seen as a type of multiple correspondence analysis in which some of the ariables are declared ordinal or numerical. Relation to standard techniques. If all ariables are scaled on the numerical leel, categorical principal components analysis is equialent to standard principal components analysis. More generally, categorical principal components analysis is an alternatie to computing the correlations between non-numerical scales and analyzing them using a standard principal components or factor-analysis approach. Naie use of the usual Pearson correlation coefficient as a measure of association for ordinal data can lead to nontriial bias in estimation of the correlations. Nonlinear Canonical Correlation Analysis Nonlinear Canonical Correlation Analysis is a ery general procedure with many different applications. The goal of nonlinear canonical correlation analysis is to analyze the relationships between two or more sets of ariables instead of between the ariables themseles, as in principal components analysis. For example, you may hae two sets of ariables, where one set of ariables might be demographic background items on a set of respondents and a second set might be responses to a set of attitude items. The scaling leels in the analysis can be any mix of nominal, ordinal, and numerical. Optimal scaling 6 IBM SPSS Categories 24

11 canonical correlation analysis determines the similarity among the sets by simultaneously comparing the canonical ariables from each set to a compromise set of scores assigned to the objects. Relation to other Categories procedures. If there are two or more sets of ariables with only one ariable per set, optimal scaling canonical correlation analysis is equialent to optimal scaling principal components analysis. If all ariables in a one-ariable-per-set analysis are multiple nominal, optimal scaling canonical correlation analysis is equialent to multiple correspondence analysis. If there are two sets of ariables, one of which contains only one ariable, optimal scaling canonical correlation analysis is equialent to categorical regression with optimal scaling. Relation to standard techniques. Standard canonical correlation analysis is a statistical technique that finds a linear combination of one set of ariables and a linear combination of a second set of ariables that are maximally correlated. Gien this set of linear combinations, canonical correlation analysis can find subsequent independent sets of linear combinations, referred to as canonical ariables, up to a maximum number equal to the number of ariables in the smaller set. If there are two sets of ariables in the analysis and all ariables are defined to be numerical, optimal scaling canonical correlation analysis is equialent to a standard canonical correlation analysis. Although IBM SPSS Statistics does not hae a canonical correlation analysis procedure, many of the releant statistics can be obtained from multiariate analysis of ariance. Optimal scaling canonical correlation analysis has arious other applications. If you hae two sets of ariables and one of the sets contains a nominal ariable declared as single nominal, optimal scaling canonical correlation analysis results can be interpreted in a similar fashion to regression analysis. If you consider the ariable to be multiple nominal, the optimal scaling analysis is an alternatie to discriminant analysis. Grouping the ariables in more than two sets proides a ariety of ways to analyze your data. Correspondence Analysis The goal of correspondence analysis is to make biplots for correspondence tables. In a correspondence table, the row and column ariables are assumed to represent unordered categories; therefore, the nominal optimal scaling leel is always used. Both ariables are inspected for their nominal information only. That is, the only consideration is the fact that some objects are in the same category while others are not. Nothing is assumed about the distance or order between categories of the same ariable. One specific use of correspondence analysis is the analysis of two-way contingency tables. If a table has r actie rows and c actie columns, the number of dimensions in the correspondence analysis solution is the minimum of r minus 1 or c minus 1, whicheer is less. In other words, you could perfectly represent the row categories or the column categories of a contingency table in a space of dimensions. Practically speaking, howeer, you would like to represent the row and column categories of a two-way table in a low-dimensional space, say two dimensions, for the reason that two-dimensional plots are more easily comprehensible than multidimensional spatial representations. When fewer than the maximum number of possible dimensions is used, the statistics produced in the analysis describe how well the row and column categories are represented in the low-dimensional representation. Proided that the quality of representation of the two-dimensional solution is good, you can examine plots of the row points and the column points to learn which categories of the row ariable are similar, which categories of the column ariable are similar, and which row and column categories are similar to each other. Relation to other Categories procedures. Simple correspondence analysis is limited to two-way tables. If there are more than two ariables of interest, you can combine ariables to create interaction ariables. For example, for the ariables region, job, and age, you can combine region and job to create a new ariable rejob with the 12 categories shown in the following table. This new ariable forms a two-way table with age (12 rows, 4 columns), which can be analyzed in correspondence analysis. Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data 7

12 Table 4. Combinations of region and job Category code Category definition Category code Category definition 1 North, intern 7 East, intern 2 North, sales rep 8 East, sales rep 3 North, manager 9 East, manager 4 South, intern 10 West, intern 5 South, sales rep 11 West, sales rep 6 South, manager 12 West, manager One shortcoming of this approach is that any pair of ariables can be combined. We can combine job and age, yielding another 12-category ariable. Or we can combine region and age, which results in a new 16-category ariable. Each of these interaction ariables forms a two-way table with the remaining ariable. Correspondence analyses of these three tables will not yield identical results, yet each is a alid approach. Furthermore, if there are four or more ariables, two-way tables comparing an interaction ariable with another interaction ariable can be constructed. The number of possible tables to analyze can get quite large, een for a few ariables. You can select one of these tables to analyze, or you can analyze all of them. Alternatiely, the Multiple Correspondence Analysis procedure can be used to examine all of the ariables simultaneously without the need to construct interaction ariables. Relation to standard techniques. The Crosstabs procedure can also be used to analyze contingency tables, with independence as a common focus in the analyses. Howeer, een in small tables, detecting the cause of departures from independence may be difficult. The utility of correspondence analysis lies in displaying such patterns for two-way tables of any size. If there is an association between the row and column ariables--that is, if the chi-square alue is significant--correspondence analysis may help reeal the nature of the relationship. Multiple Correspondence Analysis Multiple Correspondence Analysis tries to produce a solution in which objects within the same category are plotted close together and objects in different categories are plotted far apart. Each object is as close as possible to the category points of categories that apply to the object. In this way, the categories diide the objects into homogeneous subgroups. Variables are considered homogeneous when they classify objects in the same categories into the same subgroups. For a one-dimensional solution, multiple correspondence analysis assigns optimal scale alues (category quantifications) to each category of each ariable in such a way that oerall, on aerage, the categories hae maximum spread. For a two-dimensional solution, multiple correspondence analysis finds a second set of quantifications of the categories of each ariable unrelated to the first set, attempting again to maximize spread, and so on. Because categories of a ariable receie as many scorings as there are dimensions, the ariables in the analysis are assumed to be multiple nominal in optimal scaling leel. Multiple correspondence analysis also assigns scores to the objects in the analysis in such a way that the category quantifications are the aerages, or centroids, of the object scores of objects in that category. Relation to other Categories procedures. Multiple correspondence analysis is also known as homogeneity analysis or dual scaling. It gies comparable, but not identical, results to correspondence analysis when there are only two ariables. Correspondence analysis produces unique output summarizing the fit and quality of representation of the solution, including stability information. Thus, correspondence analysis is usually preferable to multiple correspondence analysis in the two-ariable case. Another difference between the two procedures is that the input to multiple correspondence analysis is a data matrix, where the rows are objects and the columns are ariables, while the input to correspondence analysis can be the same data matrix, a general proximity matrix, or a joint contingency 8 IBM SPSS Categories 24

13 table, which is an aggregated matrix in which both the rows and columns represent categories of ariables. Multiple correspondence analysis can also be thought of as principal components analysis of data scaled at the multiple nominal leel. Relation to standard techniques. Multiple correspondence analysis can be thought of as the analysis of a multiway contingency table. Multiway contingency tables can also be analyzed with the Crosstabs procedure, but Crosstabs gies separate summary statistics for each category of each control ariable. With multiple correspondence analysis, it is often possible to summarize the relationship between all of the ariables with a single two-dimensional plot. An adanced use of multiple correspondence analysis is to replace the original category alues with the optimal scale alues from the first dimension and perform a secondary multiariate analysis. Since multiple correspondence analysis replaces category labels with numerical scale alues, many different procedures that require numerical data can be applied after the multiple correspondence analysis. For example, the Factor Analysis procedure produces a first principal component that is equialent to the first dimension of multiple correspondence analysis. The component scores in the first dimension are equal to the object scores, and the squared component loadings are equal to the discrimination measures. The second multiple correspondence analysis dimension, howeer, is not equal to the second dimension of factor analysis. Multidimensional Scaling The use of Multidimensional Scaling is most appropriate when the goal of your analysis is to find the structure in a set of distance measures between a single set of objects or cases. This is accomplished by assigning obserations to specific locations in a conceptual low-dimensional space so that the distances between points in the space match the gien (dis)similarities as closely as possible. The result is a least-squares representation of the objects in that low-dimensional space, which, in many cases, will help you further understand your data. Relation to other Categories procedures. When you hae multiariate data from which you create distances and which you then analyze with multidimensional scaling, the results are similar to analyzing the data using categorical principal components analysis with object principal normalization. This kind of PCA is also known as principal coordinates analysis. Relation to standard techniques. The Categories Multidimensional Scaling procedure (PROXSCAL) offers seeral improements upon the scaling procedure aailable in the Statistics Base option (ALSCAL). PROXSCAL offers an accelerated algorithm for certain models and allows you to put restrictions on the common space. Moreoer, PROXSCAL attempts to minimize normalized raw stress rather than S-stress (also referred to as strain). The normalized raw stress is generally preferred because it is a measure based on the distances, while the S-stress is based on the squared distances. Multidimensional Unfolding The use of Multidimensional Unfolding is most appropriate when the goal of your analysis is to find the structure in a set of distance measures between two sets of objects (referred to as the row and column objects). This is accomplished by assigning obserations to specific locations in a conceptual low-dimensional space so that the distances between points in the space match the gien (dis)similarities as closely as possible. The result is a least-squares representation of the row and column objects in that low-dimensional space, which, in many cases, will help you further understand your data. Relation to other Categories procedures. If your data consist of distances between a single set of objects (a square, symmetrical matrix), use Multidimensional Scaling. Relation to standard techniques. The Categories Multidimensional Unfolding procedure (PREFSCAL) offers seeral improements upon the unfolding functionality aailable in the Statistics Base option (through ALSCAL). PREFSCAL allows you to put restrictions on the common space; moreoer, PREFSCAL attempts to minimize a penalized stress measure that helps it to aoid degenerate solutions (to which older algorithms are prone). Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data 9

14 Aspect Ratio in Optimal Scaling Charts Aspect ratio in optimal scaling plots is isotropic. In a two-dimensional plot, the distance representing one unit in dimension 1 is equal to the distance representing one unit in dimension 2. If you change the range of a dimension in a two-dimensional plot, the system changes the size of the other dimension to keep the physical distances equal. Isotropic aspect ratio cannot be oerridden for the optimal scaling procedures. 10 IBM SPSS Categories 24

15 Chapter 2. Categorical Regression (CATREG) Categorical regression quantifies categorical data by assigning numerical alues to the categories, resulting in an optimal linear regression equation for the transformed ariables. Categorical regression is also known by the acronym CATREG, for categorical regression. Standard linear regression analysis inoles minimizing the sum of squared differences between a response (dependent) ariable and a weighted combination of predictor (independent) ariables. Variables are typically quantitatie, with (nominal) categorical data recoded to binary or contrast ariables. As a result, categorical ariables sere to separate groups of cases, and the technique estimates separate sets of parameters for each group. The estimated coefficients reflect how changes in the predictors affect the response. Prediction of the response is possible for any combination of predictor alues. An alternatie approach inoles regressing the response on the categorical predictor alues themseles. Consequently, one coefficient is estimated for each ariable. Howeer, for categorical ariables, the category alues are arbitrary. Coding the categories in different ways yield different coefficients, making comparisons across analyses of the same ariables difficult. CATREG extends the standard approach by simultaneously scaling nominal, ordinal, and numerical ariables. The procedure quantifies categorical ariables so that the quantifications reflect characteristics of the original categories. The procedure treats quantified categorical ariables in the same way as numerical ariables. Using nonlinear transformations allow ariables to be analyzed at a ariety of leels to find the best-fitting model. Example. Categorical regression could be used to describe how job satisfaction depends on job category, geographic region, and amount of trael. You might find that high leels of satisfaction correspond to managers and low trael. The resulting regression equation could be used to predict job satisfaction for any combination of the three independent ariables. Statistics and plots. Frequencies, regression coefficients, ANOVA table, iteration history, category quantifications, correlations between untransformed predictors, correlations between transformed predictors, residual plots, and transformation plots. Categorical Regression Data Considerations Data. CATREG operates on category indicator ariables. The category indicators should be positie integers. You can use the Discretization dialog box to conert fractional-alue ariables and string ariables into positie integers. Assumptions. Only one response ariable is allowed, but the maximum number of predictor ariables is 200. The data must contain at least three alid cases, and the number of alid cases must exceed the number of predictor ariables plus one. Related procedures. CATREG is equialent to categorical canonical correlation analysis with optimal scaling (OVERALS) with two sets, one of which contains only one ariable. Scaling all ariables at the numerical leel corresponds to standard multiple regression analysis. To Obtain a Categorical Regression 1. From the menus choose: Analyze > Regression > Optimal Scaling (CATREG) Select the dependent ariable and independent ariable(s). Copyright IBM Corporation 1989,

16 3. Click OK. Optionally, change the scaling leel for each ariable. Define Scale in Categorical Regression You can set the optimal scaling leel for the dependent and independent ariables. By default, they are scaled as second-degree monotonic splines (ordinal) with two interior knots. Additionally, you can set the weight for analysis ariables. Optimal Scaling Leel. You can also select the scaling leel for quantifying each ariable. Spline Ordinal. The order of the categories of the obsered ariable is presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. The resulting transformation is a smooth monotonic piecewise polynomial of the chosen degree. The pieces are specified by the user-specified number and procedure-determined placement of the interior knots. Spline Nominal. The only information in the obsered ariable that is presered in the optimally scaled ariable is the grouping of objects in categories. The order of the categories of the obsered ariable is not presered. Category points will be on a straight line (ector) through the origin. The resulting transformation is a smooth, possibly nonmonotonic, piecewise polynomial of the chosen degree. The pieces are specified by the user-specified number and procedure-determined placement of the interior knots. Ordinal. The order of the categories of the obsered ariable is presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. The resulting transformation fits better than the spline ordinal transformation but is less smooth. Nominal. The only information in the obsered ariable that is presered in the optimally scaled ariable is the grouping of objects in categories. The order of the categories of the obsered ariable is not presered. Category points will be on a straight line (ector) through the origin. The resulting transformation fits better than the spline nominal transformation but is less smooth. Numeric. Categories are treated as ordered and equally spaced (interal leel). The order of the categories and the equal distances between category numbers of the obsered ariable are presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. When all ariables are at the numeric leel, the analysis is analogous to standard principal components analysis. Categorical Regression Discretization The Discretization dialog box allows you to select a method of recoding your ariables. Fractional-alue ariables are grouped into seen categories (or into the number of distinct alues of the ariable if this number is less than seen) with an approximately normal distribution unless otherwise specified. String ariables are always conerted into positie integers by assigning category indicators according to ascending alphanumeric order. Discretization for string ariables applies to these integers. Other ariables are left alone by default. The discretized ariables are then used in the analysis. Method. Choose between grouping, ranking, and multiplying. Grouping. Recode into a specified number of categories or recode by interal. Ranking. The ariable is discretized by ranking the cases. Multiplying. The current alues of the ariable are standardized, multiplied by 10, rounded, and hae a constant added so that the lowest discretized alue is 1. Grouping. The following options are aailable when discretizing ariables by grouping: Number of categories. Specify a number of categories and whether the alues of the ariable should follow an approximately normal or uniform distribution across those categories. 12 IBM SPSS Categories 24

17 Equal interals. Variables are recoded into categories defined by these equally sized interals. You must specify the length of the interals. Categorical Regression Missing Values The Missing Values dialog box allows you to choose the strategy for handling missing alues in analysis ariables and supplementary ariables. Strategy. Choose to exclude objects with missing alues (listwise deletion) or impute missing alues (actie treatment). Exclude objects with missing alues on this ariable. Objects with missing alues on the selected ariable are excluded from the analysis. This strategy is not aailable for supplementary ariables. Impute missing alues. Objects with missing alues on the selected ariable hae those alues imputed. You can choose the method of imputation. Select Mode to replace missing alues with the most frequent category. When there are multiple modes, the one with the smallest category indicator is used. Select Extra category to replace missing alues with the same quantification of an extra category. This implies that objects with a missing alue on this ariable are considered to belong to the same (extra) category. Categorical Regression Options The Options dialog box allows you to select the initial configuration style, specify iteration and conergence criteria, select supplementary objects, and set the labeling of plots. Supplementary Objects. This allows you to specify the objects that you want to treat as supplementary. Simply type the number of a supplementary object (or specify a range of cases) and click Add. You cannot weight supplementary objects (specified weights are ignored). Initial Configuration. If no ariables are treated as nominal, select the Numerical configuration. If at least one ariable is treated as nominal, select the Random configuration. Alternatiely, if at least one ariable has an ordinal or spline ordinal scaling leel, the usual model-fitting algorithm can result in a suboptimal solution. Choosing Multiple systematic starts with all possible sign patterns to test will always find the optimal solution, but the necessary processing time rapidly increases as the number of ordinal and spline ordinal ariables in the dataset increase. You can reduce the number of test patterns by specifying a percentage of loss of ariance threshold, where the higher the threshold, the more sign patterns will be excluded. With this option, obtaining the optimal solution is not garantueed, but the chance of obtaining a suboptimal solution is diminished. Also, if the optimal solution is not found, the chance that the suboptimal solution is ery different from the optimal solution is diminished. When multiple systematic starts are requested, the signs of the regression coefficients for each start are written to an external IBM SPSS Statistics data file or dataset in the current session. See the topic Categorical Regression Sae on page 15 for more information. The results of a preious run with multiple systematic starts allows you to Use fixed signs for the regression coefficients. The signs (indicated by 1 and 1) need to be in a row of the specified dataset or file. The integer-alued starting number is the case number of the row in this file that contains the signs to be used. Criteria. You can specify the maximum number of iterations that the regression may go through in its computations. You can also select a conergence criterion alue. The regression stops iterating if the difference in total fit between the last two iterations is less than the conergence alue or if the maximum number of iterations is reached. Label Plots By. Allows you to specify whether ariables and alue labels or ariable names and alues will be used in the plots. You can also specify a maximum length for labels. Chapter 2. Categorical Regression (CATREG) 13

18 Categorical Regression Regularization Method. Regularization methods can improe the predictie error of the model by reducing the ariability in the estimates of regression coefficient by shrinking the estimates toward 0. The Lasso and Elastic Net will shrink some coefficient estimates to exactly 0, thus proiding a form of ariable selection. When a regularization method is requested, the regularized model and coefficients for each penalty coefficient alue are written to an external IBM SPSS Statistics data file or dataset in the current session. See the topic Categorical Regression Sae on page 15 for more information. Ridge regression. Ridge regression shrinks coefficients by introducing a penalty term equal to the sum of squared coefficients times a penalty coefficient. This coefficient can range from 0 (no penalty) to 1; the procedure will search for the "best" alue of the penalty if you specify a range and increment. Lasso. The Lasso's penalty term is based on the sum of absolute coefficients, and the specification of a penalty coefficient is similar to that of Ridge regression; howeer, the Lasso is more computationally intensie. Elastic net. The Elastic Net simply combines the Lasso and Ridge regression penalties, and will search oer the grid of alues specified to find the "best" Lasso and Ridge regression penalty coefficients. For a gien pair of Lasso and Ridge regression penalties, the Elastic Net is not much more computationally expensie than the Lasso. Display regularization plots. These are plots of the regression coefficients ersus the regularization penalty. When searching a range of alues for the "best" penalty coefficient, it proides a iew of how the regression coefficients change oer that range. Elastic Net Plots. For the Elastic Net method, separate regularization plots are produced by alues of the Ridge regression penalty. All possible plots uses eery alue in the range determined by the minimum and maximum Ridge regression penalty alues specified. For some Ridge penalties allows you to specify a subset of the alues in the range determined by the minimum and maximum. Simply type the number of a penalty alue (or specify a range of alues) and click Add. Categorical Regression Output The Output dialog box allows you to select the statistics to display in the output. Tables. Produces tables for: Multiple R. Includes R 2, adjusted R 2, and adjusted R 2 taking the optimal scaling into account. ANOVA. This option includes regression and residual sums of squares, mean squares, and F. Two ANOVA tables are displayed: one with degrees of freedom for the regression equal to the number of predictor ariables and one with degrees of freedom for the regression taking the optimal scaling into account. Coefficients. This option gies three tables: a Coefficients table that includes betas, standard error of the betas, t alues, and significance; a Coefficients-Optimal Scaling table with the standard error of the betas taking the optimal scaling degrees of freedom into account; and a table with the zero-order, part, and partial correlation, Pratt's relatie importance measure for the transformed predictors, and the tolerance before and after transformation. Iteration history. For each iteration, including the starting alues for the algorithm, the multiple R and regression error are shown. The increase in multiple R is listed starting from the first iteration. Correlations of original ariables. A matrix showing the correlations between the untransformed ariables is displayed. Correlations of transformed ariables. A matrix showing the correlations between the transformed ariables is displayed. Regularized models and coefficients. Displays penalty alues, R-square, and the regression coefficients for each regularized model. If a resampling method is specified or if supplementary objects (test cases) are specified, it also displays the prediction error or test MSE. 14 IBM SPSS Categories 24

19 Resampling. Resampling methods gie you an estimate of the prediction error of the model. Crossalidation. Crossalidation diides the sample into a number of subsamples, or folds. Categorical regression models are then generated, excluding the data from each subsample in turn. The first model is based on all of the cases except those in the first sample fold, the second model is based on all of the cases except those in the second sample fold, and so on. For each model, the prediction error is estimated by applying the model to the subsample excluded in generating it..632 Bootstrap. With the bootstrap, obserations are drawn randomly from the data with replacement, repeating this process a number of times to obtain a number bootstrap samples. A model is fit for each bootstrap sample. The prediction error for each model is estimated by applying the fitted model to the cases not in the bootstrap sample. Category Quantifications. Tables showing the transformed alues of the selected ariables are displayed. Descriptie Statistics. Tables showing the frequencies, missing alues, and modes of the selected ariables are displayed. Categorical Regression Sae The Sae dialog box allows you to sae predicted alues, residuals, and transformed alues to the actie dataset and/or sae discretized data, transformed alues, regularized models and coefficients, and signs of regression coefficients to an external IBM SPSS Statistics data file or dataset in the current session. Datasets are aailable during the current session but are not aailable in subsequent sessions unless you explicitly sae them as data files. Dataset names must adhere to ariable naming rules. Filenames or dataset names must be different for each type of data saed. Regularized models and coefficients are saed wheneer a regularization method is selected on the Regularization dialog. By default, the procedure creates a new dataset with a unique name, but you can of course specify a name of your own choosing or write to an external file. Signs of regression coefficients are saed wheneer multiple systematic starts are used as the initial configuration on the Options dialog. By default, the procedure creates a new dataset with a unique name, but you can of course specify a name of your own choosing or write to an external file. Categorical Regression Transformation Plots The Plots dialog box allows you to specify the ariables that will produce transformation and residual plots. Transformation Plots. For each of these ariables, the category quantifications are plotted against the original category alues. Empty categories appear on the horizontal axis but do not affect the computations. These categories are identified by breaks in the line connecting the quantifications. Residual Plots. For each of these ariables, residuals (computed for the dependent ariable predicted from all predictor ariables except the predictor ariable in question) are plotted against category indicators and the optimal category quantifications multiplied with beta against category indicators. CATREG Command Additional Features You can customize your categorical regression if you paste your selections into a syntax window and edit the resulting CATREG command syntax. The command syntax language also allows you to: Specify rootnames for the transformed ariables when saing them to the actie dataset (with the SAVE subcommand). See the Command Syntax Reference for complete syntax information. Chapter 2. Categorical Regression (CATREG) 15

20 16 IBM SPSS Categories 24

21 Chapter 3. Categorical Principal Components Analysis (CATPCA) This procedure simultaneously quantifies categorical ariables while reducing the dimensionality of the data. Categorical principal components analysis is also known by the acronym CATPCA, for categorical principal components analysis. The goal of principal components analysis is to reduce an original set of ariables into a smaller set of uncorrelated components that represent most of the information found in the original ariables. The technique is most useful when a large number of ariables prohibits effectie interpretation of the relationships between objects (subjects and units). By reducing the dimensionality, you interpret a few components rather than a large number of ariables. Standard principal components analysis assumes linear relationships between numeric ariables. On the other hand, the optimal-scaling approach allows ariables to be scaled at different leels. Categorical ariables are optimally quantified in the specified dimensionality. As a result, nonlinear relationships between ariables can be modeled. Example. Categorical principal components analysis could be used to graphically display the relationship between job category, job diision, region, amount of trael (high, medium, and low), and job satisfaction. You might find that two dimensions account for a large amount of ariance. The first dimension might separate job category from region, whereas the second dimension might separate job diision from amount of trael. You also might find that high job satisfaction is related to a medium amount of trael. Statistics and plots. Frequencies, missing alues, optimal scaling leel, mode, ariance accounted for by centroid coordinates, ector coordinates, total per ariable and per dimension, component loadings for ector-quantified ariables, category quantifications and coordinates, iteration history, correlations of the transformed ariables and eigenalues of the correlation matrix, correlations of the original ariables and eigenalues of the correlation matrix, object scores, category plots, joint category plots, transformation plots, residual plots, projected centroid plots, object plots, biplots, triplots, and component loadings plots. Categorical Principal Components Analysis Data Considerations Data. String ariable alues are always conerted into positie integers by ascending alphanumeric order. User-defined missing alues, system-missing alues, and alues less than 1 are considered missing; you can recode or add a constant to ariables with alues less than 1 to make them nonmissing. Assumptions. The data must contain at least three alid cases. The analysis is based on positie integer data. The discretization option will automatically categorize a fractional-alued ariable by grouping its alues into categories with a close to "normal" distribution and will automatically conert alues of string ariables into positie integers. You can specify other discretization schemes. Related procedures. Scaling all ariables at the numeric leel corresponds to standard principal components analysis. Alternate plotting features are aailable by using the transformed ariables in a standard linear principal components analysis. If all ariables hae multiple nominal scaling leels, categorical principal components analysis is identical to multiple correspondence analysis. If sets of ariables are of interest, categorical (nonlinear) canonical correlation analysis should be used. To Obtain a Categorical Principal Components Analysis 1. From the menus choose: Analyze > Dimension Reduction > Optimal Scaling Select Some ariable(s) not multiple nominal. Copyright IBM Corporation 1989,

22 3. Select One set. 4. Click Define. 5. Select at least two analysis ariables and specify the number of dimensions in the solution. 6. Click OK. You may optionally specify supplementary ariables, which are fitted into the solution found, or labeling ariables for the plots. Define Scale and Weight in CATPCA You can set the optimal scaling leel for analysis ariables and supplementary ariables. By default, they are scaled as second-degree monotonic splines (ordinal) with two interior knots. Additionally, you can set the weight for analysis ariables. Variable weight. You can choose to define a weight for each ariable. The alue specified must be a positie integer. The default alue is 1. Optimal Scaling Leel. You can also select the scaling leel to be used to quantify each ariable. Spline ordinal. The order of the categories of the obsered ariable is presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. The resulting transformation is a smooth monotonic piecewise polynomial of the chosen degree. The pieces are specified by the user-specified number and procedure-determined placement of the interior knots. Spline nominal. The only information in the obsered ariable that is presered in the optimally scaled ariable is the grouping of objects in categories. The order of the categories of the obsered ariable is not presered. Category points will be on a straight line (ector) through the origin. The resulting transformation is a smooth, possibly nonmonotonic, piecewise polynomial of the chosen degree. The pieces are specified by the user-specified number and procedure-determined placement of the interior knots. Multiple nominal. The only information in the obsered ariable that is presered in the optimally scaled ariable is the grouping of objects in categories. The order of the categories of the obsered ariable is not presered. Category points will be in the centroid of the objects in the particular categories. Multiple indicates that different sets of quantifications are obtained for each dimension. Ordinal. The order of the categories of the obsered ariable is presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. The resulting transformation fits better than the spline ordinal transformation but is less smooth. Nominal. The only information in the obsered ariable that is presered in the optimally scaled ariable is the grouping of objects in categories. The order of the categories of the obsered ariable is not presered. Category points will be on a straight line (ector) through the origin. The resulting transformation fits better than the spline nominal transformation but is less smooth. Numeric. Categories are treated as ordered and equally spaced (interal leel). The order of the categories and the equal distances between category numbers of the obsered ariable are presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. When all ariables are at the numeric leel, the analysis is analogous to standard principal components analysis. Categorical Principal Components Analysis Discretization The Discretization dialog box allows you to select a method of recoding your ariables. Fractional-alued ariables are grouped into seen categories (or into the number of distinct alues of the ariable if this number is less than seen) with an approximately normal distribution, unless specified otherwise. String ariables are always conerted into positie integers by assigning category indicators according to ascending alphanumeric order. Discretization for string ariables applies to these integers. Other ariables are left alone by default. The discretized ariables are then used in the analysis. 18 IBM SPSS Categories 24

23 Method. Choose between grouping, ranking, and multiplying. Grouping. Recode into a specified number of categories or recode by interal. Ranking. The ariable is discretized by ranking the cases. Multiplying. The current alues of the ariable are standardized, multiplied by 10, rounded, and hae a constant added such that the lowest discretized alue is 1. Grouping. The following options are aailable when you are discretizing ariables by grouping: Number of categories. Specify a number of categories and whether the alues of the ariable should follow an approximately normal or uniform distribution across those categories. Equal interals. Variables are recoded into categories defined by these equally sized interals. You must specify the length of the interals. Categorical Principal Components Analysis Missing Values Use the Missing Values dialog box to choose the strategy for handling missing alues in analysis ariables and supplementary ariables. Strategy. Choose to exclude missing alues (passie treatment), impute missing alues (actie treatment), or exclude objects with missing alues (listwise deletion). Exclude missing alues; for correlations impute after quantification. Objects with missing alues on the selected ariable do not contribute to the analysis for this ariable. If all ariables are gien passie treatment, then objects with missing alues on all ariables are treated as supplementary. If correlations are specified in the Output dialog box, then (after analysis) missing alues are imputed with the most frequent category, or mode, of the ariable for the correlations of the original ariables. For the correlations of the optimally scaled ariables, you can choose the method of imputation. Mode. Replace missing alues with the mode of the optimally scaled ariable. Extra category. Replace missing alues with the quantification of an extra category. This setting implies that objects with a missing alue on this ariable are considered to belong to the same (extra) category. Random category. Impute each missing alue on a ariable with the quantified alue of a different random category number based on the marginal frequencies of the categories of the ariable. Impute missing alues. Objects with missing alues on the selected ariable hae those alues imputed. You can choose the method of imputation. Mode. Replace missing alues with the most frequent category. When there are multiple modes, the one with the smallest category indicator is used. Extra category. Replace missing alues with the same quantification of an extra category. This setting implies that objects with a missing alue on this ariable are considered to belong to the same (extra) category. Random category. Replace each missing alue on a ariable with a different random category number based on the marginal frequencies of the categories. Exclude objects with missing alues on this ariable. Objects with missing alues on the selected ariable are excluded from the analysis. This strategy is not aailable for supplementary ariables. Categorical Principal Components Analysis Options The Options dialog box proides controls to select the initial configuration, specify iteration and conergence criteria, select a normalization method, choose the method for labeling plots, and specify supplementary objects. Supplementary Objects. Specify the case number of the object, or the first and last case numbers of a range of objects, that you want to make supplementary and then click Add. If an object is specified as supplementary, then case weights are ignored for that object. Chapter 3. Categorical Principal Components Analysis (CATPCA) 19

24 Normalization Method. You can specify one of fie options for normalizing the object scores and the ariables. Only one normalization method can be used in each analysis. Variable Principal. This option optimizes the association between ariables. The coordinates of the ariables in the object space are the component loadings (correlations with principal components, such as dimensions and object scores). This method is useful when you are primarily interested in the correlation between the ariables. Object Principal. This option optimizes distances between objects. This method is useful when you are primarily interested in differences or similarities between the objects. Symmetrical. Use this normalization option if you are primarily interested in the relation between objects and ariables. Independent. Use this normalization option if you want to examine distances between objects and correlations between ariables separately. Custom. You can specify any real alue in the closed interal [ 1, 1]. A alue of 1 is equal to the Object Principal method. A alue of 0 is equal to the Symmetrical method. A alue of 1 is equal to the Variable Principal method. By specifying a alue greater than 1 and less than 1, you can spread the eigenalue oer both objects and ariables. This method is useful for making a tailor-made biplot or triplot. Criteria. You can specify the maximum number of iterations the procedure can go through in its computations. You can also select a conergence criterion alue. The algorithm stops iterating if the difference in total fit between the last two iterations is less than the conergence alue or if the maximum number of iterations is reached. Label Plots By. You can specify whether ariables and alue labels or ariable names and alues are used in the plots. You can also specify a maximum length for labels. Plot Dimensions. You can control the dimensions that are displayed in the output. Display all dimensions in the solution. All dimensions in the solution are displayed in a scatterplot matrix. Restrict the number of dimensions. The displayed dimensions are restricted to plotted pairs. If you restrict the dimensions, you must select the lowest and highest dimensions to be plotted. The lowest dimension can range from 1 to the number of dimensions in the solution minus 1 and is plotted against higher dimensions. The highest dimension alue can range from 2 to the number of dimensions in the solution and indicates the highest dimension to be used in plotting the dimension pairs. This specification applies to all requested multidimensional plots. Rotation. You can select a rotation method to obtain rotated results. Note: These rotation methods are not aailable if you select Perform bootstrapping in the Bootstrap dialog. Varimax. An orthogonal rotation method that minimizes the number of ariables that hae high loadings on each component. It simplifies the interpretation of the components. Quartimax. A rotation method that minimizes the number of components that are needed to explain each ariable. It simplifies the interpretation of the obsered ariables. Equamax. A rotation method that is a combination of the Varimax method, which simplifies the components, and the Quartimax method, which simplifies the ariables. The number of ariables that load highly on a component and the number of components that are needed to explain a ariable are minimized. Oblimin. A method for oblique (non-orthogonal) rotation. When delta equals 0, components are most oblique. As delta becomes more negatie, the components become less oblique. Positie alues permit additional component correlation. The alue of Delta must be less than or equal to IBM SPSS Categories 24

25 Promax. An oblique (non-orthogonal) rotation, which allows components to be correlated. It can be calculated more quickly than a direct Oblimin rotation, so it is useful for large datasets. The amount of correlation (obliqueness) that is allowed is controlled by the kappa parameter. The alue of Kappa must be greater than or equal to 1 and less 10,000. Configuration. You can read data from a file that contains the coordinates of a configuration. The first ariable in the file contains the coordinates for the first dimension. The second ariable contains the coordinates for the second dimension, and so on. Initial. The configuration in the file that is specified is used as the starting point of the analysis. Fixed. The configuration in the file that is specified is used to fit in the ariables. The ariables that are fitted in must be selected as analysis ariables, but because the configuration is fixed, they are treated as supplementary ariables (so they do not need to be selected as supplementary ariables). Categorical Principal Components Analysis Output The Output dialog box controls the display of results. Object scores. Displays the object scores and has the following options: Include Categories Of. Displays the category indicators of the analysis ariables selected. Label Object Scores By. From the list of ariables that are specified as labeling ariables, you can select one to label the objects. Component loadings. Displays the component loadings for all ariables that were not gien multiple nominal scaling leels. You can sort the component loadings by size. Iteration history. For each iteration, the ariance accounted for, loss, and increase in ariance accounted for are shown. Correlations of original ariables. Shows the correlation matrix of the original ariables and the eigenalues of that matrix. Correlations of transformed ariables. Shows the correlation matrix of the transformed (optimally scaled) ariables and the eigenalues of that matrix. Variance accounted for. Displays the amount of ariance accounted for by centroid coordinates, ector coordinates, and total (centroid and ector coordinates combined) per ariable and per dimension. Category Quantifications. Gies the category quantifications and coordinates for each dimension of the ariables that are selected. Descriptie Statistics. Displays frequencies, number of missing alues, and mode of the ariables that are selected. Categorical Principal Components Analysis Sae The controls in the Sae dialog box sae discretized data, object scores, transformed alues, and other results to the actie dataset, a new dataset in the current session, or an external file. Datasets are aailable during the current session but are not aailable in subsequent sessions unless you explicitly sae them as data files. Dataset names must adhere to ariable naming rules. Filenames or dataset names must be different for each type of data saed. If you sae object scores or transformed alues to the actie dataset, you can specify the number of multiple nominal dimensions. The options in the Bootstrap Confidence Ellipses group are only aailable if you select Perform bootstrapping in the Bootstrap dialog. Chapter 3. Categorical Principal Components Analysis (CATPCA) 21

26 Categorical Principal Components Analysis Object Plots The Object and Variable Plots dialog box allows you to specify the types of plots you want and the ariables for which plots will be produced. Object points. A plot of the object points is displayed. Objects and ariables (biplot). The object points are plotted with your choice of the ariable coordinates--component loadings or ariable centroids. Objects, loadings, and centroids (triplot). The object points are plotted with the centroids of multiple nominal-scaling-leel ariables and the component loadings of other ariables. Biplot and Triplot Variables. You can choose to use all ariables for the biplots and triplots or select a subset. Label Objects. You can choose to hae objects labeled with the categories of selected ariables (you may choose category indicator alues or alue labels in the Options dialog box) or with their case numbers. One plot is produced per ariable if Variable is selected. Categorical Principal Components Analysis Category Plots The Category Plots dialog box allows you to specify the types of plots you want and the ariables for which plots will be produced. Category Plots. For each ariable selected, a plot of the centroid and ector coordinates is plotted. For ariables with multiple nominal scaling leels, categories are in the centroids of the objects in the particular categories. For all other scaling leels, categories are on a ector through the origin. Joint Category Plots. This is a single plot of the centroid and ector coordinates of each selected ariable. Transformation Plots. Displays a plot of the optimal category quantifications ersus the category indicators. You can specify the number of dimensions for ariables with multiple nominal scaling leels; one plot will be generated for each dimension. You can also choose to display residual plots for each ariable selected. Project Centroids Of. You may choose a ariable and project its centroids onto selected ariables. Variables with multiple nominal scaling leels cannot be selected to project on. When this plot is requested, a table with the coordinates of the projected centroids is also displayed. Categorical Principal Components Analysis Loading Plots The Loading Plots dialog box controls the ariables that are included in the plot, display of centroids in the loadings plot, and display of plots of ariance accounted for. Variance accounted for. For each dimension, displays a plot of ariance accounted for. Display component loadings. If selected, a plot of the component loadings is displayed. Loading Variables. You can choose to use all ariables for the component loadings plot or select a subset. Include centroids. Variables with multiple nominal scaling leels do not hae component loadings, but you can choose to include the centroids of those ariables in the plot. You can choose to use all multiple nominal ariables or select a subset. 22 IBM SPSS Categories 24

27 Categorical Principal Components Analysis Bootstrap The Bootstrap dialog specifies parameters for bootstrap analysis. Perform bootstrapping. Performs bootstrap resampling. If plots of loadings, categories, or component scores are requested, extra plots are displayed. These plots display the points for the data sample and the bootstrap estimates. Transformation plots include confidence regions. A plot for the eigenalues is also displayed. If a two-dimensional solution is specified, confidence ellipse plots for the eigenalues, the component loadings, the category points, and the object points are displayed. Bootstrap resampling is not aailable if you specify a rotation method in the Options dialog. You can select Balanced or Unbalanced bootstrapping. Number of samples. The number of bootstrap samples that are used to calculate the bootstrap estimates. The alue must be a positie integer. Confidence leel. The confidence leel of the bootstrap estimates, expressed as a percentage. The alue must a be a positie number less than 100. Matching Method. The two alternaties are Procrustes and Reflection. Confidence Ellipses. Controls the threshold area for confidence ellipses in plots. For areas greater than (> operator) the specified alue, the number of ellipses decreases as the threshold alue increases. The settings in this group are only aailable if the number of dimensions that are specified on the main dialog is 2. Confidence ellipses for loading plots are aailable only if Display component loadings is selected in the Loading Plots dialog. Confidence ellipses for object plots are aailable only if Object points is selected in the Object and Variable Plots dialog. Confidence ellipses for category plots are aailable only if one or more ariables are specified in the Category Plots list in the Category Plots dialog. Number of ellipse contour points. The number of plot points used to draw each confidence ellipse. Larger alues produce smoother ellipses. The alue must be a positie integer less than or equal to 100. CATPCA Command Additional Features You can customize your categorical principal components analysis if you paste your selections into a syntax window and edit the resulting CATPCA command syntax. The command syntax language also allows you to: Specify rootnames for the transformed ariables, object scores, and approximations when saing them to the actie dataset (with the SAVE subcommand). Specify a maximum length for labels for each plot separately (with the PLOT subcommand). Specify a separate ariable list for residual plots (with the PLOT subcommand). See the Command Syntax Reference for complete syntax information. Chapter 3. Categorical Principal Components Analysis (CATPCA) 23

29 Chapter 4. Nonlinear Canonical Correlation Analysis (OVERALS) Nonlinear canonical correlation analysis corresponds to categorical canonical correlation analysis with optimal scaling. The purpose of this procedure is to determine how similar sets of categorical ariables are to one another. Nonlinear canonical correlation analysis is also known by the acronym OVERALS. Standard canonical correlation analysis is an extension of multiple regression, where the second set does not contain a single response ariable but instead contain multiple response ariables. The goal is to explain as much as possible of the ariance in the relationships among two sets of numerical ariables in a low dimensional space. Initially, the ariables in each set are linearly combined such that the linear combinations hae a maximal correlation. Gien these combinations, subsequent linear combinations are determined that are uncorrelated with the preious combinations and that hae the largest correlation possible. The optimal scaling approach expands the standard analysis in three crucial ways. First, OVERALS allows more than two sets of ariables. Second, ariables can be scaled as either nominal, ordinal, or numerical. As a result, nonlinear relationships between ariables can be analyzed. Finally, instead of maximizing correlations between the ariable sets, the sets are compared to an unknown compromise set that is defined by the object scores. Example. Categorical canonical correlation analysis with optimal scaling could be used to graphically display the relationship between one set of ariables containing job category and years of education and another set of ariables containing region of residence and gender. You might find that years of education and region of residence discriminate better than the remaining ariables. You might also find that years of education discriminates best on the first dimension. Statistics and plots. Frequencies, centroids, iteration history, object scores, category quantifications, weights, component loadings, single and multiple fit, object scores plots, category coordinates plots, component loadings plots, category centroids plots, transformation plots. Nonlinear Canonical Correlation Analysis Data Considerations Data. Use integers to code categorical ariables (nominal or ordinal scaling leel). To minimize output, use consecutie integers beginning with 1 to code each ariable. Variables that are scaled at the numerical leel should not be recoded to consecutie integers. To minimize output, for each ariable that is scaled at the numerical leel, subtract the smallest obsered alue from eery alue and add 1. Fractional alues are truncated after the decimal. Assumptions. Variables can be classified into two or more sets. Variables in the analysis are scaled as multiple nominal, single nominal, ordinal, or numerical. The maximum number of dimensions that are used in the procedure depends on the optimal scaling leel of the ariables. If all ariables are specified as ordinal, single nominal, or numerical, the maximum number of dimensions is the lesser of the following two alues: the number of obserations minus 1 or the total number of ariables. Howeer, if only two sets of ariables are defined, the maximum number of dimensions is the number of ariables in the smaller set. If some ariables are multiple nominal, the maximum number of dimensions is the total number of multiple nominal categories plus the number of nonmultiple nominal ariables minus the number of multiple nominal ariables. For example, if the analysis inoles fie ariables, one of which is multiple nominal with four categories, the maximum number of dimensions is ( ), or 7. If you specify a number that is greater than the maximum, the maximum alue is used. Copyright IBM Corporation 1989,

30 Related procedures. If each set contains one ariable, nonlinear canonical correlation analysis is equialent to principal components analysis with optimal scaling. If each of these ariables is multiple nominal, the analysis corresponds to multiple correspondence analysis. If two sets of ariables are inoled, and one of the sets contains only one ariable, the analysis is identical to categorical regression with optimal scaling. To Obtain a Nonlinear Canonical Correlation Analysis 1. From the menus choose: Analyze > Dimension Reduction > Optimal Scaling Select either All ariables multiple nominal or Some ariable(s) not multiple nominal. 3. Select Multiple sets. 4. Click Define. 5. Define at least two sets of ariables. Select the ariable(s) that you want to include in the first set. To moe to the next set, click Next, and select the ariables that you want to include in the second set. You can add additional sets. Click Preious to return to the preiously defined ariable set. 6. Define the alue range and measurement scale (optimal scaling leel) for each selected ariable. 7. Click OK. 8. Optionally: Select one or more ariables to proide point labels for object scores plots. Each ariable produces a separate plot, with the points labeled by the alues of that ariable. You must define a range for each of these plot label ariables. When you are using the dialog box, a single ariable cannot be used both in the analysis and as a labeling ariable. If you want to label the object scores plot with a ariable that is used in the analysis, use the Compute facility (aailable from the Transform menu) to create a copy of that ariable. Use the new ariable to label the plot. Alternatiely, command syntax can be used. Specify the number of dimensions that you want in the solution. In general, choose as few dimensions as needed to explain most of the ariation. If the analysis inoles more than two dimensions, three-dimensional plots of the first three dimensions are produced. Other dimensions can be displayed by editing the chart. Define Range and Scale You must define a range for each ariable. The maximum alue that is specified must be an integer. Fractional data alues are truncated in the analysis. A category alue that is outside of the specified range is ignored in the analysis. To minimize output, use the Automatic Recode facility (aailable from the Transform menu) to create consecutie categories beginning with 1 for ariables that are treated as nominal or ordinal. Recoding to consecutie integers is not recommended for ariables that are scaled at the numerical leel. To minimize output for ariables that are treated as numerical, for each ariable, subtract the minimum alue from eery alue and add 1. You must also select the scaling to be used to quantify each ariable. Ordinal. The order of the categories of the obsered ariable is presered in the quantified ariable. Single nominal. In the quantified ariable, objects in the same category receie the same score. Multiple nominal. The quantifications can be different for each dimension. Discrete numeric. Categories are treated as ordered and equally spaced. The differences between category numbers and the order of the categories of the obsered ariable are presered in the quantified ariable. 26 IBM SPSS Categories 24

31 Define Range You must define a range for each ariable. The maximum alue that is specified must be an integer. Fractional data alues are truncated in the analysis. A category alue that is outside of the specified range is ignored in the analysis. To minimize output, use the Automatic Recode facility (aailable from the Transform menu) to create consecutie categories beginning with 1. You must also define a range for each ariable that is used to label the object scores plots. Howeer, labels for categories with data alues that are outside of the defined range for the ariable do appear on the plots. Nonlinear Canonical Correlation Analysis Options The Options dialog box allows you to select optional statistics and plots, sae object scores as new ariables in the actie dataset, specify iteration and conergence criteria, and specify an initial configuration for the analysis. Display. Aailable statistics include marginal frequencies (counts), centroids, iteration history, weights and component loadings, category quantifications, object scores, and single and multiple fit statistics. Centroids. Category quantifications, and the projected and the actual aerages of the object scores for the objects (cases) included in each set for those cases belonging to the same category of the ariable. Weights and component loadings. The regression coefficients in each dimension for eery quantified ariable in a set, where the object scores are regressed on the quantified ariables, and the projection of the quantified ariable in the object space. Proides an indication of the contribution each ariable makes to the dimension within each set. Single and multiple fit. Measures of goodness of fit of the single- and multiple-category coordinates/category quantifications with respect to the objects. Category quantifications. Optimal scale alues assigned to the categories of a ariable. Object scores. Optimal score assigned to an object (case) in a particular dimension. Plot. You can produce plots of category coordinates, object scores, component loadings, category centroids, and transformations. Sae object scores. You can sae the object scores as new ariables in the actie dataset. Object scores are saed for the number of dimensions that are specified in the main dialog box. Use random initial configuration. A random initial configuration should be used if some or all of the ariables are single nominal. If this option is not selected, a nested initial configuration is used. Criteria. You can specify the maximum number of iterations that the nonlinear canonical correlation analysis can go through in its computations. You can also select a conergence criterion alue. The analysis stops iterating if the difference in total fit between the last two iterations is less than the conergence alue or if the maximum number of iterations is reached. OVERALS Command Additional Features You can customize your nonlinear canonical correlation analysis if you paste your selections into a syntax window and edit the resulting OVERALS command syntax. The command syntax language also allows you to: Specify the dimension pairs to be plotted, rather than plotting all extracted dimensions (using thendim keyword on the PLOT subcommand). Specify the number of alue label characters that are used to label points on the plots (with theplot subcommand). Chapter 4. Nonlinear Canonical Correlation Analysis (OVERALS) 27

32 Designate more than fie ariables as labeling ariables for object scores plots (with theplot subcommand). Select ariables that are used in the analysis as labeling ariables for the object scores plots (with the PLOT subcommand). Select ariables to proide point labels for the quantification score plot (with the PLOT subcommand). Specify the number of cases to be included in the analysis if you do not want to use all cases in the actie dataset (with the NOBSERVATIONS subcommand). Specify rootnames for ariables created by saing object scores (with the SAVE subcommand). Specify the number of dimensions to be saed, rather than saing all extracted dimensions (with the SAVE subcommand). Write category quantifications to a matrix file (using the MATRIX subcommand). Produce low-resolution plots that may be easier to read than the usual high-resolution plots (using the SET command). Produce centroid and transformation plots for specified ariables only (with the PLOT subcommand). See the Command Syntax Reference for complete syntax information. 28 IBM SPSS Categories 24

33 Chapter 5. Correspondence Analysis One of the goals of correspondence analysis is to describe the relationships between two nominal ariables in a correspondence table in a low-dimensional space, while simultaneously describing the relationships between the categories for each ariable. For each ariable, the distances between category points in a plot reflect the relationships between the categories with similar categories plotted close to each other. Projecting points for one ariable on the ector from the origin to a category point for the other ariable describe the relationship between the ariables. An analysis of contingency tables often includes examining row and column profiles and testing for independence ia the chi-square statistic. Howeer, the number of profiles can be quite large, and the chi-square test does not reeal the dependence structure. The Crosstabs procedure offers seeral measures of association and tests of association but cannot graphically represent any relationships between the ariables. Factor analysis is a standard technique for describing relationships between ariables in a low-dimensional space. Howeer, factor analysis requires interal data, and the number of obserations should be fie times the number of ariables. Correspondence analysis, on the other hand, assumes nominal ariables and can describe the relationships between categories of each ariable, as well as the relationship between the ariables. In addition, correspondence analysis can be used to analyze any table of positie correspondence measures. Example. Correspondence analysis could be used to graphically display the relationship between staff category and smoking habits. You might find that with regard to smoking, junior managers differ from secretaries, but secretaries do not differ from senior managers. You might also find that heay smoking is associated with junior managers, whereas light smoking is associated with secretaries. Statistics and plots. Correspondence measures, row and column profiles, singular alues, row and column scores, inertia, mass, row and column score confidence statistics, singular alue confidence statistics, transformation plots, row point plots, column point plots, and biplots. Correspondence Analysis Data Considerations Data. Categorical ariables to be analyzed are scaled nominally. For aggregated data or for a correspondence measure other than frequencies, use a weighting ariable with positie similarity alues. Alternatiely, for table data, use syntax to read the table. Assumptions. The maximum number of dimensions used in the procedure depends on the number of actie rows and column categories and the number of equality constraints. If no equality constraints are used and all categories are actie, the maximum dimensionality is one fewer than the number of categories for the ariable with the fewest categories. For example, if one ariable has fie categories and the other has four, the maximum number of dimensions is three. Supplementary categories are not actie. For example, if one ariable has fie categories, two of which are supplementary, and the other ariable has four categories, the maximum number of dimensions is two. Treat all sets of categories that are constrained to be equal as one category. For example, if a ariable has fie categories, three of which are constrained to be equal, that ariable should be treated as haing three categories when determining the maximum dimensionality. Two of the categories are unconstrained, and the third category corresponds to the three constrained categories. If you specify a number of dimensions greater than the maximum, the maximum alue is used. Related procedures. If more than two ariables are inoled, use multiple correspondence analysis. If the ariables should be scaled ordinally, use categorical principal components analysis. Copyright IBM Corporation 1989,

34 To Obtain a Correspondence Analysis 1. From the menus choose: Analyze > Dimension Reduction > Correspondence Analysis Select a row ariable. 3. Select a column ariable. 4. Define the ranges for the ariables. 5. Click OK. Define Row Range in Correspondence Analysis You must define a range for the row ariable. The minimum and maximum alues specified must be integers. Fractional data alues are truncated in the analysis. A category alue that is outside of the specified range is ignored in the analysis. All categories are initially unconstrained and actie. You can constrain row categories to equal other row categories, or you can define a row category as supplementary. Categories must be equal. Categories must hae equal scores. Use equality constraints if the obtained order for the categories is undesirable or counterintuitie. The maximum number of row categories that can be constrained to be equal is the total number of actie row categories minus 1. To impose different equality constraints on sets of categories, use syntax. For example, use syntax to constrain categories 1 and 2 to be equal and categories 3 and 4 to be equal. Category is supplemental. Supplementary categories do not influence the analysis but are represented in the space defined by the actie categories. Supplementary categories play no role in defining the dimensions. The maximum number of supplementary row categories is the total number of row categories minus 2. Define Column Range in Correspondence Analysis You must define a range for the column ariable. The minimum and maximum alues specified must be integers. Fractional data alues are truncated in the analysis. A category alue that is outside of the specified range is ignored in the analysis. All categories are initially unconstrained and actie. You can constrain column categories to equal other column categories, or you can define a column category as supplementary. Categories must be equal. Categories must hae equal scores. Use equality constraints if the obtained order for the categories is undesirable or counterintuitie. The maximum number of column categories that can be constrained to be equal is the total number of actie column categories minus 1. To impose different equality constraints on sets of categories, use syntax. For example, use syntax to constrain categories 1 and 2 to be equal and categories 3 and 4 to be equal. Category is supplemental. Supplementary categories do not influence the analysis but are represented in the space defined by the actie categories. Supplementary categories play no role in defining the dimensions. The maximum number of supplementary column categories is the total number of column categories minus 2. Correspondence Analysis Model The Model dialog box allows you to specify the number of dimensions, the distance measure, the standardization method, and the normalization method. Dimensions in solution. Specify the number of dimensions. In general, choose as few dimensions as needed to explain most of the ariation. The maximum number of dimensions depends on the number of actie categories used in the analysis and on the equality constraints. The maximum number of dimensions is the smaller of: 30 IBM SPSS Categories 24

35 The number of actie row categories minus the number of row categories constrained to be equal, plus the number of constrained row category sets. The number of actie column categories minus the number of column categories constrained to be equal, plus the number of constrained column category sets. Distance Measure. You can select the measure of distance among the rows and columns of the correspondence table. Choose one of the following alternaties: Chi-square. Use a weighted profile distance, where the weight is the mass of the rows or columns. This measure is required for standard correspondence analysis. Euclidean. Use the square root of the sum of squared differences between pairs of rows and pairs of columns. Standardization Method. Choose one of the following alternaties: Row and column means are remoed. Both the rows and columns are centered. This method is required for standard correspondence analysis. Row means are remoed. Only the rows are centered. Column means are remoed. Only the columns are centered. Row totals are equalized and means are remoed. Before centering the rows, the row margins are equalized. Column totals are equalized and means are remoed. Before centering the columns, the column margins are equalized. Normalization Method. Choose one of the following alternaties: Symmetrical. For each dimension, the row scores are the weighted aerage of the column scores diided by the matching singular alue, and the column scores are the weighted aerage of row scores diided by the matching singular alue. Use this method if you want to examine the differences or similarities between the categories of the two ariables. Principal. The distances between row points and column points are approximations of the distances in the correspondence table according to the selected distance measure. Use this method if you want to examine differences between categories of either or both ariables instead of differences between the two ariables. Row principal. The distances between row points are approximations of the distances in the correspondence table according to the selected distance measure. The row scores are the weighted aerage of the column scores. Use this method if you want to examine differences or similarities between categories of the row ariable. Column principal. The distances between column points are approximations of the distances in the correspondence table according to the selected distance measure. The column scores are the weighted aerage of the row scores. Use this method if you want to examine differences or similarities between categories of the column ariable. Custom. You must specify a alue between 1 and 1. A alue of 1 corresponds to column principal. A alue of 1 corresponds to row principal. A alue of 0 corresponds to symmetrical. All other alues spread the inertia oer both the row and column scores to arying degrees. This method is useful for making tailor-made biplots. Correspondence Analysis Statistics The Statistics dialog box allows you to specify the numerical output produced. Correspondence table. A crosstabulation of the input ariables with row and column marginal totals. Oeriew of row points. For each row category, the scores, mass, inertia, contribution to the inertia of the dimension, and the contribution of the dimension to the inertia of the point. Chapter 5. Correspondence Analysis 31

36 Oeriew of column points. For each column category, the scores, mass, inertia, contribution to the inertia of the dimension, and the contribution of the dimension to the inertia of the point. Row profiles. For each row category, the distribution across the categories of the column ariable. Column profiles. For each column category, the distribution across the categories of the row ariable. Permutations of the correspondence table. The correspondence table reorganized such that the rows and columns are in increasing order according to the scores on the first dimension. Optionally, you can specify the maximum dimension number for which permuted tables will be produced. A permuted table for each dimension from 1 to the number specified is produced. Confidence Statistics for Row points. Includes standard deiation and correlations for all nonsupplementary row points. Confidence Statistics for Column points. Includes standard deiation and correlations for all nonsupplementary column points. Correspondence Analysis Plots The Plots dialog box allows you to specify which plots are produced. Scatterplots. Produces a matrix of all pairwise plots of the dimensions. Aailable scatterplots include: Biplot. Produces a matrix of joint plots of the row and column points. If principal normalization is selected, the biplot is not aailable. Row points. Produces a matrix of plots of the row points. Column points. Produces a matrix of plots of the column points. Optionally, you can specify how many alue label characters to use when labeling the points. This alue must be a non-negatie integer less than or equal to 20. Line Plots. Produces a plot for eery dimension of the selected ariable. Aailable line plots include: Transformed row categories. Produces a plot of the original row category alues against their corresponding row scores. Transformed column categories. Produces a plot of the original column category alues against their corresponding column scores. Optionally, you can specify how many alue label characters to use when labeling the category axis. This alue must be a non-negatie integer less than or equal to 20. Plot Dimensions. Allows you to control the dimensions displayed in the output. Display all dimensions in the solution. All dimensions in the solution are displayed in a scatterplot matrix. Restrict the number of dimensions. The displayed dimensions are restricted to plotted pairs. If you restrict the dimensions, you must select the lowest and highest dimensions to be plotted. The lowest dimension can range from 1 to the number of dimensions in the solution minus 1, and is plotted against higher dimensions. The highest dimension alue can range from 2 to the number of dimensions in the solution, and indicates the highest dimension to be used in plotting the dimension pairs. This specification applies to all requested multidimensional plots. 32 IBM SPSS Categories 24

37 CORRESPONDENCE Command Additional Features You can customize your correspondence analysis if you paste your selections into a syntax window and edit the resulting CORRESPONDENCE command syntax. The command syntax language also allows you to: Specify table data as input instead of using casewise data (using the TABLE = ALL subcommand). Specify the number of alue-label characters used to label points for each type of scatterplot matrix or biplot matrix (with the PLOT subcommand). Specify the number of alue-label characters used to label points for each type of line plot (with the PLOT subcommand). Write a matrix of row and column scores to a matrix data file (with the OUTFILE subcommand). Write a matrix of confidence statistics (ariances and coariances) for the singular alues and the scores to a matrix data file (with the OUTFILE subcommand). Specify multiple sets of categories to be equal (with the EQUAL subcommand). See the Command Syntax Reference for complete syntax information. Chapter 5. Correspondence Analysis 33

39 Chapter 6. Multiple Correspondence Analysis Multiple Correspondence Analysis quantifies nominal (categorical) data by assigning numerical alues to the cases (objects) and categories so that objects within the same category are close together and objects in different categories are far apart. Each object is as close as possible to the category points of categories that apply to the object. In this way, the categories diide the objects into homogeneous subgroups. Variables are considered homogeneous when they classify objects in the same categories into the same subgroups. Example. Multiple Correspondence Analysis could be used to graphically display the relationship between job category, minority classification, and gender. You might find that minority classification and gender discriminate between people but that job category does not. You might also find that the Latino and African-American categories are similar to each other. Statistics and plots. Object scores, discrimination measures, iteration history, correlations of original and transformed ariables, category quantifications, descriptie statistics, object points plots, biplots, category plots, joint category plots, transformation plots, and discrimination measures plots. Multiple Correspondence Analysis Data Considerations Data. String ariable alues are always conerted into positie integers by ascending alphanumeric order. User-defined missing alues, system-missing alues, and alues less than 1 are considered missing; you can recode or add a constant to ariables with alues less than 1 to make them nonmissing. Assumptions. All ariables hae the multiple nominal scaling leel. The data must contain at least three alid cases. The analysis is based on positie integer data. The discretization option will automatically categorize a fractional-alued ariable by grouping its alues into categories with a close-to-normal distribution and will automatically conert alues of string ariables into positie integers. You can specify other discretization schemes. Related procedures. For two ariables, Multiple Correspondence Analysis is analogous to Correspondence Analysis. If you beliee that ariables possess ordinal or numerical properties, Categorical Principal Components Analysis should be used. If sets of ariables are of interest, Nonlinear Canonical Correlation Analysis should be used. To Obtain a Multiple Correspondence Analysis 1. From the menus choose: Analyze > Dimension Reduction > Optimal Scaling Select All ariables multiple nominal. 3. Select One set. 4. Click Define. 5. Select at least two analysis ariables and specify the number of dimensions in the solution. 6. Click OK. You may optionally specify supplementary ariables, which are fitted into the solution found, or labeling ariables for the plots. Define Variable Weight in Multiple Correspondence Analysis You can set the weight for analysis ariables. Copyright IBM Corporation 1989,

40 Variable weight. You can choose to define a weight for each ariable. The alue specified must be a positie integer. The default alue is 1. Multiple Correspondence Analysis Discretization The Discretization dialog box allows you to select a method of recoding your ariables. Fractional-alued ariables are grouped into seen categories (or into the number of distinct alues of the ariable if this number is less than seen) with an approximately normal distribution unless otherwise specified. String ariables are always conerted into positie integers by assigning category indicators according to ascending alphanumeric order. Discretization for string ariables applies to these integers. Other ariables are left alone by default. The discretized ariables are then used in the analysis. Method. Choose between grouping, ranking, and multiplying. Grouping. Recode into a specified number of categories or recode by interal. Ranking. The ariable is discretized by ranking the cases. Multiplying. The current alues of the ariable are standardized, multiplied by 10, rounded, and hae a constant added so that the lowest discretized alue is 1. Grouping. The following options are aailable when discretizing ariables by grouping: Number of categories. Specify a number of categories and whether the alues of the ariable should follow an approximately normal or uniform distribution across those categories. Equal interals. Variables are recoded into categories defined by these equally sized interals. You must specify the length of the interals. Multiple Correspondence Analysis Missing Values The Missing Values dialog box allows you to choose the strategy for handling missing alues in analysis ariables and supplementary ariables. Missing Value Strategy. Choose to exclude missing alues (passie treatment), impute missing alues (actie treatment), or exclude objects with missing alues (listwise deletion). Exclude missing alues; for correlations impute after quantification. Objects with missing alues on the selected ariable do not contribute to the analysis for this ariable. If all ariables are gien passie treatment, then objects with missing alues on all ariables are treated as supplementary. If correlations are specified in the Output dialog box, then (after analysis) missing alues are imputed with the most frequent category, or mode, of the ariable for the correlations of the original ariables. For the correlations of the optimally scaled ariables, you can choose the method of imputation. Select Mode to replace missing alues with the mode of the optimally scaled ariable. Select Extra category to replace missing alues with the quantification of an extra category. This implies that objects with a missing alue on this ariable are considered to belong to the same (extra) category. Impute missing alues. Objects with missing alues on the selected ariable hae those alues imputed. You can choose the method of imputation. Select Mode to replace missing alues with the most frequent category. When there are multiple modes, the one with the smallest category indicator is used. Select Extra category to replace missing alues with the same quantification of an extra category. This implies that objects with a missing alue on this ariable are considered to belong to the same (extra) category. Exclude objects with missing alues on this ariable. Objects with missing alues on the selected ariable are excluded from the analysis. This strategy is not aailable for supplementary ariables. Multiple Correspondence Analysis Options The Options dialog box allows you to select the initial configuration, specify iteration and conergence criteria, select a normalization method, choose the method for labeling plots, and specify supplementary objects. 36 IBM SPSS Categories 24

41 Supplementary Objects. Specify the case number of the object (or the first and last case numbers of a range of objects) that you want to make supplementary, and then click Add. Continue until you hae specified all of your supplementary objects. If an object is specified as supplementary, then case weights are ignored for that object. Normalization Method. You can specify one of fie options for normalizing the object scores and the ariables. Only one normalization method can be used in a gien analysis. Variable Principal. This option optimizes the association between ariables. The coordinates of the ariables in the object space are the component loadings (correlations with principal components, such as dimensions and object scores). This is useful when you are interested primarily in the correlation between the ariables. Object Principal. This option optimizes distances between objects. This is useful when you are interested primarily in differences or similarities between the objects. Symmetrical. Use this normalization option if you are interested primarily in the relation between objects and ariables. Independent. Use this normalization option if you want to examine distances between objects and correlations between ariables separately. Custom. You can specify any real alue in the closed interal [ 1, 1]. A alue of 1 is equal to the Object Principal method, a alue of 0 is equal to the Symmetrical method, and a alue of 1 is equal to the Variable Principal method. By specifying a alue greater than 1 and less than 1, you can spread the eigenalue oer both objects and ariables. This method is useful for making a tailor-made biplot or triplot. Criteria. You can specify the maximum number of iterations the procedure can go through in its computations. You can also select a conergence criterion alue. The algorithm stops iterating if the difference in total fit between the last two iterations is less than the conergence alue or if the maximum number of iterations is reached. Label Plots By. Allows you to specify whether ariables and alue labels or ariable names and alues will be used in the plots. You can also specify a maximum length for labels. Plot Dimensions. Allows you to control the dimensions displayed in the output. Display all dimensions in the solution. All dimensions in the solution are displayed in a scatterplot matrix. Restrict the number of dimensions. The displayed dimensions are restricted to plotted pairs. If you restrict the dimensions, you must select the lowest and highest dimensions to be plotted. The lowest dimension can range from 1 to the number of dimensions in the solution minus 1 and is plotted against higher dimensions. The highest dimension alue can range from 2 to the number of dimensions in the solution and indicates the highest dimension to be used in plotting the dimension pairs. This specification applies to all requested multidimensional plots. Configuration. You can read data from a file containing the coordinates of a configuration. The first ariable in the file should contain the coordinates for the first dimension, the second ariable should contain the coordinates for the second dimension, and so on. Initial. The configuration in the file specified will be used as the starting point of the analysis. Fixed. The configuration in the file specified will be used to fit in the ariables. The ariables that are fitted in must be selected as analysis ariables, but, because the configuration is fixed, they are treated as supplementary ariables (so they do not need to be selected as supplementary ariables). Multiple Correspondence Analysis Output The Output dialog box allows you to produce tables for object scores, discrimination measures, iteration history, correlations of original and transformed ariables, category quantifications for selected ariables, and descriptie statistics for selected ariables. Chapter 6. Multiple Correspondence Analysis 37

42 Object scores. Displays the object scores, including mass, inertia, and contributions, and has the following options: Include Categories Of. Displays the category indicators of the analysis ariables selected. Label Object Scores By. From the list of ariables specified as labeling ariables, you can select one to label the objects. Discrimination measures. Displays the discrimination measures per ariable and per dimension. Iteration history. For each iteration, the ariance accounted for, loss, and increase in ariance accounted for are shown. Correlations of original ariables. Shows the correlation matrix of the original ariables and the eigenalues of that matrix. Correlations of transformed ariables. Shows the correlation matrix of the transformed (optimally scaled) ariables and the eigenalues of that matrix. Category Quantifications and Contributions. Gies the category quantifications (coordinates), including mass, inertia, and contributions, for each dimension of the ariable(s) selected. Note: the coordinates and contributions (including the mass and inertia) are displayed in separate layers of the piot table output, with the coordinates shown by default. To display the contributions, actiate (double-click) on the table and select Contributions from the Layer dropdown list. Descriptie Statistics. Displays frequencies, number of missing alues, and mode of the ariable(s) selected. Multiple Correspondence Analysis Sae The Sae dialog box allows you to sae discretized data, object scores, and transformed alues to an external IBM SPSS Statistics data file or dataset in the current session. You can also sae transformed alues and object scores to the actie dataset. Datasets are aailable during the current session but are not aailable in subsequent sessions unless you explicitly sae them as data files. Dataset names must adhere to ariable naming rules. Filenames or dataset names must be different for each type of data saed. If you sae object scores or transformed alues to the actie dataset, you can specify the number of multiple nominal dimensions. Multiple Correspondence Analysis Object Plots The Object Plots dialog box allows you to specify the types of plots you want and the ariables to be plotted Object points. A plot of the object points is displayed. Objects and centroids (biplot). The object points are plotted with the ariable centroids. Biplot Variables. You can choose to use all ariables for the biplots or select a subset. Label Objects. You can choose to hae objects labeled with the categories of selected ariables (you may choose category indicator alues or alue labels in the Options dialog box) or with their case numbers. One plot is produced per ariable if Variable is selected. 38 IBM SPSS Categories 24

43 Multiple Correspondence Analysis Variable Plots The Variable Plots dialog box allows you to specify the types of plots you want and the ariables to be plotted. Category Plots. For each ariable selected, a plot of the centroid coordinates is plotted. Categories are in the centroids of the objects in the particular categories. Joint Category Plots. This is a single plot of the centroid coordinates of each selected ariable. Transformation Plots. Displays a plot of the optimal category quantifications ersus the category indicators. You can specify the number of dimensions; one plot will be generated for each dimension. You can also choose to display residual plots for each ariable selected. Discrimination Measures. Produces a single plot of the discrimination measures for the selected ariables. MULTIPLE CORRESPONDENCE Command Additional Features You can customize your Multiple Correspondence Analysis if you paste your selections into a syntax window and edit the resulting MULTIPLE CORRESPONDENCE command syntax. The command syntax language also allows you to: Specify rootnames for the transformed ariables, object scores, and approximations when saing them to the actie dataset (with the SAVE subcommand). Specify a maximum length for labels for each plot separately (with the PLOT subcommand). Specify a separate ariable list for residual plots (with the PLOT subcommand). See the Command Syntax Reference for complete syntax information. Chapter 6. Multiple Correspondence Analysis 39

45 Chapter 7. Multidimensional Scaling (PROXSCAL) Multidimensional scaling attempts to find the structure in a set of proximity measures between objects. This process is accomplished by assigning obserations to specific locations in a conceptual low-dimensional space such that the distances between points in the space match the gien (dis)similarities as closely as possible. The result is a least-squares representation of the objects in that low-dimensional space, which, in many cases, will help you to further understand your data. Example. Multidimensional scaling can be ery useful in determining perceptual relationships. For example, when considering your product image, you can conduct a surey to obtain a dataset that describes the perceied similarity (or proximity) of your product to those of your competitors. Using these proximities and independent ariables (such as price), you can try to determine which ariables are important to how people iew these products, and you can adjust your image accordingly. Statistics and plots. Iteration history, stress measures, stress decomposition, coordinates of the common space, object distances within the final configuration, indiidual space weights, indiidual spaces, transformed proximities, transformed independent ariables, stress plots, common space scatterplots, indiidual space weight scatterplots, indiidual spaces scatterplots, transformation plots, Shepard residual plots, and independent ariables transformation plots. Multidimensional Scaling Data Considerations Data. Data can be supplied in the form of proximity matrices or ariables that are conerted into proximity matrices. The matrices can be formatted in columns or across columns. The proximities can be treated on the ratio, interal, ordinal, or spline scaling leels. Assumptions. At least three ariables must be specified. The number of dimensions cannot exceed the number of objects minus one. Dimensionality reduction is omitted if combined with multiple random starts. If only one source is specified, all models are equialent to the identity model; therefore, the analysis defaults to the identity model. Related procedures. Scaling all ariables at the numerical leel corresponds to standard multidimensional scaling analysis. To Obtain a Multidimensional Scaling 1. From the menus choose: Analyze > Scale > Multidimensional Scaling (PROXSCAL)... This opens the Data Format dialog box. 2. Specify the format of your data: Data Format. Specify whether your data consist of proximity measures or you want to create proximities from the data. Number of Sources. If your data are proximities, specify whether you hae a single source or multiple sources of proximity measures. One Source. If there is one source of proximities, specify whether your dataset is formatted with the proximities in a matrix across the columns or in a single column with two separate ariables to identify the row and column of each proximity. The proximities are in a matrix across columns. The proximity matrix is spread across a number of columns equal to the number of objects. This leads to the Proximities in Matrices across Columns dialog box. Copyright IBM Corporation 1989,

46 The proximities are in a single column. The proximity matrix is collapsed into a single column, or ariable. Two additional ariables, identifying the row and column for each cell, are necessary. This leads to the Proximities in One Column dialog box. Multiple Sources. If there are multiple sources of proximities, specify whether the dataset is formatted with the proximities in stacked matrices across columns, in multiple columns with one source per column, or in a single column. The proximities are in stacked matrices across columns. The proximity matrices are spread across a number of columns equal to the number of objects and are stacked aboe one another across a number of rows equal to the number of objects times the number of sources. This leads to the Proximities in Matrices across Columns dialog box. The proximities are in columns, one source per column. The proximity matrices are collapsed into multiple columns, or ariables. Two additional ariables, identifying the row and column for each cell, are necessary. This leads to the Proximities in Columns dialog box. The proximites are stacked in a single column. The proximity matrices are collapsed into a single column, or ariable. Three additional ariables, identifying the row, column, and source for each cell, are necessary. This leads to the Proximities in One Column dialog box. 3. Click Define. Proximities in Matrices across Columns If you select the proximities in matrices data model for either one source or multiple sources in the Data Format dialog box, then do the following: 1. Select three or more proximities ariables. (Be sure that the order of the ariables in the list matches the order of the columns of the proximities.) 2. Optionally, select a number of weights ariables equal to the number of proximities ariables. (Be sure that the order of the weights matches the order of the proximities that they weight.) 3. Optionally, if there are multiple sources, select a sources ariable. (The number of cases in each proximities ariable should equal the number of proximities ariables times the number of sources.) Additionally, you can define a model for the multidimensional scaling, place restrictions on the common space, set conergence criteria, specify the initial configuration to be used, and choose plots and output. Proximities in Columns If you select the multiple columns model for multiple sources in the Data Format dialog box, then do the following: 1. Select two or more proximities ariables. (Each ariable is assumed to be a matrix of proximities from a separate source.) 2. Select a rows ariable to define the row locations for the proximities in each proximities ariable. 3. Select a columns ariable to define the column locations for the proximities in each proximities ariable. (Cells of the proximity matrix that are not gien a row/column designation are treated as missing.) 4. Optionally, select a number of weights ariables equal to the number of proximities ariables. Additionally, you can define a model for the multidimensional scaling, place restrictions on the common space, set conergence criteria, specify the initial configuration to be used, and choose plots and output. 42 IBM SPSS Categories 24

47 Proximities in One Column If you select the one column model for either one source or multiple sources in the Data Format dialog box, then do the following: 1. Select a proximities ariable. (t is assumed to be one or more matrices of proximities.) 2. Select a rows ariable to define the row locations for the proximities in the proximities ariable. 3. Select a columns ariable to define the column locations for the proximities in the proximities ariable. 4. If there are multiple sources, select a sources ariable. (For each source, cells of the proximity matrix that are not gien a row/column designation are treated as missing.) 5. Optionally, select a weights ariable. Additionally, you can define a model for the multidimensional scaling, place restrictions on the common space, set conergence criteria, specify the initial configuration to be used, and choose plots and output. Create Proximities from Data If you choose to create proximities from the data in the Data Format dialog box, then do the following: 1. If you create distances between ariables (see the Create Measure from Data dialog box), select at least three ariables. These ariables will be used to create the proximity matrix (or matrices, if there are multiple sources). If you create distances between cases, only one ariable is needed. 2. If there are multiple sources, select a sources ariable. 3. Optionally, choose a measure for creating proximities. Additionally, you can define a model for the multidimensional scaling, place restrictions on the common space, set conergence criteria, specify the initial configuration to be used, and choose plots and output. Create Measure from Data Multidimensional scaling uses dissimilarity data to create a scaling solution. If your data are multiariate data (alues of measured ariables), you must create dissimilarity data in order to compute a multidimensional scaling solution. You can specify the details of creating dissimilarity measures from your data. Measure. Allows you to specify the dissimilarity measure for your analysis. Select one alternatie from the Measure group corresponding to your type of data, and then select one of the measures from the drop-down list corresponding to that type of measure. Aailable alternaties are: Interal. Euclidean distance, Squared Euclidean distance, Chebyche, Block, Minkowski, or Customized. Counts. Chi-square measure or Phi-square measure. Binary. Euclidean distance, Squared Euclidean distance, Size difference, Pattern difference, Variance, or Lance and Williams. Create Distance Matrix. Allows you to choose the unit of analysis. Alternaties are Between ariables or Between cases. Transform Values. In certain cases, such as when ariables are measured on ery different scales, you want to standardize alues before computing proximities (not applicable to binary data). Select a standardization method from the Standardize drop-down list (if no standardization is required, select None). Chapter 7. Multidimensional Scaling (PROXSCAL) 43

48 Define a Multidimensional Scaling Model The Model dialog box allows you to specify a scaling model, its minimum and maximum number of dimensions, the structure of the proximity matrix, the transformation to use on the proximities, and whether proximities are transformed within each source separately or unconditionally on the source. Scaling Model. Choose from the following alternaties: Identity. All sources hae the same configuration. Weighted Euclidean. This model is an indiidual differences model. Each source has an indiidual space in which eery dimension of the common space is weighted differentially. Generalized Euclidean. This model is an indiidual differences model. Each source has an indiidual space that is equal to a rotation of the common space, followed by a differential weighting of the dimensions. Reduced rank. This model is a generalized Euclidean model for which you can specify the rank of the indiidual space. You must specify a rank that is greater than or equal to 1 and less than the maximum number of dimensions. Shape. Specify whether the proximities should be taken from the lower-triangular part or the upper-triangular part of the proximity matrix. You can specify that the full matrix be used, in which case the weighted sum of the upper-triangular part and the lower-triangular part will be analyzed. In any case, the complete matrix should be specified, including the diagonal, though only the specified parts will be used. Proximities. Specify whether your proximity matrix contains measures of similarity or dissimilarity. Proximity Transformations. Choose from the following alternaties: Ratio. The transformed proximities are proportional to the original proximities. This is allowed only for positiely alued proximities. Interal. The transformed proximities are proportional to the original proximities, plus an intercept term. The intercept assures all transformed proximities to be positie. Ordinal. The transformed proximities hae the same order as the original proximities. You specify whether tied proximities should be kept tied or allowed to become untied. Spline. The transformed proximities are a smooth nondecreasing piecewise polynomial transformation of the original proximities. You specify the degree of the polynomial and the number of interior knots. Apply Transformations. Specify whether only proximities within each source are compared with each other or whether the comparisons are unconditional on the source. Dimensions. By default, a solution is computed in two dimensions (Minimum = 2, Maximum = 2). You choose an integer minimum and maximum from 1 to the number of objects minus 1 (as long as the minimum is less than or equal to the maximum). The procedure computes a solution in the maximum dimensions and then reduces the dimensionality in steps until the lowest is reached. Multidimensional Scaling Restrictions The Restrictions dialog box allows you to place restrictions on the common space. Restrictions on Common Space. Specify the type of restriction you want. No restrictions. No restrictions are placed on the common space. Some coordinates fixed. The first ariable selected contains the coordinates of the objects on the first dimension, the second ariable corresponds to coordinates on the second dimension, and so on. A missing alue indicates that a coordinate on a dimension is free. The number of ariables selected must equal the maximum number of dimensions requested. 44 IBM SPSS Categories 24

49 Linear combination of independent ariables. The common space is restricted to be a linear combination of the ariables selected. Restriction Variables. Select the ariables that define the restrictions on the common space. If you specified a linear combination, you specify an interal, nominal, ordinal, or spline transformation for the restriction ariables. In either case, the number of cases for each ariable must equal the number of objects. Multidimensional Scaling Options The Options dialog box allows you to select the initial configuration style, specify iteration and conergence criteria, and select standard or relaxed updates. Initial Configuration. Choose one of the following alternaties: Simplex. Objects are placed at the same distance from each other in the maximum dimension. One iteration is taken to improe this high-dimensional configuration, followed by a dimension reduction operation to obtain an initial configuration that has the maximum number of dimensions that you specified in the Model dialog box. Torgerson. A classical scaling solution is used as the initial configuration. Single random start. A configuration is chosen at random. Multiple random starts. Seeral configurations are chosen at random, and the configuration with the lowest normalized raw stress is used as the initial configuration. Custom. You select ariables that contain the coordinates of your own initial configuration. The number of ariables selected should equal the maximum number of dimensions specified, with the first ariable corresponding to coordinates on dimension 1, the second ariable corresponding to coordinates on dimension 2, and so on. The number of cases in each ariable should equal the number of objects. Iteration Criteria. Specify the iteration criteria alues. Stress conergence. The algorithm will stop iterating when the difference in consecutie normalized raw stress alues is less than the number that is specified here, which must lie between 0.0 and 1.0. Minimum stress. The algorithm will stop when the normalized raw stress falls below the number that is specified here, which must lie between 0.0 and 1.0. Maximum iterations. The algorithm will perform the number of specified iterations, unless one of the aboe criteria is satisfied first. Use relaxed updates. Relaxed updates will speed up the algorithm; these updates cannot be used with models other than the identity model or used with restrictions. Multidimensional Scaling Plots, Version 1 The Plots dialog box allows you to specify which plots will be produced. This topic describes the Plots dialog box if you hae the Proximities in Columns data format. For Indiidual space weights, Original s. transformed proximities, and Transformed proximities s. distances plots, you specify the sources for which the plots should be produced. The list of aailable sources is the list of proximities ariables in the main dialog box. Stress. A plot is produced of normalized raw stress ersus dimensions. This plot is produced only if the maximum number of dimensions is larger than the minimum number of dimensions. Common space. A scatterplot matrix of coordinates of the common space is displayed. Indiidual spaces. For each source, the coordinates of the indiidual spaces are displayed in scatterplot matrices. This is possible only if one of the indiidual differences models is specified in the Model dialog box. Chapter 7. Multidimensional Scaling (PROXSCAL) 45

50 Indiidual space weights. A scatterplot is produced of the indiidual space weights. This is possible only if one of the indiidual differences models is specified in the Model dialog box. For the weighted Euclidean model, the weights are printed in plots, with one dimension on each axis. For the generalized Euclidean model, one plot is produced per dimension, indicating both rotation and weighting of that dimension. The reduced rank model produces the same plot as the generalized Euclidean model but reduces the number of dimensions for the indiidual spaces. Original s. transformed proximities. Plots are produced of the original proximities ersus the transformed proximities. Transformed proximities s. distances. The transformed proximities ersus the distances are plotted. Transformed independent ariables. Transformation plots are produced for the independent ariables. Variable and dimension correlations. A plot of correlations between the independent ariables and the dimensions of the common space is displayed. Multidimensional Scaling Plots, Version 2 The Plots dialog box allows you to specify which plots will be produced. This topic describes the Plots dialog box if your data format is anything other than Proximities in Columns. For Indiidual space weights, Original s. transformed proximities, and Transformed proximities s. distances plots, you specify the sources for which the plots should be produced. The source numbers entered must be alues of the sources ariable that is specified in the main dialog box and must range from 1 to the number of sources. Multidimensional Scaling Output The Output dialog box allows you to control the amount of displayed output and sae some of it to separate files. Display. Select one or more of the following items for display: Common space coordinates. Displays the coordinates of the common space. Indiidual space coordinates. The coordinates of the indiidual spaces are displayed only if the model is not the identity model. Indiidual space weights. Displays the indiidual space weights only if one of the indiidual differences models is specified. Depending on the model, the space weights are decomposed in rotation weights and dimension weights, which are also displayed. Distances. Displays the distances between the objects in the configuration. Transformed proximities. Displays the transformed proximities between the objects in the configuration. Input data. Includes the original proximities and, if present, the data weights, the initial configuration, and the fixed coordinates of the independent ariables. Stress for random starts. Displays the random number seed and normalized raw stress alue of each random start. Iteration history. Displays the history of iterations of the main algorithm. Multiple stress measures. Displays different stress alues. The table contains alues for normalized raw stress, Stress-I, Stress-II, S-Stress, Dispersion Accounted For (DAF), and Tucker's Coefficient of Congruence. Stress decomposition. Displays an objects and sources decomposition of final normalized raw stress, including the aerage per object and the aerage per source. Transformed independent ariables. If a linear combination restriction was selected, the transformed independent ariables and the corresponding regression weights are displayed. 46 IBM SPSS Categories 24

51 Variable and dimension correlations. If a linear combination restriction was selected, the correlations between the independent ariables and the dimensions of the common space are displayed. Sae to New File. You can sae the common space coordinates, indiidual space weights, distances, transformed proximities, and transformed independent ariables to separate IBM SPSS Statistics data files. PROXSCAL Command Additional Features You can customize your multidimensional scaling of proximities analysis if you paste your selections into a syntax window and edit the resulting PROXSCAL command syntax. The command syntax language also allows you to: Specify separate ariable lists for transformations and residuals plots (with the PLOT subcommand). Specify separate source lists for indiidual space weights, transformations, and residuals plots (with the PLOT subcommand). Specify a subset of the independent ariables transformation plots to be displayed (with the PLOT subcommand). See the Command Syntax Reference for complete syntax information. Chapter 7. Multidimensional Scaling (PROXSCAL) 47

53 Chapter 8. Multidimensional Unfolding (PREFSCAL) The Multidimensional Unfolding procedure attempts to find a common quantitatie scale that allows you to isually examine the relationships between two sets of objects. Examples. You hae asked 21 indiiduals to rank 15 breakfast items in order of preference, 1 to 15. Using Multidimensional Unfolding, you can determine that the indiiduals discriminate between breakfast items in two primary ways: between soft and hard breads, and between fattening and non-fattening items. Alternatiely, you hae asked a group of driers to rate 26 models of cars on 10 attributes on a 6-point scale ranging from 1="not true at all" to 6="ery true." Aeraged oer indiiduals, the alues are taken as similarities. Using Multidimensional Unfolding, you find clusterings of similar models and the attributes with which they are most closely associated. Statistics and plots. The Multidimensional Unfolding procedure can produce an iteration history, stress measures, stress decomposition, coordinates of the common space, object distances within the final configuration, indiidual space weights, indiidual spaces, transformed proximities, stress plots, common space scatterplots, indiidual space weight scatterplots, indiidual spaces scatterplots, transformation plots, and Shepard residual plots. Multidimensional Unfolding Data Considerations Data. Data are supplied in the form of rectangular proximity matrices. Each column is considered a separate column object. Each row of a proximity matrix is considered a separate row object. When there are multiple sources of proximities, the matrices are stacked. Assumptions. At least two ariables must be specified. The number of dimensions in the solution may not exceed the number of objects minus one. If only one source is specified, all models are equialent to the identity model; therefore, the analysis defaults to the identity model. To Obtain a Multidimensional Unfolding 1. From the menus choose: Analyze > Scale > Multidimensional Unfolding (PREFSCAL) Select two or more ariables that identify the columns in the rectangular proximity matrix. Each ariable represents a separate column object. 3. Optionally, select a number of weights ariables equal to the number of column object ariables. The order of the weights ariables should match the order of the column objects they weight. 4. Optionally, select a rows ariable. The alues (or alue labels) of this ariable are used to label row objects in the output. 5. If there are multiple sources, optionally select a sources ariable. The number of cases in the data file should equal the number of row objects times the number of sources. Additionally, you can define a model for the multidimensional unfolding, place restrictions on the common space, set conergence criteria, specify the initial configuration to be used, and choose plots and output. Copyright IBM Corporation 1989,

54 Define a Multidimensional Unfolding Model The Model dialog box allows you to specify a scaling model, its minimum and maximum number of dimensions, the structure of the proximity matrix, the transformation to use on the proximities, and whether proximities are transformed conditonal upon the row, conditional upon the source, or unconditionally on the source. Scaling Model. Choose from the following alternaties: Identity. All sources hae the same configuration. Weighted Euclidean. This model is an indiidual differences model. Each source has an indiidual space in which eery dimension of the common space is weighted differentially. Generalized Euclidean. This model is an indiidual differences model. Each source has an indiidual space that is equal to a rotation of the common space, followed by a differential weighting of the dimensions. Proximities. Specify whether your proximity matrix contains measures of similarity or dissimilarity. Dimensions. By default, a solution is computed in two dimensions (Minimum = 2, Maximum = 2). You can choose an integer minimum and maximum from 1 to the number of objects minus 1 as long as the minimum is less than or equal to the maximum. The procedure computes a solution in the maximum dimensionality and then reduces the dimensionality in steps until the lowest is reached. Proximity Transformations. Choose from the following alternaties: None. The proximities are not transformed. You can optionally select Include intercept, in which case the proximities can be shifted by a constant term. Linear. The transformed proximities are proportional to the original proximities; that is, the transformation function estimates a slope and the intercept is fixed at 0. This is also called a ratio transformation. You can optionally select Include intercept, in which case the proximities can also be shifted by a constant term. This is also called an interal transformation. Spline. The transformed proximities are a smooth nondecreasing piecewise polynomial transformation of the original proximities. You can specify the degree of the polynomial and the number of interior knots. You can optionally select Include intercept, in which case the proximities can also be shifted by a constant term. Smooth. The transformed proximities hae the same order as the original proximities, including a restriction that takes the differences between subsequent alues into account. The result is a "smooth ordinal" transformation. You can specify whether tied proximities should be kept tied or allowed to become untied. Ordinal. The transformed proximities hae the same order as the original proximities. You can specify whether tied proximities should be kept tied or allowed to become untied. Apply Transformations. Specify whether only proximities within each row are compared with each other, or only proximities within each source are compared with each other, or the comparisons are unconditional on the row or source; that is, whether the transformations are performed per row, per source, or oer all proximities at once. Multidimensional Unfolding Restrictions The Restrictions dialog box allows you to place restrictions on the common space. Restrictions on Common Space. You can choose to fix the coordinates of row and/or column objects in the common space. Row/Column Restriction Variables. Choose the file containing the restrictions and select the ariables that define the restrictions on the common space. The first ariable selected contains the coordinates of 50 IBM SPSS Categories 24

55 the objects on the first dimension, the second ariable corresponds to coordinates on the second dimension, and so on. A missing alue indicates that a coordinate on a dimension is free. The number of ariables selected must equal the maximum number of dimensions requested. The number of cases for each ariable must equal the number of objects. Multidimensional Unfolding Options The Options dialog box allows you to select the initial configuration style, specify iteration and conergence criteria, and set the penalty term for stress. Initial Configuration. Choose one of the following alternaties: Classical. The rectangular proximity matrix is used to supplement the intra-blocks (alues between rows and between columns) of the complete symmetrical MDS matrix. Once the complete matrix is formed, a classical scaling solution is used as the initial configuration. The intra-blocks can be filled ia imputation using the triangle inequality or Spearman distances. Ross-Cliff. The Ross-Cliff start uses the results of a singular alue decomposition on the double centered and squared proximity matrix as the initial alues for the row and column objects. Correspondence. The correspondence start uses the results of a correspondence analysis on the reersed data (similarities instead of dissimilarities), with symmetric normalization of row and column scores. Centroids. The procedure starts by positioning the row objects in the configuration using an eigenalue decomposition. Then the column objects are positioned at the centroid of the specified choices. For the number of choices, specify a positie integer between 1 and the number of proximities ariables. Multiple random starts. Solutions are computed for seeral initial configurations chosen at random, and the one with the lowest penalized stress is shown as the best solution. Custom. You can select ariables that contain the coordinates of your own initial configuration. The number of ariables selected should equal the maximum number of dimensions specified, with the first ariable corresponding to coordinates on dimension 1, the second ariable corresponding to coordinates on dimension 2, and so on. The number of cases in each ariable should equal the combined number of row and column objects. The row and column coordinates should be stacked, with the column coordinates following the row coordinates. Iteration Criteria. Specify the iteration criteria alues. Stress conergence. The algorithm will stop iterating when the relatie difference in consecutie penalized stress alues is less than the number specified here, which must be non-negatie. Minimum stress. The algorithm will stop when the penalized stress falls below the number specified here, which must be non-negatie. Maximum iterations. The algorithm will perform the number of iterations specified here unless one of the aboe criteria is satisfied first. Penalty Term. The algorithm attempts to minimize penalized stress, a goodness-of-fit measure equal to the product of Kruskal's Stress-I and a penalty term based on the coefficient of ariation of the transformed proximities. These controls allow you to set the strength and range of the penalty term. Strength. The smaller the alue of the strength parameter, the stronger the penalty. Specify a alue between 0.0 and 1.0. Range. This parameter sets the moment at which the penalty becomes actie. If set to 0.0, the penalty is inactie. Increasing the alue causes the algorithm to search for a solution with greater ariation among the transformed proximities. Specify a non-negatie alue. Chapter 8. Multidimensional Unfolding (PREFSCAL) 51

56 Multidimensional Unfolding Plots The Plots dialog box allows you to specify which plots will be produced. Plots. The following plots are aailable: Multiple starts. Displays a stacked histogram of penalized stress displaying both stress and penalty. Initial common space. Displays a scatterplot matrix of the coordinates of the initial common space. Stress per dimension. Produces a lineplot of penalized stress ersus dimensionality. This plot is produced only if the maximum number of dimensions is larger than the minimum number of dimensions. Final common space. A scatterplot matrix of coordinates of the common space is displayed. Space weights. A scatterplot is produced of the indiidual space weights. This is possible only if one of the indiidual differences models is specified in the Model dialog box. For the weighted Euclidean model, the weights for all sources are displayed in a plot, with one dimension on each axis. For the generalized Euclidean model, one plot is produced per dimension, indicating both rotation and weighting of that dimension for each source. Indiidual spaces. A scatterplot matrix of coordinates of the indiidual space of each source is displayed. This is possible only if one of the indiidual differences models is specified in the Model dialog box. Transformation plots. A scatterplot is produced of the original proximities ersus the transformed proximities. Depending on how transformations are applied, a separate color is assigned to each row or source. An unconditional transformation produces a single color. Shepard plots. The original proximities ersus both transformed proximities and distances. The distances are indicated by points, and the transformed proximities are indicated by a line. Depending on how transformations are applied, a separate line is produced for each row or source. An unconditional transformation produces one line. Scatterplot of fit. A scatterplot of the transformed proximities ersus the distances is displayed. A separate color is assigned to each source if multiple sources are specified. Residuals plots. A scatterplot of the transformed proximities ersus the residuals (transformed proximities minus distances) is displayed. A separate color is assigned to each source if multiple sources are specified. Row Object Styles. These gie you further control of the display of row objects in plots. The alues of the optional colors ariable are used to cycle through all colors. The alues of the optional markers ariable are used to cycle through all possible markers. Source Plots. For Indiidual spaces, Scatterplot of fit, and Residuals plots and if transformations are applied by source, for Transformation plots and Shepard plots you can specify the sources for which the plots should be produced. The source numbers entered must be alues of the sources ariable specified in the main dialog box and range from 1 to the number of sources. Row Plots. If transformations are applied by row, for Transformation plots and Shepard plots, you can specify the row for which the plots should be produced. The row numbers entered must range from 1 to the number of rows. Multidimensional Unfolding Output The Output dialog box allows you to control the amount of displayed output and sae some of it to separate files. Display. Select one or more of the following for display: Input data. Includes the original proximities and, if present, the data weights, the initial configuration, and the fixed coordinates. 52 IBM SPSS Categories 24

57 Multiple starts. Displays the random number seed and penalized stress alue of each random start. Initial data. Displays the coordinates of the initial common space. Iteration history. Displays the history of iterations of the main algorithm. Fit measures. Displays different measures. The table contains seeral goodness-of-fit, badness-of-fit, correlation, ariation, and nondegeneracy measures. Stress decomposition. Displays an objects, rows, and sources decomposition of penalized stress, including row, column, and source means and standard deiations. Transformed proximities. Displays the transformed proximities. Final common space. Displays the coordinates of the common space. Space weights. Displays the indiidual space weights. This option is aailable only if one of the indiidual differences models is specified. Depending on the model, the space weights are decomposed in rotation weights and dimension weights, which are also displayed. Indiidual spaces. The coordinates of the indiidual spaces are displayed. This option is aailable only if one of the indiidual differences models is specified. Fitted distances. Displays the distances between the objects in the configuration. Sae to New File. You can sae the common space coordinates, indiidual space weights, distances, and transformed proximities to separate IBM SPSS Statistics data files. PREFSCAL Command Additional Features You can customize your Multidimensional Unfolding of proximities analysis if you paste your selections into a syntax window and edit the resulting PREFSCAL command syntax. The command syntax language also allows you to: Specify multiple source lists for Indiidual spaces, Scatterplots of fit, and Residuals plots and in the case of matrix conditional transformations, for Transformation plots and Shepard plots when multiple sources are aailable (with the PLOT subcommand). Specify multiple row lists for Transformation plots and Shepard plots in the case of row conditional transformations (with the PLOT subcommand). Specify a number of rows instead of a row ID ariable (with the INPUT subcommand). Specify a number of sources instead of a source ID ariable (with the INPUT subcommand). See the Command Syntax Reference for complete syntax information. Chapter 8. Multidimensional Unfolding (PREFSCAL) 53

59 Notices This information was deeloped for products and serices offered in the US. This material might be aailable from IBM in other languages. Howeer, you may be required to own a copy of the product or product ersion in that language in order to access it. IBM may not offer the products, serices, or features discussed in this document in other countries. Consult your local IBM representatie for information on the products and serices currently aailable in your area. Any reference to an IBM product, program, or serice is not intended to state or imply that only that IBM product, program, or serice may be used. Any functionally equialent product, program, or serice that does not infringe any IBM intellectual property right may be used instead. Howeer, it is the user's responsibility to ealuate and erify the operation of any non-ibm product, program, or serice. IBM may hae patents or pending patent applications coering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drie, MD-NC119 Armonk, NY US For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd , Nihonbashi-Hakozakicho, Chuo-ku Tokyo , Japan INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-ibm websites are proided for conenience only and do not in any manner sere as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk. IBM may use or distribute any of the information you proide in any way it beliees appropriate without incurring any obligation to you. 55

60 Licensees of this program who wish to hae information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact: IBM Director of Licensing IBM Corporation North Castle Drie, MD-NC119 Armonk, NY US Such information may be aailable, subject to appropriate terms and conditions, including in some cases, payment of a fee. The licensed program described in this document and all licensed material aailable for it are proided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equialent agreement between us. The performance data and client examples cited are presented for illustratie purposes only. Actual performance results may ary depending on specific configurations and operating conditions. Information concerning non-ibm products was obtained from the suppliers of those products, their published announcements or other publicly aailable sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-ibmproducts. Questions on the capabilities of non-ibm products should be addressed to the suppliers of those products. Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objecties only. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of indiiduals, companies, brands, and products. All of these names are fictitious and any similarity to actual people or business enterprises is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on arious operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of deeloping, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples hae not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, sericeability, or function of these programs. The sample programs are proided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs. Each copy or any portion of these sample programs or any deriatie work, must include a copyright notice as follows: your company name) (year). Portions of this code are deried from IBM Corp. Sample Programs. Copyright IBM Corp. _enter the year or years_. All rights resered. 56 IBM SPSS Categories 24

61 Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and serice names might be trademarks of IBM or other companies. A current list of IBM trademarks is aailable on the web at "Copyright and trademark information" at Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Toralds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Jaa and all Jaa-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Notices 57

63 Index A ANOVA in Categorical Regression 14 B biplots in Categorical Principal Components Analysis 22 in Correspondence Analysis 32 in Multiple Correspondence Analysis 38 bootstrapping Categorical Principal Components Analysis 23 C Categorical Principal Components Analysis 17, 19 bootstrapping 23 command additional features 23 optimal scaling leel 18 sae ariables 21 Categorical Regression 11 command additional features 15 optimal scaling leel 12 plots 11 regularization 14 sae 15 statistics 11 category plots in Categorical Principal Components Analysis 22 in Multiple Correspondence Analysis 39 category quantifications in Categorical Principal Components Analysis 21 in Categorical Regression 14 in Multiple Correspondence Analysis 37 in Nonlinear Canonical Correlation Analysis 27 centroids in Nonlinear Canonical Correlation Analysis 27 common space coordinates in Multidimensional Scaling 46 in Multidimensional Unfolding 52 common space plots in Multidimensional Scaling 45 in Multidimensional Unfolding 52 component loadings in Categorical Principal Components Analysis 21 in Nonlinear Canonical Correlation Analysis 27 component loadings plots in Categorical Principal Components Analysis 22 confidence statistics in Correspondence Analysis 31 correlation matrix in Categorical Principal Components Analysis 21 in Multiple Correspondence Analysis 37 correlations in Multidimensional Scaling 46 correlations plots in Multidimensional Scaling 45 Correspondence Analysis 29, 30, 31, 32 command additional features 33 plots 29 statistics 29 D descriptie statistics in Categorical Regression 14 dimensions in Correspondence Analysis 30 discretization in Categorical Principal Components Analysis 18 in Categorical Regression 12 in Multiple Correspondence Analysis 36 discrimination measures in Multiple Correspondence Analysis 37 discrimination measures plots in Multiple Correspondence Analysis 39 distance measures in Correspondence Analysis 30 distances in Multidimensional Scaling 46 in Multidimensional Unfolding 52 E elastic net in Categorical Regression 14 F final common space plots in Multidimensional Unfolding 52 fit in Nonlinear Canonical Correlation Analysis 27 G generalized Euclidean model in Multidimensional Unfolding 50 I identity model in Multidimensional Unfolding 50 indiidual space coordinates in Multidimensional Unfolding 52 indiidual space weights in Multidimensional Scaling 46 in Multidimensional Unfolding 52 indiidual space weights plots in Multidimensional Scaling 45 in Multidimensional Unfolding 52 indiidual spaces plots in Multidimensional Scaling 45 in Multidimensional Unfolding 52 inertia in Correspondence Analysis 31 initial common space plots in Multidimensional Unfolding 52 initial configuration in Categorical Regression 13 in Multidimensional Scaling 45 in Multidimensional Unfolding 51 in Nonlinear Canonical Correlation Analysis 27 iteration criteria in Multidimensional Scaling 45 in Multidimensional Unfolding 51 iteration history in Categorical Principal Components Analysis 21 in Multidimensional Scaling 46 in Multidimensional Unfolding 52 in Multiple Correspondence Analysis 37 J joint category plots in Categorical Principal Components Analysis 22 in Multiple Correspondence Analysis 39 L lasso in Categorical Regression 14 M missing alues in Categorical Principal Components Analysis 19 in Categorical Regression 13 59

64 missing alues (continued) in Multiple Correspondence Analysis 36 Multidimensional Scaling 41, 42, 43 command additional features 47 model 44 options 45 output 46 plots 41, 45, 46 restrictions 44 statistics 41 Multidimensional Unfolding 49 command additional features 53 model 50 options 51 output 52 plots 49, 52 restrictions on common space 50 statistics 49 Multiple Correspondence Analysis 35, 36 command additional features 39 optimal scaling leel 35 sae ariables 38 multiple R in Categorical Regression 14 multiple starts plots in Multidimensional Unfolding 52 N Nonlinear Canonical Correlation Analysis 25, 26, 27 command additional features 27 plots 25 statistics 25 normalization in Correspondence Analysis 30 O object points plots in Categorical Principal Components Analysis 22 in Multiple Correspondence Analysis 38 object scores in Categorical Principal Components Analysis 21 in Multiple Correspondence Analysis 37 in Nonlinear Canonical Correlation Analysis 27 optimal scaling leel in Categorical Principal Components Analysis 18 in Multiple Correspondence Analysis 35 P penalty term in Multidimensional Unfolding 51 plots in Categorical Regression 15 in Correspondence Analysis IBM SPSS Categories 24 plots (continued) in Multidimensional Scaling 45, 46 in Nonlinear Canonical Correlation Analysis 27 PREFSCAL 49 projected centroids plots in Categorical Principal Components Analysis 22 proximity transformations in Multidimensional Unfolding 50 R regression coefficients in Categorical Regression 14 relaxed updates in Multidimensional Scaling 45 residuals plots in Multidimensional Unfolding 52 restrictions in Multidimensional Scaling 44 restrictions on common space in Multidimensional Unfolding 50 ridge regression in Categorical Regression 14 S scaling model in Multidimensional Unfolding 50 scatterplot of fit in Multidimensional Unfolding 52 Shepard plots in Multidimensional Unfolding 52 space weights plots in Multidimensional Unfolding 52 standardization in Correspondence Analysis 30 stress measures in Multidimensional Scaling 46 in Multidimensional Unfolding 52 stress plots in Multidimensional Scaling 45 in Multidimensional Unfolding 52 supplementary objects in Categorical Regression 13 T transformation plots in Categorical Principal Components Analysis 22 in Multidimensional Scaling 45 in Multidimensional Unfolding 52 in Multiple Correspondence Analysis 39 transformed independent ariables in Multidimensional Scaling 46 transformed proximities in Multidimensional Scaling 46 in Multidimensional Unfolding 52 triplots in Categorical Principal Components Analysis 22 V ariable weight in Categorical Principal Components Analysis 18 in Multiple Correspondence Analysis 35 ariance accounted for in Categorical Principal Components Analysis 21 W weighted Euclidean model in Multidimensional Unfolding 50 weights in Nonlinear Canonical Correlation Analysis 27

65

66 IBM Printed in USA