IBM SPSS Categories 24 IBM

Size: px
Start display at page:

Download "IBM SPSS Categories 24 IBM"

Transcription

1 IBM SPSS Categories 24 IBM

2 Note Before using this information and the product it supports, read the information in Notices on page 55. Product Information This edition applies to ersion 24, release 0, modification 0 of IBM SPSS Statistics and to all subsequent releases and modifications until otherwise indicated in new editions.

3 Contents Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data. 1 What Is Optimal Scaling? Why Use Optimal Scaling? Optimal Scaling Leel and Measurement Leel... 2 Selecting the Optimal Scaling Leel Transformation Plots Category Codes Which Procedure Is Best for Your Application?... 5 Categorical Regression Categorical Principal Components Analysis... 6 Nonlinear Canonical Correlation Analysis... 6 Correspondence Analysis Multiple Correspondence Analysis Multidimensional Scaling Multidimensional Unfolding Aspect Ratio in Optimal Scaling Charts Chapter 2. Categorical Regression (CATREG) Define Scale in Categorical Regression Categorical Regression Discretization Categorical Regression Missing Values Categorical Regression Options Categorical Regression Regularization Categorical Regression Output Categorical Regression Sae Categorical Regression Transformation Plots CATREG Command Additional Features Chapter 3. Categorical Principal Components Analysis (CATPCA) Define Scale and Weight in CATPCA Categorical Principal Components Analysis Discretization Categorical Principal Components Analysis Missing Values Categorical Principal Components Analysis Options 19 Categorical Principal Components Analysis Output 21 Categorical Principal Components Analysis Sae.. 21 Categorical Principal Components Analysis Object Plots Categorical Principal Components Analysis Category Plots Categorical Principal Components Analysis Loading Plots Categorical Principal Components Analysis Bootstrap CATPCA Command Additional Features Chapter 4. Nonlinear Canonical Correlation Analysis (OVERALS) Define Range and Scale Define Range Nonlinear Canonical Correlation Analysis Options 27 OVERALS Command Additional Features Chapter 5. Correspondence Analysis 29 Define Row Range in Correspondence Analysis.. 30 Define Column Range in Correspondence Analysis 30 Correspondence Analysis Model Correspondence Analysis Statistics Correspondence Analysis Plots CORRESPONDENCE Command Additional Features Chapter 6. Multiple Correspondence Analysis Define Variable Weight in Multiple Correspondence Analysis Multiple Correspondence Analysis Discretization.. 36 Multiple Correspondence Analysis Missing Values 36 Multiple Correspondence Analysis Options Multiple Correspondence Analysis Output Multiple Correspondence Analysis Sae Multiple Correspondence Analysis Object Plots.. 38 Multiple Correspondence Analysis Variable Plots.. 39 MULTIPLE CORRESPONDENCE Command Additional Features Chapter 7. Multidimensional Scaling (PROXSCAL) Proximities in Matrices across Columns Proximities in Columns Proximities in One Column Create Proximities from Data Create Measure from Data Define a Multidimensional Scaling Model Multidimensional Scaling Restrictions Multidimensional Scaling Options Multidimensional Scaling Plots, Version Multidimensional Scaling Plots, Version Multidimensional Scaling Output PROXSCAL Command Additional Features Chapter 8. Multidimensional Unfolding (PREFSCAL) Define a Multidimensional Unfolding Model Multidimensional Unfolding Restrictions Multidimensional Unfolding Options Multidimensional Unfolding Plots Multidimensional Unfolding Output PREFSCAL Command Additional Features Notices Trademarks Index iii

4 i IBM SPSS Categories 24

5 Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data Categories procedures use optimal scaling to analyze data that are difficult or impossible for standard statistical procedures to analyze. This chapter describes what each procedure does, the situations in which each procedure is most appropriate, the relationships between the procedures, and the relationships of these procedures to their standard statistical counterparts. Note: These procedures and their implementation in IBM SPSS Statistics were deeloped by the Data Theory Scaling System Group (DTSS), consisting of members of the departments of Education and Psychology, Faculty of Social and Behaioral Sciences, Leiden Uniersity. What Is Optimal Scaling? The idea behind optimal scaling is to assign numerical quantifications to the categories of each ariable, thus allowing standard procedures to be used to obtain a solution on the quantified ariables. The optimal scale alues are assigned to categories of each ariable based on the optimizing criterion of the procedure in use. Unlike the original labels of the nominal or ordinal ariables in the analysis, these scale alues hae metric properties. In most Categories procedures, the optimal quantification for each scaled ariable is obtained through an iteratie method called alternating least squares in which, after the current quantifications are used to find a solution, the quantifications are updated using that solution. The updated quantifications are then used to find a new solution, which is used to update the quantifications, and so on, until some criterion is reached that signals the process to stop. Why Use Optimal Scaling? Categorical data are often found in marketing research, surey research, and research in the social and behaioral sciences. In fact, many researchers deal almost exclusiely with categorical data. While adaptations of most standard models exist specifically to analyze categorical data, they often do not perform well for datasets that feature: Too few obserations Too many ariables Too many alues per ariable By quantifying categories, optimal scaling techniques aoid problems in these situations. Moreoer, they are useful een when specialized techniques are appropriate. Rather than interpreting parameter estimates, the interpretation of optimal scaling output is often based on graphical displays. Optimal scaling techniques offer excellent exploratory analyses, which complement other IBM SPSS Statistics models well. By narrowing the focus of your inestigation, isualizing your data through optimal scaling can form the basis of an analysis that centers on interpretation of model parameters. Copyright IBM Corporation 1989,

6 Optimal Scaling Leel and Measurement Leel This can be a ery confusing concept when you first use Categories procedures. When specifying the leel, you specify not the leel at which ariables are measured but the leel at which they are scaled. The idea is that the ariables to be quantified may hae nonlinear relations regardless of how they are measured. For Categories purposes, there are three basic leels of measurement: The nominal leel implies that a ariable's alues represent unordered categories. Examples of ariables that might be nominal are region, zip code area, religious affiliation, and multiple choice categories. The ordinal leel implies that a ariable's alues represent ordered categories. Examples include attitude scales representing degree of satisfaction or confidence and preference rating scores. The numerical leel implies that a ariable's alues represent ordered categories with a meaningful metric so that distance comparisons between categories are appropriate. Examples include age in years and income in thousands of dollars. For example, suppose that the ariables region, job, and age are coded as shown in the following table. Table 1. Coding scheme for region, job, and age Region code Region alue Job code Job alue Age 1 North 1 intern 20 2 South 2 sales rep 22 3 East 3 manager 25 4 West 27 The alues shown represent the categories of each ariable. Region would be a nominal ariable. There are four categories of region, with no intrinsic ordering. Values 1 through 4 simply represent the four categories; the coding scheme is completely arbitrary. Job, on the other hand, could be assumed to be an ordinal ariable. The original categories form a progression from intern to manager. Larger codes represent a job higher on the corporate ladder. Howeer, only the order information is known--nothing can be said about the distance between adjacent categories. In contrast, age could be assumed to be a numerical ariable. In the case of age, the distances between the alues are intrinsically meaningful. The distance between 20 and 22 is the same as the distance between 25 and 27, while the distance between 22 and 25 is greater than either of these. Selecting the Optimal Scaling Leel It is important to understand that there are no intrinsic properties of a ariable that automatically predefine what optimal scaling leel you should specify for it. You can explore your data in any way that makes sense and makes interpretation easier. By analyzing a numerical-leel ariable at the ordinal leel, for example, the use of a nonlinear transformation may allow a solution in fewer dimensions. The following two examples illustrate how the "obious" leel of measurement might not be the best optimal scaling leel. Suppose that a ariable sorts objects into age groups. Although age can be scaled as a numerical ariable, it may be true that for people younger than 25 safety has a positie relation with age, whereas for people older than 60 safety has a negatie relation with age. In this case, it might be better to treat age as a nominal ariable. As another example, a ariable that sorts persons by political preference appears to be essentially nominal. Howeer, if you order the parties from political left to political right, you might want the quantification of parties to respect this order by using an ordinal leel of analysis. 2 IBM SPSS Categories 24

7 Een though there are no predefined properties of a ariable that make it exclusiely one leel or another, there are some general guidelines to help the noice user. With single-nominal quantification, you don't usually know the order of the categories but you want the analysis to impose one. If the order of the categories is known, you should try ordinal quantification. If the categories are unorderable, you might try multiple-nominal quantification. Transformation Plots The different leels at which each ariable can be scaled impose different restrictions on the quantifications. Transformation plots illustrate the relationship between the quantifications and the original categories resulting from the selected optimal scaling leel. For example, a linear transformation plot results when a ariable is treated as numerical. Variables treated as ordinal result in a nondecreasing transformation plot. Transformation plots for ariables treated nominally that are U-shaped (or the inerse) display a quadratic relationship. Nominal ariables could also yield transformation plots without apparent trends by changing the order of the categories completely. The following figure displays a sample transformation plot. Transformation plots are particularly suited to determining how well the selected optimal scaling leel performs. If seeral categories receie similar quantifications, collapsing these categories into one category may be warranted. Alternatiely, if a ariable treated as nominal receies quantifications that display an increasing trend, an ordinal transformation may result in a similar fit. If that trend is linear, numerical treatment may be appropriate. Howeer, if collapsing categories or changing scaling leels is warranted, the analysis will not change significantly. Category Codes Some care should be taken when coding categorical ariables because some coding schemes may yield unwanted output or incomplete analyses. Possible coding schemes for job are displayed in the following table. Table 2. Alternatie coding schemes for job Category A B C D intern sales rep manager Some Categories procedures require that the range of eery ariable used be defined. Any alue outside this range is treated as a missing alue. The minimum category alue is always 1. The maximum category alue is supplied by the user. This alue is not the number of categories for a ariable it is the largest category alue. For example, in the table, scheme A has a maximum category alue of 3 and scheme B has a maximum category alue of 7, yet both schemes code the same three categories. The ariable range determines which categories will be omitted from the analysis. Any categories with codes outside the defined range are omitted from the analysis. This is a simple method for omitting categories but can result in unwanted analyses. An incorrectly defined maximum category can omit alid categories from the analysis. For example, for scheme B, defining the maximum category alue to be 3 indicates that job has categories coded from 1 to 3; the manager category is treated as missing. Because no category has actually been coded 3, the third category in the analysis contains no cases. If you wanted to omit all manager categories, this analysis would be appropriate. Howeer, if managers are to be included, the maximum category must be defined as 7, and missing alues must be coded with alues aboe 7 or below 1. For ariables treated as nominal or ordinal, the range of the categories does not affect the results. For nominal ariables, only the label and not the alue associated with that label is important. For ordinal ariables, the order of the categories is presered in the quantifications; the category alues themseles Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data 3

8 are not important. All coding schemes resulting in the same category ordering will hae identical results. For example, the first three schemes in the table are functionally equialent if job is analyzed at an ordinal leel. The order of the categories is identical in these schemes. Scheme D, on the other hand, inerts the second and third categories and will yield different results than the other schemes. Although many coding schemes for a ariable are functionally equialent, schemes with small differences between codes are preferred because the codes hae an impact on the amount of output produced by a procedure. All categories coded with alues between 1 and the user-defined maximum are alid. If any of these categories are empty, the corresponding quantifications will be either system-missing or 0, depending on the procedure. Although neither of these assignments affect the analyses, output is produced for these categories. Thus, for scheme B, job has four categories that receie system-missing alues. For scheme C, there are also four categories receiing system-missing indicators. In contrast, for scheme A there are no system-missing quantifications. Using consecutie integers as codes for ariables treated as nominal or ordinal results in much less output without affecting the results. Coding schemes for ariables treated as numerical are more restricted than the ordinal case. For these ariables, the differences between consecutie categories are important. The following table displays three coding schemes for age. Table 3. Alternatie coding schemes for age Category A B C Any recoding of numerical ariables must presere the differences between the categories. Using the original alues is one method for ensuring preseration of differences. Howeer, this can result in many categories haing system-missing indicators. For example, scheme A employs the original obsered alues. For all Categories procedures except for Correspondence Analysis, the maximum category alue is 27 and the minimum category alue is set to 1. The first 19 categories are empty and receie system-missing indicators. The output can quickly become rather cumbersome if the maximum category is much greater than 1 and there are many empty categories between 1 and the maximum. To reduce the amount of output, recoding can be done. Howeer, in the numerical case, the Automatic Recode facility should not be used. Coding to consecutie integers results in differences of 1 between all consecutie categories, and, as a result, all quantifications will be equally spaced. The metric characteristics deemed important when treating a ariable as numerical are destroyed by recoding to consecutie integers. For example, scheme C in the table corresponds to automatically recoding age. The difference between categories 22 and 25 has changed from three to one, and the quantifications will reflect the latter difference. An alternatie recoding scheme that preseres the differences between categories is to subtract the smallest category alue from eery category and add 1 to each difference. Scheme B results from this transformation. The smallest category alue, 20, has been subtracted from each category, and 1 was added to each result. The transformed codes hae a minimum of 1, and all differences are identical to the original data. The maximum category alue is now 8, and the zero quantifications before the first nonzero quantification are all eliminated. Yet, the nonzero quantifications corresponding to each category resulting from scheme B are identical to the quantifications from scheme A. 4 IBM SPSS Categories 24

9 Which Procedure Is Best for Your Application? The techniques embodied in four of these procedures (Correspondence Analysis, Multiple Correspondence Analysis, Categorical Principal Components Analysis, and Nonlinear Canonical Correlation Analysis) fall into the general area of multiariate data analysis known as dimension reduction. That is, relationships between ariables are represented in a few dimensions say two or three as often as possible. This enables you to describe structures or patterns in the relationships that would be too difficult to fathom in their original richness and complexity. In market research applications, these techniques can be a form of perceptual mapping. A major adantage of these procedures is that they accommodate data with different leels of optimal scaling. Categorical Regression describes the relationship between a categorical response ariable and a combination of categorical predictor ariables. The influence of each predictor ariable on the response ariable is described by the corresponding regression weight. As in the other procedures, data can be analyzed with different leels of optimal scaling. Multidimensional Scaling and Multidimensional Unfolding describe relationships between objects in a low-dimensional space, using the proximities between the objects. Following are brief guidelines for each of the procedures: Use Categorical Regression to predict the alues of a categorical dependent ariable from a combination of categorical independent ariables. Use Categorical Principal Components Analysis to account for patterns of ariation in a single set of ariables of mixed optimal scaling leels. Use Nonlinear Canonical Correlation Analysis to assess the extent to which two or more sets of ariables of mixed optimal scaling leels are correlated. Use Correspondence Analysis to analyze two-way contingency tables or data that can be expressed as a two-way table, such as brand preference or sociometric choice data. Use Multiple Correspondence Analysis to analyze a categorical multiariate data matrix when you are willing to make no stronger assumption that all ariables are analyzed at the nominal leel. Use Multidimensional Scaling to analyze proximity data to find a least-squares representation of a single set of objects in a low-dimensional space. Use Multidimensional Unfolding to analyze proximity data to find a least-squares representation of two sets of objects in a low-dimensional space. Categorical Regression The use of Categorical Regression is most appropriate when the goal of your analysis is to predict a dependent (response) ariable from a set of independent (predictor) ariables. As with all optimal scaling procedures, scale alues are assigned to each category of eery ariable such that these alues are optimal with respect to the regression. The solution of a categorical regression maximizes the squared correlation between the transformed response and the weighted combination of transformed predictors. Relation to other Categories procedures. Categorical regression with optimal scaling is comparable to optimal scaling canonical correlation analysis with two sets, one of which contains only the dependent ariable. In the latter technique, similarity of sets is deried by comparing each set to an unknown ariable that lies somewhere between all of the sets. In categorical regression, similarity of the transformed response and the linear combination of transformed predictors is assessed directly. Relation to standard techniques. In standard linear regression, categorical ariables can either be recoded as indicator ariables or be treated in the same fashion as interal leel ariables. In the first approach, the model contains a separate intercept and slope for each combination of the leels of the categorical ariables. This results in a large number of parameters to interpret. In the second approach, only one parameter is estimated for each ariable. Howeer, the arbitrary nature of the category codings makes generalizations impossible. Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data 5

10 If some of the ariables are not continuous, alternatie analyses are aailable. If the response is continuous and the predictors are categorical, analysis of ariance is often employed. If the response is categorical and the predictors are continuous, logistic regression or discriminant analysis may be appropriate. If the response and the predictors are both categorical, loglinear models are often used. Regression with optimal scaling offers three scaling leels for each ariable. Combinations of these leels can account for a wide range of nonlinear relationships for which any single "standard" method is ill-suited. Consequently, optimal scaling offers greater flexibility than the standard approaches with minimal added complexity. In addition, nonlinear transformations of the predictors usually reduce the dependencies among the predictors. If you compare the eigenalues of the correlation matrix for the predictors with the eigenalues of the correlation matrix for the optimally scaled predictors, the latter set will usually be less ariable than the former. In other words, in categorical regression, optimal scaling makes the larger eigenalues of the predictor correlation matrix smaller and the smaller eigenalues larger. Categorical Principal Components Analysis The use of Categorical Principal Components Analysis is most appropriate when you want to account for patterns of ariation in a single set of ariables of mixed optimal scaling leels. This technique attempts to reduce the dimensionality of a set of ariables while accounting for as much of the ariation as possible. Scale alues are assigned to each category of eery ariable so that these alues are optimal with respect to the principal components solution. Objects in the analysis receie component scores based on the quantified data. Plots of the component scores reeal patterns among the objects in the analysis and can reeal unusual objects in the data. The solution of a categorical principal components analysis maximizes the correlations of the object scores with each of the quantified ariables for the number of components (dimensions) specified. An important application of categorical principal components is to examine preference data, in which respondents rank or rate a number of items with respect to preference. In the usual IBM SPSS Statistics data configuration, rows are indiiduals, columns are measurements for the items, and the scores across rows are preference scores (on a 0 to 10 scale, for example), making the data row-conditional. For preference data, you may want to treat the indiiduals as ariables. Using the Transpose procedure, you can transpose the data. The raters become the ariables, and all ariables are declared ordinal. There is no objection to using more ariables than objects in CATPCA. Relation to other Categories procedures. If all ariables are declared multiple nominal, categorical principal components analysis produces an analysis equialent to a multiple correspondence analysis run on the same ariables. Thus, categorical principal components analysis can be seen as a type of multiple correspondence analysis in which some of the ariables are declared ordinal or numerical. Relation to standard techniques. If all ariables are scaled on the numerical leel, categorical principal components analysis is equialent to standard principal components analysis. More generally, categorical principal components analysis is an alternatie to computing the correlations between non-numerical scales and analyzing them using a standard principal components or factor-analysis approach. Naie use of the usual Pearson correlation coefficient as a measure of association for ordinal data can lead to nontriial bias in estimation of the correlations. Nonlinear Canonical Correlation Analysis Nonlinear Canonical Correlation Analysis is a ery general procedure with many different applications. The goal of nonlinear canonical correlation analysis is to analyze the relationships between two or more sets of ariables instead of between the ariables themseles, as in principal components analysis. For example, you may hae two sets of ariables, where one set of ariables might be demographic background items on a set of respondents and a second set might be responses to a set of attitude items. The scaling leels in the analysis can be any mix of nominal, ordinal, and numerical. Optimal scaling 6 IBM SPSS Categories 24

11 canonical correlation analysis determines the similarity among the sets by simultaneously comparing the canonical ariables from each set to a compromise set of scores assigned to the objects. Relation to other Categories procedures. If there are two or more sets of ariables with only one ariable per set, optimal scaling canonical correlation analysis is equialent to optimal scaling principal components analysis. If all ariables in a one-ariable-per-set analysis are multiple nominal, optimal scaling canonical correlation analysis is equialent to multiple correspondence analysis. If there are two sets of ariables, one of which contains only one ariable, optimal scaling canonical correlation analysis is equialent to categorical regression with optimal scaling. Relation to standard techniques. Standard canonical correlation analysis is a statistical technique that finds a linear combination of one set of ariables and a linear combination of a second set of ariables that are maximally correlated. Gien this set of linear combinations, canonical correlation analysis can find subsequent independent sets of linear combinations, referred to as canonical ariables, up to a maximum number equal to the number of ariables in the smaller set. If there are two sets of ariables in the analysis and all ariables are defined to be numerical, optimal scaling canonical correlation analysis is equialent to a standard canonical correlation analysis. Although IBM SPSS Statistics does not hae a canonical correlation analysis procedure, many of the releant statistics can be obtained from multiariate analysis of ariance. Optimal scaling canonical correlation analysis has arious other applications. If you hae two sets of ariables and one of the sets contains a nominal ariable declared as single nominal, optimal scaling canonical correlation analysis results can be interpreted in a similar fashion to regression analysis. If you consider the ariable to be multiple nominal, the optimal scaling analysis is an alternatie to discriminant analysis. Grouping the ariables in more than two sets proides a ariety of ways to analyze your data. Correspondence Analysis The goal of correspondence analysis is to make biplots for correspondence tables. In a correspondence table, the row and column ariables are assumed to represent unordered categories; therefore, the nominal optimal scaling leel is always used. Both ariables are inspected for their nominal information only. That is, the only consideration is the fact that some objects are in the same category while others are not. Nothing is assumed about the distance or order between categories of the same ariable. One specific use of correspondence analysis is the analysis of two-way contingency tables. If a table has r actie rows and c actie columns, the number of dimensions in the correspondence analysis solution is the minimum of r minus 1 or c minus 1, whicheer is less. In other words, you could perfectly represent the row categories or the column categories of a contingency table in a space of dimensions. Practically speaking, howeer, you would like to represent the row and column categories of a two-way table in a low-dimensional space, say two dimensions, for the reason that two-dimensional plots are more easily comprehensible than multidimensional spatial representations. When fewer than the maximum number of possible dimensions is used, the statistics produced in the analysis describe how well the row and column categories are represented in the low-dimensional representation. Proided that the quality of representation of the two-dimensional solution is good, you can examine plots of the row points and the column points to learn which categories of the row ariable are similar, which categories of the column ariable are similar, and which row and column categories are similar to each other. Relation to other Categories procedures. Simple correspondence analysis is limited to two-way tables. If there are more than two ariables of interest, you can combine ariables to create interaction ariables. For example, for the ariables region, job, and age, you can combine region and job to create a new ariable rejob with the 12 categories shown in the following table. This new ariable forms a two-way table with age (12 rows, 4 columns), which can be analyzed in correspondence analysis. Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data 7

12 Table 4. Combinations of region and job Category code Category definition Category code Category definition 1 North, intern 7 East, intern 2 North, sales rep 8 East, sales rep 3 North, manager 9 East, manager 4 South, intern 10 West, intern 5 South, sales rep 11 West, sales rep 6 South, manager 12 West, manager One shortcoming of this approach is that any pair of ariables can be combined. We can combine job and age, yielding another 12-category ariable. Or we can combine region and age, which results in a new 16-category ariable. Each of these interaction ariables forms a two-way table with the remaining ariable. Correspondence analyses of these three tables will not yield identical results, yet each is a alid approach. Furthermore, if there are four or more ariables, two-way tables comparing an interaction ariable with another interaction ariable can be constructed. The number of possible tables to analyze can get quite large, een for a few ariables. You can select one of these tables to analyze, or you can analyze all of them. Alternatiely, the Multiple Correspondence Analysis procedure can be used to examine all of the ariables simultaneously without the need to construct interaction ariables. Relation to standard techniques. The Crosstabs procedure can also be used to analyze contingency tables, with independence as a common focus in the analyses. Howeer, een in small tables, detecting the cause of departures from independence may be difficult. The utility of correspondence analysis lies in displaying such patterns for two-way tables of any size. If there is an association between the row and column ariables--that is, if the chi-square alue is significant--correspondence analysis may help reeal the nature of the relationship. Multiple Correspondence Analysis Multiple Correspondence Analysis tries to produce a solution in which objects within the same category are plotted close together and objects in different categories are plotted far apart. Each object is as close as possible to the category points of categories that apply to the object. In this way, the categories diide the objects into homogeneous subgroups. Variables are considered homogeneous when they classify objects in the same categories into the same subgroups. For a one-dimensional solution, multiple correspondence analysis assigns optimal scale alues (category quantifications) to each category of each ariable in such a way that oerall, on aerage, the categories hae maximum spread. For a two-dimensional solution, multiple correspondence analysis finds a second set of quantifications of the categories of each ariable unrelated to the first set, attempting again to maximize spread, and so on. Because categories of a ariable receie as many scorings as there are dimensions, the ariables in the analysis are assumed to be multiple nominal in optimal scaling leel. Multiple correspondence analysis also assigns scores to the objects in the analysis in such a way that the category quantifications are the aerages, or centroids, of the object scores of objects in that category. Relation to other Categories procedures. Multiple correspondence analysis is also known as homogeneity analysis or dual scaling. It gies comparable, but not identical, results to correspondence analysis when there are only two ariables. Correspondence analysis produces unique output summarizing the fit and quality of representation of the solution, including stability information. Thus, correspondence analysis is usually preferable to multiple correspondence analysis in the two-ariable case. Another difference between the two procedures is that the input to multiple correspondence analysis is a data matrix, where the rows are objects and the columns are ariables, while the input to correspondence analysis can be the same data matrix, a general proximity matrix, or a joint contingency 8 IBM SPSS Categories 24

13 table, which is an aggregated matrix in which both the rows and columns represent categories of ariables. Multiple correspondence analysis can also be thought of as principal components analysis of data scaled at the multiple nominal leel. Relation to standard techniques. Multiple correspondence analysis can be thought of as the analysis of a multiway contingency table. Multiway contingency tables can also be analyzed with the Crosstabs procedure, but Crosstabs gies separate summary statistics for each category of each control ariable. With multiple correspondence analysis, it is often possible to summarize the relationship between all of the ariables with a single two-dimensional plot. An adanced use of multiple correspondence analysis is to replace the original category alues with the optimal scale alues from the first dimension and perform a secondary multiariate analysis. Since multiple correspondence analysis replaces category labels with numerical scale alues, many different procedures that require numerical data can be applied after the multiple correspondence analysis. For example, the Factor Analysis procedure produces a first principal component that is equialent to the first dimension of multiple correspondence analysis. The component scores in the first dimension are equal to the object scores, and the squared component loadings are equal to the discrimination measures. The second multiple correspondence analysis dimension, howeer, is not equal to the second dimension of factor analysis. Multidimensional Scaling The use of Multidimensional Scaling is most appropriate when the goal of your analysis is to find the structure in a set of distance measures between a single set of objects or cases. This is accomplished by assigning obserations to specific locations in a conceptual low-dimensional space so that the distances between points in the space match the gien (dis)similarities as closely as possible. The result is a least-squares representation of the objects in that low-dimensional space, which, in many cases, will help you further understand your data. Relation to other Categories procedures. When you hae multiariate data from which you create distances and which you then analyze with multidimensional scaling, the results are similar to analyzing the data using categorical principal components analysis with object principal normalization. This kind of PCA is also known as principal coordinates analysis. Relation to standard techniques. The Categories Multidimensional Scaling procedure (PROXSCAL) offers seeral improements upon the scaling procedure aailable in the Statistics Base option (ALSCAL). PROXSCAL offers an accelerated algorithm for certain models and allows you to put restrictions on the common space. Moreoer, PROXSCAL attempts to minimize normalized raw stress rather than S-stress (also referred to as strain). The normalized raw stress is generally preferred because it is a measure based on the distances, while the S-stress is based on the squared distances. Multidimensional Unfolding The use of Multidimensional Unfolding is most appropriate when the goal of your analysis is to find the structure in a set of distance measures between two sets of objects (referred to as the row and column objects). This is accomplished by assigning obserations to specific locations in a conceptual low-dimensional space so that the distances between points in the space match the gien (dis)similarities as closely as possible. The result is a least-squares representation of the row and column objects in that low-dimensional space, which, in many cases, will help you further understand your data. Relation to other Categories procedures. If your data consist of distances between a single set of objects (a square, symmetrical matrix), use Multidimensional Scaling. Relation to standard techniques. The Categories Multidimensional Unfolding procedure (PREFSCAL) offers seeral improements upon the unfolding functionality aailable in the Statistics Base option (through ALSCAL). PREFSCAL allows you to put restrictions on the common space; moreoer, PREFSCAL attempts to minimize a penalized stress measure that helps it to aoid degenerate solutions (to which older algorithms are prone). Chapter 1. Introduction to Optimal Scaling Procedures for Categorical Data 9

14 Aspect Ratio in Optimal Scaling Charts Aspect ratio in optimal scaling plots is isotropic. In a two-dimensional plot, the distance representing one unit in dimension 1 is equal to the distance representing one unit in dimension 2. If you change the range of a dimension in a two-dimensional plot, the system changes the size of the other dimension to keep the physical distances equal. Isotropic aspect ratio cannot be oerridden for the optimal scaling procedures. 10 IBM SPSS Categories 24

15 Chapter 2. Categorical Regression (CATREG) Categorical regression quantifies categorical data by assigning numerical alues to the categories, resulting in an optimal linear regression equation for the transformed ariables. Categorical regression is also known by the acronym CATREG, for categorical regression. Standard linear regression analysis inoles minimizing the sum of squared differences between a response (dependent) ariable and a weighted combination of predictor (independent) ariables. Variables are typically quantitatie, with (nominal) categorical data recoded to binary or contrast ariables. As a result, categorical ariables sere to separate groups of cases, and the technique estimates separate sets of parameters for each group. The estimated coefficients reflect how changes in the predictors affect the response. Prediction of the response is possible for any combination of predictor alues. An alternatie approach inoles regressing the response on the categorical predictor alues themseles. Consequently, one coefficient is estimated for each ariable. Howeer, for categorical ariables, the category alues are arbitrary. Coding the categories in different ways yield different coefficients, making comparisons across analyses of the same ariables difficult. CATREG extends the standard approach by simultaneously scaling nominal, ordinal, and numerical ariables. The procedure quantifies categorical ariables so that the quantifications reflect characteristics of the original categories. The procedure treats quantified categorical ariables in the same way as numerical ariables. Using nonlinear transformations allow ariables to be analyzed at a ariety of leels to find the best-fitting model. Example. Categorical regression could be used to describe how job satisfaction depends on job category, geographic region, and amount of trael. You might find that high leels of satisfaction correspond to managers and low trael. The resulting regression equation could be used to predict job satisfaction for any combination of the three independent ariables. Statistics and plots. Frequencies, regression coefficients, ANOVA table, iteration history, category quantifications, correlations between untransformed predictors, correlations between transformed predictors, residual plots, and transformation plots. Categorical Regression Data Considerations Data. CATREG operates on category indicator ariables. The category indicators should be positie integers. You can use the Discretization dialog box to conert fractional-alue ariables and string ariables into positie integers. Assumptions. Only one response ariable is allowed, but the maximum number of predictor ariables is 200. The data must contain at least three alid cases, and the number of alid cases must exceed the number of predictor ariables plus one. Related procedures. CATREG is equialent to categorical canonical correlation analysis with optimal scaling (OVERALS) with two sets, one of which contains only one ariable. Scaling all ariables at the numerical leel corresponds to standard multiple regression analysis. To Obtain a Categorical Regression 1. From the menus choose: Analyze > Regression > Optimal Scaling (CATREG) Select the dependent ariable and independent ariable(s). Copyright IBM Corporation 1989,

16 3. Click OK. Optionally, change the scaling leel for each ariable. Define Scale in Categorical Regression You can set the optimal scaling leel for the dependent and independent ariables. By default, they are scaled as second-degree monotonic splines (ordinal) with two interior knots. Additionally, you can set the weight for analysis ariables. Optimal Scaling Leel. You can also select the scaling leel for quantifying each ariable. Spline Ordinal. The order of the categories of the obsered ariable is presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. The resulting transformation is a smooth monotonic piecewise polynomial of the chosen degree. The pieces are specified by the user-specified number and procedure-determined placement of the interior knots. Spline Nominal. The only information in the obsered ariable that is presered in the optimally scaled ariable is the grouping of objects in categories. The order of the categories of the obsered ariable is not presered. Category points will be on a straight line (ector) through the origin. The resulting transformation is a smooth, possibly nonmonotonic, piecewise polynomial of the chosen degree. The pieces are specified by the user-specified number and procedure-determined placement of the interior knots. Ordinal. The order of the categories of the obsered ariable is presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. The resulting transformation fits better than the spline ordinal transformation but is less smooth. Nominal. The only information in the obsered ariable that is presered in the optimally scaled ariable is the grouping of objects in categories. The order of the categories of the obsered ariable is not presered. Category points will be on a straight line (ector) through the origin. The resulting transformation fits better than the spline nominal transformation but is less smooth. Numeric. Categories are treated as ordered and equally spaced (interal leel). The order of the categories and the equal distances between category numbers of the obsered ariable are presered in the optimally scaled ariable. Category points will be on a straight line (ector) through the origin. When all ariables are at the numeric leel, the analysis is analogous to standard principal components analysis. Categorical Regression Discretization The Discretization dialog box allows you to select a method of recoding your ariables. Fractional-alue ariables are grouped into seen categories (or into the number of distinct alues of the ariable if this number is less than seen) with an approximately normal distribution unless otherwise specified. String ariables are always conerted into positie integers by assigning category indicators according to ascending alphanumeric order. Discretization for string ariables applies to these integers. Other ariables are left alone by default. The discretized ariables are then used in the analysis. Method. Choose between grouping, ranking, and multiplying. Grouping. Recode into a specified number of categories or recode by interal. Ranking. The ariable is discretized by ranking the cases. Multiplying. The current alues of the ariable are standardized, multiplied by 10, rounded, and hae a constant added so that the lowest discretized alue is 1. Grouping. The following options are aailable when discretizing ariables by grouping: Number of categories. Specify a number of categories and whether the alues of the ariable should follow an approximately normal or uniform distribution across those categories. 12 IBM SPSS Categories 24

17 Equal interals. Variables are recoded into categories defined by these equally sized interals. You must specify the length of the interals. Categorical Regression Missing Values The Missing Values dialog box allows you to choose the strategy for handling missing alues in analysis ariables and supplementary ariables. Strategy. Choose to exclude objects with missing alues (listwise deletion) or impute missing alues (actie treatment). Exclude objects with missing alues on this ariable. Objects with missing alues on the selected ariable are excluded from the analysis. This strategy is not aailable for supplementary ariables. Impute missing alues. Objects with missing alues on the selected ariable hae those alues imputed. You can choose the method of imputation. Select Mode to replace missing alues with the most frequent category. When there are multiple modes, the one with the smallest category indicator is used. Select Extra category to replace missing alues with the same quantification of an extra category. This implies that objects with a missing alue on this ariable are considered to belong to the same (extra) category. Categorical Regression Options The Options dialog box allows you to select the initial configuration style, specify iteration and conergence criteria, select supplementary objects, and set the labeling of plots. Supplementary Objects. This allows you to specify the objects that you want to treat as supplementary. Simply type the number of a supplementary object (or specify a range of cases) and click Add. You cannot weight supplementary objects (specified weights are ignored). Initial Configuration. If no ariables are treated as nominal, select the Numerical configuration. If at least one ariable is treated as nominal, select the Random configuration. Alternatiely, if at least one ariable has an ordinal or spline ordinal scaling leel, the usual model-fitting algorithm can result in a suboptimal solution. Choosing Multiple systematic starts with all possible sign patterns to test will always find the optimal solution, but the necessary processing time rapidly increases as the number of ordinal and spline ordinal ariables in the dataset increase. You can reduce the number of test patterns by specifying a percentage of loss of ariance threshold, where the higher the threshold, the more sign patterns will be excluded. With this option, obtaining the optimal solution is not garantueed, but the chance of obtaining a suboptimal solution is diminished. Also, if the optimal solution is not found, the chance that the suboptimal solution is ery different from the optimal solution is diminished. When multiple systematic starts are requested, the signs of the regression coefficients for each start are written to an external IBM SPSS Statistics data file or dataset in the current session. See the topic Categorical Regression Sae on page 15 for more information. The results of a preious run with multiple systematic starts allows you to Use fixed signs for the regression coefficients. The signs (indicated by 1 and 1) need to be in a row of the specified dataset or file. The integer-alued starting number is the case number of the row in this file that contains the signs to be used. Criteria. You can specify the maximum number of iterations that the regression may go through in its computations. You can also select a conergence criterion alue. The regression stops iterating if the difference in total fit between the last two iterations is less than the conergence alue or if the maximum number of iterations is reached. Label Plots By. Allows you to specify whether ariables and alue labels or ariable names and alues will be used in the plots. You can also specify a maximum length for labels. Chapter 2. Categorical Regression (CATREG) 13

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

IBM SPSS Custom Tables 22

IBM SPSS Custom Tables 22 IBM SPSS Custom Tables 22 Note Before using this information and the product it supports, read the information in Notices on page 97. Product Information This edition applies to ersion 22, release 0, modification

More information

IBM SPSS Statistics 23 Brief Guide

IBM SPSS Statistics 23 Brief Guide IBM SPSS Statistics 23 Brief Guide Note Before using this information and the product it supports, read the information in Notices on page 87. Product Information This edition applies to ersion 23, release

More information

IBM SPSS Advanced Statistics 22

IBM SPSS Advanced Statistics 22 IBM SPSS Adanced Statistics 22 Note Before using this information and the product it supports, read the information in Notices on page 103. Product Information This edition applies to ersion 22, release

More information

IBM SPSS Statistics Base 22

IBM SPSS Statistics Base 22 IBM SPSS Statistics Base 22 Note Before using this information and the product it supports, read the information in Notices on page 179. Product Information This edition applies to ersion 22, release 0,

More information

IBM SPSS Direct Marketing 19

IBM SPSS Direct Marketing 19 IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS

More information

IBM Unica Marketing Operations and Campaign Version 8 Release 6 May 25, 2012. Integration Guide

IBM Unica Marketing Operations and Campaign Version 8 Release 6 May 25, 2012. Integration Guide IBM Unica Marketing Operations and Campaign Version 8 Release 6 May 25, 2012 Integration Guide Note Before using this information and the product it supports, read the information in Notices on page 51.

More information

IBM SPSS Missing Values 22

IBM SPSS Missing Values 22 IBM SPSS Missing Values 22 Note Before using this information and the product it supports, read the information in Notices on page 23. Product Information This edition applies to version 22, release 0,

More information

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

More information

Lecture 39: Intro to Differential Amplifiers. Context

Lecture 39: Intro to Differential Amplifiers. Context Lecture 39: Intro to Differential Amplifiers Prof J. S. Smith Context Next week is the last week of lecture, and we will spend those three lectures reiewing the material of the course, and looking at applications

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

IBM SPSS Data Preparation 22

IBM SPSS Data Preparation 22 IBM SPSS Data Preparation 22 Note Before using this information and the product it supports, read the information in Notices on page 33. Product Information This edition applies to version 22, release

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

Weight of Evidence Module

Weight of Evidence Module Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,

More information

Directions for using SPSS

Directions for using SPSS Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

More information

IBM SPSS Direct Marketing 20

IBM SPSS Direct Marketing 20 IBM SPSS Direct Marketing 20 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This edition applies to IBM SPSS Statistics 20 and to

More information

PASW Direct Marketing 18

PASW Direct Marketing 18 i PASW Direct Marketing 18 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL 60606-6412

More information

IBM SPSS Neural Networks 22

IBM SPSS Neural Networks 22 IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,

More information

Key Stage 3 Mathematics Programme of Study

Key Stage 3 Mathematics Programme of Study Deeloping numerical reasoning Identify processes and connections Represent and communicate Reiew transfer mathematical across the curriculum in a ariety of contexts and eeryday situations select, trial

More information

IBM SmartCloud Monitoring - Application Insight. User Interface Help SC27-5618-01

IBM SmartCloud Monitoring - Application Insight. User Interface Help SC27-5618-01 IBM SmartCloud Monitoring - Application Insight User Interface Help SC27-5618-01 IBM SmartCloud Monitoring - Application Insight User Interface Help SC27-5618-01 ii IBM SmartCloud Monitoring - Application

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

ESSENTIAL S&OP SOFTWARE REQUIREMENTS

ESSENTIAL S&OP SOFTWARE REQUIREMENTS ESSENTIAL S&OP SOFTWARE REQUIREMENTS Essential S&OP Software Requirements 1 Your Forecast Essential S&OP Software Requirements By Harpal Singh Software Support for the S&OP Process The focus of S&OP is

More information

Key Stage 2 Mathematics Programme of Study

Key Stage 2 Mathematics Programme of Study Deeloping numerical reasoning Identify processes and connections Represent and communicate Reiew transfer mathematical to a ariety of contexts and eeryday situations identify the appropriate steps and

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Collecting Evidence this should involve engagement with equality groups.

Collecting Evidence this should involve engagement with equality groups. FULL EQUALITY IMPACT ASSESSMENT PROCESS - Part 2 If you do need to carry out an EQIA this is the process: You should aim to carry out the Impact Assessment at the beginning of the planning of a new policy

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

More information

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Group Theory and Molecular Symmetry

Group Theory and Molecular Symmetry Group Theory and Molecular Symmetry Molecular Symmetry Symmetry Elements and perations Identity element E - Apply E to object and nothing happens. bject is unmoed. Rotation axis C n - Rotation of object

More information

IBM SPSS Missing Values 20

IBM SPSS Missing Values 20 IBM SPSS Missing Values 20 Note: Before using this information and the product it supports, read the general information under Notices on p. 87. This edition applies to IBM SPSS Statistics 20 and to all

More information

R&D White Paper WHP 031. Reed-Solomon error correction. Research & Development BRITISH BROADCASTING CORPORATION. July 2002. C.K.P.

R&D White Paper WHP 031. Reed-Solomon error correction. Research & Development BRITISH BROADCASTING CORPORATION. July 2002. C.K.P. R&D White Paper WHP 03 July 00 Reed-olomon error correction C.K.P. Clarke Research & Deelopment BRITIH BROADCATING CORPORATION BBC Research & Deelopment White Paper WHP 03 Reed-olomon Error Correction

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

An introduction to IBM SPSS Statistics

An introduction to IBM SPSS Statistics An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive

More information

ERserver. iseries. Digital certificate management

ERserver. iseries. Digital certificate management ERserer iseries Digital certificate management ERserer iseries Digital certificate management ii iseries: Digital certificate management Contents Part 1. Digital certificate management.....................

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

How To Model The Effectiveness Of A Curriculum In Online Education Using Bayesian Networks

How To Model The Effectiveness Of A Curriculum In Online Education Using Bayesian Networks International Journal of adanced studies in Computer Science and Engineering Modelling the Effectieness of Curriculum in Educational Systems Using Bayesian Networks Ahmad A. Kardan Department of Computer

More information

SPSS Introduction. Yi Li

SPSS Introduction. Yi Li SPSS Introduction Yi Li Note: The report is based on the websites below http://glimo.vub.ac.be/downloads/eng_spss_basic.pdf http://academic.udayton.edu/gregelvers/psy216/spss http://www.nursing.ucdenver.edu/pdf/factoranalysishowto.pdf

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

SPSS Modules Features Statistics Premium

SPSS Modules Features Statistics Premium SPSS Modules Features Statistics Premium Core System Functionality (included in every license) Data access and management Data Prep features: Define Variable properties tool; copy data properties tool,

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

S P S S Statistical Package for the Social Sciences

S P S S Statistical Package for the Social Sciences S P S S Statistical Package for the Social Sciences Data Entry Data Management Basic Descriptive Statistics Jamie Lynn Marincic Leanne Hicks Survey, Statistics, and Psychometrics Core Facility (SSP) July

More information

Instructions for SPSS 21

Instructions for SPSS 21 1 Instructions for SPSS 21 1 Introduction... 2 1.1 Opening the SPSS program... 2 1.2 General... 2 2 Data inputting and processing... 2 2.1 Manual input and data processing... 2 2.2 Saving data... 3 2.3

More information

Orthogonal Projection in Teaching Regression and Financial Mathematics

Orthogonal Projection in Teaching Regression and Financial Mathematics Journal of Statistics Education Volume 8 Number () Orthogonal Projection in Teaching Regression and Financial Mathematics Farida Kachapoa Auckland Uniersity of Technology Ilias Kachapo Journal of Statistics

More information

Describing, Exploring, and Comparing Data

Describing, Exploring, and Comparing Data 24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter

More information

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate 1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

More information

IBM Campaign Version 9 Release 1.1 February 18, 2015. User's Guide

IBM Campaign Version 9 Release 1.1 February 18, 2015. User's Guide IBM Campaign Version 9 Release 1.1 February 18, 2015 User's Guide Note Before using this information and the product it supports, read the information in Notices on page 245. This edition applies to ersion

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

IBM Marketing Operations OnDemand November 17, 2014. Project Manager's Guide

IBM Marketing Operations OnDemand November 17, 2014. Project Manager's Guide IBM Marketing Operations OnDemand Noember 17, 2014 Project Manager's Guide Note Before using this information and the product it supports, read the information in Notices on page 63. IBM Marketing Operations

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Do

More information

T-test & factor analysis

T-test & factor analysis Parametric tests T-test & factor analysis Better than non parametric tests Stringent assumptions More strings attached Assumes population distribution of sample is normal Major problem Alternatives Continue

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

IBM Unica Marketing Platform Version 8 Release 5 June 1, 2012. Administrator's Guide

IBM Unica Marketing Platform Version 8 Release 5 June 1, 2012. Administrator's Guide IBM Unica Marketing Platform Version 8 Release 5 June 1, 2012 Administrator's Guide Note Before using this information and the product it supports, read the information in Notices on page 449. This edition

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

AS/400e. Digital Certificate Management

AS/400e. Digital Certificate Management AS/400e Digital Certificate Management AS/400e Digital Certificate Management ii AS/400e: Digital Certificate Management Contents Part 1. Digital Certificate Management............ 1 Chapter 1. Print

More information

ABSORBENCY OF PAPER TOWELS

ABSORBENCY OF PAPER TOWELS ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Tutorial Segmentation and Classification

Tutorial Segmentation and Classification MARKETING ENGINEERING FOR EXCEL TUTORIAL VERSION 1.0.8 Tutorial Segmentation and Classification Marketing Engineering for Excel is a Microsoft Excel add-in. The software runs from within Microsoft Excel

More information

Easily Identify Your Best Customers

Easily Identify Your Best Customers IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

More information

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

More information

LINES AND PLANES IN R 3

LINES AND PLANES IN R 3 LINES AND PLANES IN R 3 In this handout we will summarize the properties of the dot product and cross product and use them to present arious descriptions of lines and planes in three dimensional space.

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

More information

January 26, 2009 The Faculty Center for Teaching and Learning

January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Binary Logistic Regression

Binary Logistic Regression Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

More information

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document

More information

How To Calculate The Price Of A Motel

How To Calculate The Price Of A Motel Competitie Outcomes in Product-Differentiated Oligopoly Michael J. Mazzeo * Abstract This paper analyzes price and quantity outcomes of firms operating in differentiated product oligopoly markets. Unlike

More information

Chapter 7 Factor Analysis SPSS

Chapter 7 Factor Analysis SPSS Chapter 7 Factor Analysis SPSS Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

C:\Users\<your_user_name>\AppData\Roaming\IEA\IDBAnalyzerV3

C:\Users\<your_user_name>\AppData\Roaming\IEA\IDBAnalyzerV3 Installing the IDB Analyzer (Version 3.1) Installing the IDB Analyzer (Version 3.1) A current version of the IDB Analyzer is available free of charge from the IEA website (http://www.iea.nl/data.html,

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

ERserver. iseries. Service tools

ERserver. iseries. Service tools ERserer iseries Serice tools ERserer iseries Serice tools Copyright International Business Machines Corporation 2002. All rights resered. US Goernment Users Restricted Rights Use, duplication or disclosure

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

More information

IBM SPSS Statistics for Beginners for Windows

IBM SPSS Statistics for Beginners for Windows ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

More information

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

There are six different windows that can be opened when using SPSS. The following will give a description of each of them. SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet

More information

the points are called control points approximating curve

the points are called control points approximating curve Chapter 4 Spline Curves A spline curve is a mathematical representation for which it is easy to build an interface that will allow a user to design and control the shape of complex curves and surfaces.

More information

IBM Tivoli Enterprise Console. Rule Set Reference SC32-1282-00

IBM Tivoli Enterprise Console. Rule Set Reference SC32-1282-00 IBM Tioli Enterprise Console Rule Set Reference SC32-1282-00 IBM Tioli Enterprise Console Rule Set Reference SC32-1282-00 Note Before using this information and the product it supports, read the information

More information

Using SPSS, Chapter 2: Descriptive Statistics

Using SPSS, Chapter 2: Descriptive Statistics 1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

More information

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.

More information