Applications of Intermediate/Advanced Statistics in Institutional Research
Edited by Mary Ann Coughlin
THE ASSOCIATION FOR INSTITUTIONAL RESEARCH
Number Sixteen
Resources in Institutional Research
© 2005 Association for Institutional Research
222 Stone Building
Florida State University
Tallahassee, FL
All Rights Reserved
No portion of this book may be reproduced by any process, stored in a retrieval system, or transmitted in any form, or by any means, without the express written permission of the publisher.
Printed in the United States
ISBN
Dedication
The authors of this monograph dedicate this work to the memory of Julia Duckwall. Julia touched the lives of so many Institutional Research professionals with her spirit and dedication to the profession. We hope that this monograph serves as a valued reference for many and that the work and spirit of Julia will live on.
Table of Contents

Chapter 1: Nonparametric Statistics: Applications in Institutional Research
  Institutional Research Context
  Chapter Organization
  Looking at the Data: Determining When to Use Nonparametric Statistics
  Properties of Nonparametric Statistics
  Measurement Scales
  Hypothesis Testing
  Tests for Normality
  Nonparametric Tests: Descriptions and Examples
  Tests of Location: One Sample
  Tests of Location: Two Independent Samples
  Tests of Location: Two Related Samples
  Tests of Location: Three or More Independent Samples
  Tests of Location: Three or More Related Samples
  Goodness of Fit: One Sample
  Goodness of Fit: Two Independent Samples
  Measures of Association: Two Variables
  Measures of Association: Three or More Variables
  Beyond Nonparametrics: Some Advanced Topics
  OLAP
  Log-Linear Analysis
  Multidimensional Scaling
  Resampling Processes
  Other Considerations When Using Nonparametric Statistical Tools
  About the Central Limit Theorem and Law of Large Numbers
  About Ordinal Data and Ties
  References

Chapter 2: Analysis of Variance Applications in Institutional Research
  Statistical & Theoretical Background for ANOVA
  Two General Types of ANOVA: Independent vs. Repeated Measures Designs
  One-Way ANOVA: The Simplest Independent Measures ANOVA Design
  Two-Factor Independent Measures ANOVA Designs
  Three-Factor Independent Measures ANOVA Designs
  Repeated Measures ANOVA Designs
  Using Single-Factor Repeated-Measures ANOVA Where Time Is the RM-Factor
  Caveat: Using A Priori Contrasts with Repeated Measures Designs
  Using Single-Factor Repeated-Measures ANOVA Where Condition Is the RM-Factor
  Factorial Repeated-Measures ANOVA Designs
  Mixed-Model ANOVA Designs
  Using Covariates in Factorial ANOVA Designs
  Presenting Results of ANOVA Models
  Summary Remarks
  References
  Endnotes

Chapter 3: Regression Analysis for Institutional Research
  Basic Regression Model
  Assumptions in Regression Analysis
  Uses of Regression Analysis
  Hypothesis Testing
  Goodness-of-Fit Measures
  Deriving Predictions
  Common Variable Transformations
  Dichotomous Variables
  Non-Linear Variable Transformations
  Application 1: High School Graduate Projections
  Application 2: Faculty Salary Studies
  Summary
  Suggested Readings

Chapter 4: What Can Multilevel Models Add to Institutional Research?
  OLS versus Multilevel Models
  Theoretical Background
  The Random Intercept Model
  The Random Coefficient Model
  Summary
  Applied Modeling Considerations
  Data Requirements
  Intraclass Correlation: Proportion of Variance between Groups
  Building Models and Randomizing Coefficients
  Measures of Variance Explained
  Case Study: Student Engagement Across Institutions
  Conclusion
  Additional Resources
  References
  Endnotes

Chapter 5: Identifying and Analyzing Group Differences
  Analyzing Differences among Existing Groups
  The Common Aims of Discriminant Analysis and Logistic Regression
  Typical Institutional Research Questions Addressed by These Techniques
  Discriminant Analysis
  Discriminant Analysis References
  Institutional Research Applications
  General References
  Logistic Regression
  Logistic Regression References
  Institutional Research Applications
  General References
  Choosing between Discriminant Analysis and Logistic Regression
  General References
  Identifying Groups within Previously Undifferentiated Populations
  Selecting Variables
  Choosing a Distance/Similarity Measure
  Generating a Proximity Matrix
  Choosing a Clustering Technique
  Cluster Analysis Examples
  Cluster Analysis References
  General References
  Decision Trees
  CHAID Decision Tree Example
  Decision Tree References
  Institutional Research Applications
  General References
  Choosing between Cluster Analysis and Decision Trees
  Summary and Conclusions
  Endnotes

Chapter 6: Applied Multivariate Statistics
  Path Analysis
  Statistical and Theoretical Background
  Case Study: An Examination of Performance in Graduate School
  Factor Analysis
  Statistical and Theoretical Background
  Case II: Annual Survey of Graduating Students: Outcomes of an Undergraduate Education
  Introduction to Structural Equation Modeling
  Statistical and Theoretical Background
  Case III: Confirming the Factor Structure from the Annual Survey of Graduating Students
  References
Introduction
In March of 1999, the Association for Institutional Research offered the first Applied Statistics Institute. The Professional Development Services Committee, in conjunction with the Publications Committee, undertook the development of the current Resources in Institutional Research monograph. The goal of this document is to provide a resource for institutional research professionals on the application of intermediate and advanced statistics in institutional research settings, as well as a resource document for participants attending the Applied Statistics Institute. Accordingly, the curriculum of the Applied Statistics Institute has served as the basis for the content of this monograph.
The Institute offers five specialized modules. Each module provides a theoretical context with practical applications, exercises, and interpretive and presentation techniques for each statistical approach. The five modules focus on: nonparametric statistics, regression analysis, analysis of variance, identifying and analyzing group differences, multilevel models, and multivariate statistics. Each chapter is written by the Institute faculty member who presents these applications and techniques. Additional data sets and exercises will be made available on the AIR Web site.
This monograph does not aim to cover each statistical area in depth. Instead, it describes the theory and application of these procedures in institutional research settings. Consequently, the reader should already be familiar with basic statistical principles and applications. The reader may also need to consult the supplemental readings provided within each chapter to understand each statistical application more fully.
Like the learning objectives of the Applied Statistics Institute, the goal of this monograph is to educate the reader about: uses of nonparametric statistics for common assessment activities; applications of regression techniques to higher education problems and issues; uses of ANOVA for rating-scale data, student performance data, and other IR data; applications of techniques for identifying groups and determining how groups differ; uses of advanced statistics to provide evidence of institutional effectiveness; and applications of multilevel modeling techniques to common institutional research questions.
ENJOY, and consider joining us for an upcoming Applied Statistics Institute.
Mary Ann Coughlin
Editor
Chapter 1
Nonparametric Statistics: Applications in Institutional Research
Richard Howard
Gerald McLaughlin
Josetta McLaughlin

The statistical tests and procedures outlined in the other chapters of this book are generally known as "parametric tests." These procedures require the researcher to assume that the population from which the data are collected is normally distributed. Provided the sample is representative of the population, the researcher also assumes that the sample reflects the properties of a normal distribution. The actual distribution of the sample may not be exactly normal, but in most cases it is considered "close enough." The problem under study is then modeled using the assumptions and probabilities that define the normal distribution. In this way, parametric statistics yield exact solutions to approximate problems (Conover, 1971).
Computationally, parametric tests generally require:
(1) sample sizes greater than 30 observations;
(2) data that reflect the properties of interval or ratio measurement scales; and
(3) specific data about each observation.
Parametric statistics are appropriate for many research projects. They do not, however, serve the needs of researchers whose data sets fail to meet these criteria or are small because of the nature of the project.
During the 1930s, statisticians proposed alternative procedures that did not rely on the assumptions required by parametric statistics. The resulting statistical tests, known as nonparametric tests, do not depend on the normal distribution to define the desired probabilities. Instead, they use other distributions or close approximations to them. These tests allow the researcher to model the problem under study. In many cases, they are also easier to apply because less computation is required. As such, nonparametric statistics yield approximate solutions to exact problems (Conover, 1971).
Computationally, these tests:
(1) do not depend on large numbers of observations;
(2) in most cases use data that reflect the properties of nominal or ordinal measurement scales; and
(3) are frequently used to analyze summarized or categorical count data.
The choice between a parametric and a nonparametric test is dictated by the characteristics of the data described above. Be aware, however, that nonparametric tests are in general less sensitive (less powerful) than parametric tests. When the assumptions of a parametric test are met, it is more likely than its nonparametric counterpart to detect a statistically significant difference between two or more treatments. It is likewise more likely to detect a significant relationship between two variables. When the data allow you to choose between a parametric test and a nonparametric test, the parametric test is the recommended option (Gravetter & Wallnau, 2004; Zar, 1984).

Institutional Research Context

Institutional researchers often face statistical problems involving large numbers of observations, such as the student population or the institution's faculty and staff. In these cases, parametric approaches are usually appropriate because the underlying distributions are typically "close enough" to normal to provide reliable information.
However, the data often have one or more characteristics that make parametric analyses inappropriate or impossible:
(1) a small sample size;
(2) data summarized into categories;
(3) a nonnormal or unknown distribution; and/or
(4) nominal or ordinal measurement scales.
Institutional researchers frequently work with data that originate in paper or Web-based reports. In these reports, the data are summarized into categories such as discipline, student rank, or institution. In other cases, the unit of analysis might be the department, college, or program, and the analysis compares six to ten comparators. In such cases the normal distribution does not model the data, especially when there are too few observations to invoke the assumptions of normality, so parametric tests are not appropriate. Nonparametric tests were designed specifically to address these situations.
This chapter presents a number of nonparametric tests, with examples reflecting "typical" questions that might be asked of an institutional research office. The primary and traditional nonparametric tests included here are those that meet two conditions:
(1) standard computational procedures exist for them; and
(2) they are included in the SPSS procedures.
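To make the small-comparator case concrete, the sketch below computes an exact two-sided p-value for the one-sample sign test listed in Table 1. It assumes a hypothetical comparison of an institution's value on some metric with each of ten peer comparators. The function name `sign_test` and the differences are illustrative only and are not taken from the chapter's data sets. Because the probability comes directly from the binomial distribution with p = 0.5, the test needs neither a normality assumption nor a large sample.

```python
from math import comb


def sign_test(differences):
    """Exact two-sided sign test of H0: median difference = 0.

    Zero differences (ties) are dropped, as is conventional for the sign test.
    Returns (number of positive differences, number of nonzero differences, p).
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all differences are zero")
    positives = sum(1 for d in nonzero if d > 0)
    # The larger of the two counts (positive vs. negative) drives the test.
    k = max(positives, n - positives)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), computed exactly.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capping the result at 1.
    p_value = min(1.0, 2 * upper_tail)
    return positives, n, p_value


# Hypothetical data: our institution's value minus each of ten peers' values
# (for example, percentage-point differences in a retention rate).
diffs = [2.1, 0.8, 3.5, 1.2, -0.4, 2.7, 0.9, 1.6, 4.0, 0.3]
positives, n, p = sign_test(diffs)
print(f"{positives} of {n} differences positive, exact two-sided p = {p:.4f}")
```

With 9 of 10 differences positive, the exact p-value is 22/1024 (about 0.0215). No distributional assumption beyond independent observations is used to obtain it.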
Many of these tests have large-sample equations, but we do not present those formulae in this chapter. They can be found in basic nonparametric texts such as the Handbook of Parametric and Nonparametric Statistical Procedures by D. J. Sheskin. We also discuss some fairly new and advanced techniques. Some, such as log-linear analysis, are statistical tests. Others, such as bootstrapping, can lead to statistical statements. Still others, such as data mining, tend not to make probabilistic statements and are better seen as an extension of exploratory data analysis. Readers interested in learning more about these tools are referred to the references at the end of the chapter.
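Bootstrapping is mentioned here only in passing, so the sketch below is a generic percentile bootstrap and not the authors' procedure. It resamples a hypothetical small sample with replacement, recomputes the median each time, and reads a 95% interval off the sorted resampled medians. The function name `bootstrap_median_ci`, the number of resamples, the seed, and the data are all assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_median_ci(data, n_resamples=5000, alpha=0.05, seed=2005):
    """Percentile bootstrap confidence interval for the median.

    Returns (sample median, lower bound, upper bound).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    # Draw n values with replacement, take the median, and repeat.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Positions of the alpha/2 and 1 - alpha/2 percentiles in the sorted list.
    lo_index = round((alpha / 2) * n_resamples)
    hi_index = round((1 - alpha / 2) * n_resamples) - 1
    return statistics.median(data), medians[lo_index], medians[hi_index]


# Hypothetical credit hours attempted by a small sample of part-time students
hours = [3, 6, 6, 7, 9, 9, 9, 10, 12, 12, 15]
est, low, high = bootstrap_median_ci(hours)
print(f"median = {est}, 95% percentile interval = ({low}, {high})")
```

The interval makes no assumption about the shape of the population distribution. With samples this small, however, it can be coarse, because every resampled median must be one of the observed values.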
Chapter Organization

The format of this chapter differs somewhat from that of the others in this book. In general, the other authors deal with a specific family of tests, e.g., analysis of variance or regression. In contrast, we present a number of tests that serve three fundamental purposes: tests of "location," tests of "goodness of fit," and tests of "association." Their only common feature is that they do not rely on assumptions associated with the normal or other distributions. Most introductory statistics and social science research texts discuss nonparametric tests in terms of this assumption and present only the most common of them (chi-square, Mann-Whitney U, Spearman rho, etc.). In a number of situations, however, less common nonparametric tests can provide the statistical evidence for institutional work. That evidence can support an institutional position, inform the evaluation of a policy, or show the effects of a process or procedure. Obviously, the scope of this book does not allow us to present every nonparametric test that might be appropriate for examining institutional data. Nevertheless, our intent is to provide an overview of selected tests with examples of how they can be used to answer typical questions posed to an institutional research office.
The chapter is organized as follows. First, we present a basic methodology for testing the assumption that the data to be analyzed reflect a normal distribution. Next, we present a series of nonparametric tests for cases where a normal distribution is not assumed, the scale is not interval or ratio, or other reasons support using a nonparametric test.
For each test, we indicate:
(1) the purpose of the test;
(2) the assumptions about the data;
(3) the hypothesis to be tested;
(4) an example of using the test in an institutional research context; and
(5) the SPSS procedure and output.
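As a hedged illustration of checking the normality assumption, the sketch below implements the Jarque-Bera test. This moment-based check is not necessarily the procedure the authors use in SPSS. It compares the sample skewness and excess kurtosis with their normal-theory values of zero. Its chi-square reference distribution (2 degrees of freedom) is only asymptotic, so the p-value is unreliable for the very small samples this chapter is concerned with. The function name and both data sets are illustrative assumptions.

```python
import math


def jarque_bera(data):
    """Jarque-Bera normality statistic and its asymptotic p-value.

    Uses moment-based skewness and excess kurtosis. Under normality the
    statistic is approximately chi-square with 2 degrees of freedom.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    if m2 == 0:
        raise ValueError("data have zero variance")
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurtosis ** 2 / 4)
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Hypothetical data: one extreme value among otherwise identical observations
skewed = [1] * 20 + [100]
print(jarque_bera(skewed))  # very small p-value: normality rejected
# Symmetric, evenly spaced data: p-value well above 0.05
print(jarque_bera(list(range(1, 10))))
```

A large statistic (small p-value) signals skewness or heavy tails, which is a reason to consider the nonparametric tests that follow.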
The specific tests are organized according to the purpose of the test and the number of samples. They are summarized in Tables 1, 2, and 3 by purpose, measurement scale, and number of samples (or variables). For each nonparametric test presented in the tables, a data set has been developed and can be accessed through the Association for Institutional Research Web site (http://airweb.org). The data set for each test is identified in the discussion of that test. The intent is that readers can access a given data set and run the test as described in the chapter. This allows the researcher to practice setting up the SPSS procedure and then compare the outcome with that presented in the chapter. If the output matches that shown in the chapter, the researcher is ready to analyze his or her own data set. Finally, we discuss some advanced methodologies and concerns.
Table 1
Nonparametric Tests of Location

Nominal scale
  One sample: Binomial Test; Runs Test
  Two related samples: McNemar Test
  Three or more related samples: Cochran Q
Ordinal scale
  One sample: Sign Test
  Two independent samples: Median Test; Mann-Whitney U Test
  Two related samples: Sign Test for Two Dependent Samples
  Three or more independent samples: Median Test; Kruskal-Wallis ANOVA by Ranks
  Three or more related samples: Friedman Two-way ANOVA
Interval scale
  One sample: Wilcoxon Signed-ranks Test
  Two related samples: Wilcoxon Matched Pairs Signed-ranks Test
Parametric equivalent
  One sample: One-sample t-test
  Two independent samples: Two-sample t-test
  Two related samples: Paired t-test
  Three or more independent samples: ANOVA
  Three or more related samples: Within-subjects ANOVA

Table 2
Nonparametric Analysis for Goodness of Fit*

Nominal scale
  One sample: One-Sample Chi-Square
  Two samples: Chi-Square Test of Independence
Ordinal scale
  One sample: One-Sample Kolmogorov-Smirnov
  Two samples: Two-Sample Kolmogorov-Smirnov
*There are no generally comparable parametric techniques.

Table 3
Nonparametric Analysis for Association

Nominal scale
  Two variables: Phi Coefficient (2 x 2); Point Biserial (2 x Linear); Chi-Square Test of Independence
  Three or more variables: Log-Linear (not included)
Ordinal scale
  Two variables: Spearman Rho
  Three or more variables: Kendall's Coefficient of Concordance W
Parametric equivalent
  Two variables: Pearson Correlation
  Three or more variables: Eta Squared
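When choosing among the tests, it can help to treat Table 1 as a lookup from (measurement scale, sample design) to candidate tests. The sketch below transcribes the table into a Python dictionary. The column assignments are inferred from the flattened source table and should be checked against the printed original. The names `TESTS_OF_LOCATION`, `PARAMETRIC_EQUIVALENT`, and `suggest_tests` are hypothetical.

```python
# Nonparametric tests of location, keyed by (measurement scale, sample design).
# Entries transcribe Table 1; scale/design pairs with no entry are absent.
TESTS_OF_LOCATION = {
    ("nominal", "one sample"): ["Binomial Test", "Runs Test"],
    ("nominal", "two related"): ["McNemar Test"],
    ("nominal", "three or more related"): ["Cochran Q"],
    ("ordinal", "one sample"): ["Sign Test"],
    ("ordinal", "two independent"): ["Median Test", "Mann-Whitney U Test"],
    ("ordinal", "two related"): ["Sign Test for Two Dependent Samples"],
    ("ordinal", "three or more independent"): [
        "Median Test",
        "Kruskal-Wallis ANOVA by Ranks",
    ],
    ("ordinal", "three or more related"): ["Friedman Two-way ANOVA"],
    ("interval", "one sample"): ["Wilcoxon Signed-ranks Test"],
    ("interval", "two related"): ["Wilcoxon Matched Pairs Signed-ranks Test"],
}

# The parametric equivalent for each sample design (bottom row of Table 1).
PARAMETRIC_EQUIVALENT = {
    "one sample": "One-sample t-test",
    "two independent": "Two-sample t-test",
    "two related": "Paired t-test",
    "three or more independent": "ANOVA",
    "three or more related": "Within-subjects ANOVA",
}


def suggest_tests(scale, design):
    """Return the Table 1 entries for a scale/design pair (empty if none)."""
    key = (scale.lower(), design.lower())
    return list(TESTS_OF_LOCATION.get(key, []))


print(suggest_tests("ordinal", "two independent"))
print("Parametric equivalent:", PARAMETRIC_EQUIVALENT["two independent"])
```

The helper deliberately returns only what the table lists. Tests suited to ordinal data can generally also be applied to interval data, so a fuller helper might fall back to the ordinal row, but that extension is not part of Table 1.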
Looking at the Data: Determining When to Use Nonparametric Statistics

Properties of Nonparametric Statistics

As indicated above, nonparametric statistical tools are not based on the properties of the normal distribution. Because they do not require assumptions about the normality of the sampled population, the term distribution-free test is sometimes applied to them (Zar, 1984, p. 138). Nonparametric statistics are those that have one or more of the following properties:
• The data are count data that enumerate the number of observations having some characteristic or belonging to a specific group.
• The data are measured and/or analyzed using a nominal or ordinal scale.
• The inference does not concern a parameter in the population.
• The probability distribution of the statistic on which the analysis is based does not depend on specific information or assumptions about the population from which the sample(s) is drawn. It depends only on general assumptions such as continuity and/or symmetry (see Sheskin, 1997; Zar, 1984; Gibbons, 1971).

Measurement Scales

To know when it is appropriate to use a nonparametric test, the researcher must understand the level of measurement used for the characteristic of interest. As a quick review, we briefly define each of the four commonly employed measurement scales: nominal, ordinal, interval, and ratio. For a more detailed discussion of the four scales, refer to any introductory statistics or social science research text (Gravetter & Wallnau, 2004; Hinkle, Wiersma, & Jurs, 1998; Gay & Airasian, 2003).

Nominal: When a variable is classified on the basis of some quality rather than on a numerical basis, the level of measurement is nominal. An example would be a student's major. When using nominal data, the researcher generally counts the number of observations in each category. This level of measurement is sometimes referred to as categorical.
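Counting observations per category is the basic operation on nominal data. The sketch below tallies a hypothetical set of student majors. As one possible next step, it then computes the one-sample chi-square statistic listed for nominal data in Table 2. The data and the assumption of equal expected frequencies in each category are illustrative only. The hard-coded critical value of 5.991 applies only to 2 degrees of freedom at α = .05.

```python
from collections import Counter

# Hypothetical majors for a small sample of 30 students (nominal data)
majors = ["Biology"] * 15 + ["History"] * 9 + ["Business"] * 6

# Count the number of observations in each category.
counts = Counter(majors)
print(counts)  # Counter({'Biology': 15, 'History': 9, 'Business': 6})

# One-sample chi-square against equal expected frequencies in each category
n = sum(counts.values())
expected = n / len(counts)
chi_square = sum(
    (observed - expected) ** 2 / expected for observed in counts.values()
)
df = len(counts) - 1
# Critical value for df = 2 at alpha = .05; valid only because df is 2 here.
critical_05 = 5.991
print(f"chi-square = {chi_square:.2f}, df = {df}, "
      f"reject H0: {chi_square > critical_05}")
```

Here the statistic is 4.20 with 2 degrees of freedom, below 5.991, so these hypothetical counts do not depart significantly from an even split across the three majors.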
Ordinal: When data reflect relative differences rather than quantitative differences and can be ranked, the level of measurement
