Sample size in factor analysis: why size matters
Helen C Lingard and Steve Rowlinson

Abstract

Factor analysis is a powerful and often-used technique in construction management and real estate research. Although the process is relatively straightforward, there are certain rules in relation to data and sample size which must be considered in the analysis. Small samples and low N:p ratios can lead to erroneous conclusions being drawn, and the strength of the data should be considered in such circumstances. The use of factor analysis should be carefully justified in order that research can be considered to be rigorous, replicable and of high quality.

Introduction

Factor analysis is one of the most commonly used methods for data reduction in social science research. Factor analysis assumes that underlying dimensions or factors can be used to explain complex phenomena. The goal of factor analysis is to identify factors that are not directly observable on the basis of a larger set of observable or measurable indicators (variables). Norusis (1993) describes the process of factor analysis as follows:

1. The first step in factor analysis is to produce a correlation matrix for all variables. Variables that do not appear to be related to other variables can be identified from this matrix.
2. The number of factors necessary to represent the data and the method for calculating them must then be determined. Principal component analysis (PCA) [1] is the most widely used method of extracting factors. In PCA, linear combinations of variables are formed. The first principal component is the one that accounts for the largest amount of variance in the sample; the second principal component accounts for the next largest amount of variance and is uncorrelated with the first; and so on. In order to ascertain how well the model (the factor structure) fits the data, coefficients called factor loadings, which relate variables to identified factors, are calculated.
3. Factor models are then often rotated so that each factor has substantial loadings on only some of the variables. Rotation makes the factor matrix more interpretable.
4. Following rotation, scores for each factor can be computed for each case in a sample. These scores are often used in further data analysis.

[1] Factor analysis is used to discover patterns in the relationships amongst variables and enables reduction of the number of variables into factors combined from these variables. Principal component analysis (PCA) is a statistical technique which is used to replace a large set of variables by a smaller set of variables which is the best representation of the larger set. PCA is the most commonly used method for extracting factors in factor analysis. The critique presented in this note applies to PCA but should generalize to other extraction methods used in factor analysis.
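The four steps described by Norusis (1993) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure of any particular statistics package: it extracts principal components from the correlation matrix (step 2), applies a varimax rotation (step 3; varimax is our choice, since the steps do not name a rotation method) and computes regression-method factor scores (step 4). The function names `varimax` and `principal_component_factors` are hypothetical, and the number of factors is supplied by the analyst rather than estimated from the data.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a p x k loading matrix (SVD-based iteration)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion (variance of squared loadings per factor)
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix, so the rotation stays orthogonal
        if criterion != 0 and np.sum(s) / criterion < 1 + tol:
            break
        criterion = np.sum(s)
    return loadings @ rotation


def principal_component_factors(data, n_factors):
    """Sketch of steps 1-4 for an (N cases x p variables) data array."""
    # Step 1: correlation matrix of the p variables
    corr = np.corrcoef(data, rowvar=False)
    # Step 2: principal components are eigenvectors of the correlation matrix,
    # ordered by eigenvalue (variance accounted for), largest first
    eigvals, eigvecs = np.linalg.eigh(corr)
    keep = np.argsort(eigvals)[::-1][:n_factors]
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    # Loadings: correlations between the variables and the components
    loadings = eigvecs * np.sqrt(eigvals)
    # Step 3: varimax rotation towards simple structure
    rotated = varimax(loadings)
    # Step 4: regression-method scores, one row per case, one column per factor
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    scores = z @ np.linalg.solve(corr, rotated)
    return {
        "eigenvalues": eigvals,
        "loadings": loadings,
        "rotated": rotated,
        "communalities": np.sum(rotated ** 2, axis=1),
        "scores": scores,
    }


if __name__ == "__main__":
    # Hypothetical data: 300 cases, six items driven by two uncorrelated factors
    rng = np.random.default_rng(1)
    factors = rng.standard_normal((300, 2))
    pattern = np.array([[.8, 0], [.7, 0], [.75, 0], [0, .8], [0, .7], [0, .75]])
    data = factors @ pattern.T + 0.5 * rng.standard_normal((300, 6))
    result = principal_component_factors(data, n_factors=2)
    print(np.round(result["rotated"], 2))
    print(np.round(result["communalities"], 2))
```

Because the rotation is orthogonal, it leaves each item's communality unchanged and only redistributes loadings between factors, which is why rotation improves interpretability without changing how well the retained factors reproduce the data.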

For factor analysis to be robust, the factor pattern must be stable, i.e. the solution must be reproducible in different samples and must accurately recover the true population structure. This is a very important issue and the reason for this note. In recent years, the number of examples of factor analysis in construction management research has grown. However, in the social sciences there is continued discussion about how large a sample is needed for meaningful factor analysis, and the issue is by no means resolved. Because it is based upon correlation matrices, factor analysis is subject to the sampling error associated with small samples. Hence, it is important that the construction management research community understands the importance of sample size when using factor analysis, so as to prevent researchers from drawing erroneous conclusions as a result of failing to recognize the problems associated with using factor analysis in small samples.

In this note we highlight some of the problems associated with using factor analysis in small-sample studies and present an analysis of sample sizes in factor-analytic research published in construction management journals. In doing so we draw attention to the fact that, in many instances, construction management researchers use factor analysis without giving due consideration to whether their samples are sufficiently large to yield meaningful results. Lastly, we make some recommendations for the design and planning of construction management research to ensure factor analysis is used appropriately.

Sample size recommendations

A wide range of recommendations regarding sample size in factor analysis have been made. These are usually stated in terms of either the minimum sample size (N) for a particular analysis or the minimum ratio of N to the number of variables, p, i.e. the number of survey items being subjected to factor analysis (MacCallum et al 1999).
Gorsuch (1983) recommended five subjects per item, with a minimum of 100 subjects, regardless of the number of items. Guilford (1954) argued that N should be at least 200, while Cattell (1978) recommended three to six subjects per item, with a minimum of 250. Comrey and Lee (1992) provided the following guidance in determining the adequacy of sample size: 100 = poor, 200 = fair, 300 = good, 500 = very good, 1,000 or more = excellent. More demanding recommendations for sample size require a minimum of 10 subjects per item (Everitt 1975) or simply a large sample, ideally several hundred (Cureton & D'Agostino 1983). Before going further, it is useful to discuss the effect of sample size.

Problems arising as a result of small samples in factor analysis

Small samples present problems due to various forms of sampling error, which can manifest as factors that are specific to one data set. This bias limits the extent to which the data are representative of a larger population and generates factor structures that elude replication. These rogue factors can, for example, occur as a result of unique patterns of responding to a single survey question. Another problem associated with small samples in factor analysis is the splintering of factors into smaller groupings of items that really constitute a larger factor.

Costello and Osborne (2005) empirically tested the effect of sample size on the results of factor analysis, reporting that larger samples tend to produce more accurate solutions. Only 10% of samples with the smallest N:p ratio (2:1) produced correct solutions. A solution was deemed to be correct if it was identical to the solution derived from the total population. In contrast, 70% of the samples with the largest N:p ratio (20:1) produced correct solutions. Costello and Osborne (2005) also report that the number of misclassified items was significantly affected by the size of a sample. In the smallest samples, almost two out of thirteen items on average were misclassified (i.e. found to belong to the wrong factor). Lastly, Costello and Osborne (2005) report that two extreme problems in factor analysis, namely the Heywood effect (in which impossible factor loadings greater than 1.0 emerge) and the failure to produce a solution, were only observed in small samples. The failure to produce a solution occurred in almost one third of analyses in the smallest sample size category.

The problems associated with rogue factors, splintered factors and/or misclassified items usually only become evident when data collected from a sufficiently large and representative sample are factor analysed. However, in many cases these problems are either never discovered or are only discovered once an initial factor analysis has produced misleading results. MacCallum et al (1999) suggest that increasing the sample size is one means of overcoming these problems. They argue that, as the sample size increases, sampling error is reduced, and factor analysis solutions become more stable and more reliably reproduce the factorial structure of the population (MacCallum et al 1999).

Method

A search was performed for research studies in the construction management literature published between 2000 and 2005 that reported using some form of factor analysis or principal components analysis. The journals searched and the search terms used are listed in Appendix 1. A total of 31 published articles were identified.
Only studies that reported both the number of subjects and the number of items were analyzed (31 in total). For each study, the subject-to-item ratio (N:p) was calculated. Results are presented in Table 1. The N:p ratio was used rather than the absolute sample size because Osborne and Costello (2004) report that the N:p ratio is a consistent predictor of the stability of factor structures, the occurrence of Type I errors and the correctness of factor structures. Further, Osborne and Costello (2004) report a relative lack of unique impact of the absolute number of subjects (N) once the N:p ratio was accounted for. Table 1 indicates that nearly 60% of the studies had an N:p ratio of 5:1 or less, and Table 2 indicates that 70% of studies had N of less than 100.
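The N:p calculation and the rules of thumb reviewed earlier can be expressed as a short Python sketch for screening a planned or published study. The function names are hypothetical. Treating the upper bound of each Table 1 band as inclusive, using the lower end (three per item) of Cattell's range, reading the Comrey and Lee anchor values as lower bounds, and the label "below poor" are our assumptions rather than rules stated by those authors. The recommendation of Cureton and D'Agostino (1983) is omitted because "several hundred" has no precise threshold.

```python
from collections import Counter

# Table 1 bands as (upper bound of N:p, label); upper bounds treated as inclusive (assumption)
BANDS = [
    (2, "2:1 or less"),
    (5, ">2:1 to 5:1"),
    (10, ">5:1 to 10:1"),
    (20, ">10:1 to 20:1"),
    (100, ">20:1 to 100:1"),
    (float("inf"), ">100:1"),
]


def np_band(n, p):
    """Return the Table 1 band for a study with n subjects and p items."""
    ratio = n / p
    for upper, label in BANDS:
        if ratio <= upper:
            return label


def rules_of_thumb(n, p):
    """True/False for each published rule quoted in the text."""
    return {
        "Gorsuch (1983): 5 per item, minimum 100": n >= max(5 * p, 100),
        "Guilford (1954): at least 200": n >= 200,
        "Cattell (1978): 3 per item (lower end of 3-6), minimum 250": n >= max(3 * p, 250),
        "Everitt (1975): 10 per item": n >= 10 * p,
    }


def comrey_lee(n):
    """Comrey and Lee (1992) grade, reading each anchor value as a lower bound (assumption)."""
    for threshold, grade in [(1000, "excellent"), (500, "very good"), (300, "good"),
                             (200, "fair"), (100, "poor")]:
        if n >= threshold:
            return grade
    return "below poor"


def summarize(studies):
    """Tabulate (n, p) pairs as Table 1 rows: band, % of studies, cumulative %, count."""
    counts = Counter(np_band(n, p) for n, p in studies)
    rows, running = [], 0
    for _, label in BANDS:
        running += counts[label]
        rows.append((label,
                     round(100 * counts[label] / len(studies), 2),
                     round(100 * running / len(studies), 2),
                     counts[label]))
    return rows


if __name__ == "__main__":
    # Hypothetical study: 85 respondents answering 30 questionnaire items
    print(np_band(85, 30))
    print(rules_of_thumb(85, 30))
    print(comrey_lee(85))
```

For the hypothetical study in the example (85 respondents, 30 items), the N:p ratio falls in the >2:1 to 5:1 band and none of the four rules is met. Because `summarize` derives cumulative percentages from counts rather than by adding rounded percentages, its figures can differ in the second decimal place from a table built the latter way.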

Table 1: Current practice in factor analysis in real estate & construction management research

Subject-to-item ratio   % of studies   Cumulative %   No. of articles
2:1 or less             35.48          35.48          11
>2:1 to 5:1             22.58          58.06          7
>5:1 to 10:1            29.03          87.09          9
>10:1 to 20:1           9.68           96.77          3
>20:1 to 100:1          3.23           100.00         1
>100:1                  0.00           100.00         0

Table 2: Sample size and the number of articles reviewed

N = sample size   No. of articles   % of studies   Cumulative %
30 or less        1                 3.23           3.23
31 to 60          8                 25.81          29.04
61 to 99          13                41.93          70.97
100 to 200        6                 19.35          90.32
201 to 300        2                 6.45           96.77
301 or above      1                 3.23           100.00
Total             31

Discussion

The widely varying rules of thumb relating to sample size in factor analysis present a problem for researchers who want simple and definitive guidelines about how big a sample must be to produce meaningful factor analysis results. It is fair to say that no absolute rules can exist. MacCallum et al (1999) suggest that definitive recommendations regarding sample size in factor analysis are based upon the misconception that the minimum sample size or N:p ratio for meaningful factor analysis is invariant across studies. Rather, MacCallum and his colleagues suggest that the minimum sample size depends upon the nature of the data itself, most notably its strength. Strong data are data in which item communalities [2] are consistently high (in the order of .80 or above), factors exhibit high loadings on a substantial number of items (at least three or four) and the number of factors is small.

Empirical evidence supports the argument that sample size is less important where data are sufficiently strong. For example, in an empirical analysis of data originally published by Guadagnoli and Velicer (1988), Osborne and Costello (2004) found that sample size had less of an impact in factor analysis when there were fewer variables (items), and that both N and N:p had a larger effect on the goodness of a factor analysis when item loadings were small. Similarly, MacCallum et al (1999) report that, when data are strong, the impact of sample size is greatly reduced. Under these conditions, MacCallum et al (1999) conclude that factor analysis can produce correct solutions even with samples that would traditionally have been considered too small for meaningful factor analysis.

However, one caveat to this assertion is that, as Costello and Osborne (2005) note, uniformly high item communalities are unlikely to occur in real data; more common magnitudes in social science research are in the order of .40 to .70. As communalities become lower, the size of the sample has a greater impact upon factor analysis outcomes. Also, when dealing with empirical data, it is rare to observe item loadings of 0 or .60. In social science research, moderate and weak item loadings ranging from .30 to .50 are the norm. Thus, in construction management research, it would be rare for data to be of sufficient strength to justify the use of factor analysis in small samples.

[2] An item's communality is the proportion of its variance that is accounted for by the extracted factors taken together (its common variance).

Conclusions

The general implication of this note is that construction management researchers need to be more conscious of the impact of sample size when using factor analysis. Our analysis reveals that researchers in the construction management discipline frequently apply factor analysis to small-sample datasets without considering the consequences. The likely result is that frustrating, confusing and misleading results emerge and erroneous conclusions are drawn. It is critically important that the construction management research community becomes mindful of when factor analysis should be used and under which circumstances it is permissible to use factor analysis in small samples. While definitive rules of thumb for sample size in factor analysis are probably inappropriate, construction management researchers need to carefully consider expectations about the strength of their data when determining the size of their sample.
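The interaction between data strength and sample size can be explored with a small Monte Carlo sketch in Python. It is purely illustrative and does not reproduce the designs of Costello and Osborne (2005) or MacCallum et al (1999). We assume a hypothetical population with two uncorrelated factors, six items per factor, equal loadings on each item's own factor (so each communality is the loading squared) and zero cross-loadings; each sample is analysed by extracting two principal components, applying varimax rotation, and counting items assigned to the wrong factor. The function names and all population settings are our own choices for the example.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (SVD-based iteration)."""
    p, k = loadings.shape
    rotation, criterion = np.eye(k), 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if criterion != 0 and np.sum(s) / criterion < 1 + tol:
            break
        criterion = np.sum(s)
    return loadings @ rotation


def mean_misclassified(n, loading, items_per_factor=6, reps=200, seed=0):
    """Average number of items assigned to the wrong factor over `reps` samples of size n."""
    rng = np.random.default_rng(seed)
    p = 2 * items_per_factor
    true = np.repeat([0, 1], items_per_factor)
    # Hypothetical population: each item loads `loading` on its own factor only
    pattern = np.zeros((p, 2))
    pattern[np.arange(p), true] = loading
    unique_sd = np.sqrt(1 - loading ** 2)  # gives every item unit variance
    errors = []
    for _ in range(reps):
        x = rng.standard_normal((n, 2)) @ pattern.T + unique_sd * rng.standard_normal((n, p))
        vals, vecs = np.linalg.eigh(np.corrcoef(x, rowvar=False))
        top = np.argsort(vals)[::-1][:2]
        rotated = varimax(vecs[:, top] * np.sqrt(vals[top]))
        assigned = np.argmax(np.abs(rotated), axis=1)
        # Factor labels are arbitrary, so count errors under the better of the two matchings
        errors.append(min(np.sum(assigned != true), np.sum(assigned != 1 - true)))
    return float(np.mean(errors))


if __name__ == "__main__":
    for loading in (0.9, 0.6, 0.4):
        for n in (24, 60, 240):  # 12 items in total, so N:p = 2:1, 5:1 and 20:1
            print(f"loading={loading}, N={n} (N:p={n // 12}:1): "
                  f"{mean_misclassified(n, loading):.2f} items misclassified on average")
```

Under these assumptions, misclassification should fall as N grows, and the benefit of a larger sample should be much greater when loadings are weak (communality .16) than when they are strong (communality .81). The exact figures depend on the random seed and on the assumed population, so they should not be read as estimates for any real dataset.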
Research designs with large numbers of variables (i.e. survey questions) and/or in which a large number of factors is expected to emerge should be avoided unless an extremely large sample is likely to be achieved. Conversely, the use of factor analysis in small samples must be carefully considered and explicitly defended in terms of the strength of the data.

References

Cattell, R. B. (1978), The Scientific Use of Factor Analysis, New York: Plenum.
Comrey, A. L. and Lee, H. B. (1992), A First Course in Factor Analysis, Hillsdale, NJ: Erlbaum.
Costello, A. B. and Osborne, J. W. (2005), Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis, Practical Assessment, Research & Evaluation, 10 (7). http://pareonline.net/getvn.asp?v=10&n=7
Cureton, E. E. and D'Agostino, R. B. (1983), Factor Analysis: An Applied Approach, Hillsdale, NJ: Erlbaum.
Everitt, B. S. (1975), Multivariate analysis: the need for data and other problems, British Journal of Psychiatry, 126, 237-240.
Gorsuch, R. L. (1983), Factor Analysis (2nd ed.), Hillsdale, NJ: Erlbaum.
Guadagnoli, E. and Velicer, W. F. (1988), Relation of sample size to the stability of component patterns, Psychological Bulletin, 103, 265-275.
Guilford, J. P. (1954), Psychometric Methods (2nd ed.), New York: McGraw-Hill.
MacCallum, R. C., Widaman, K. F., Zhang, S. and Hong, S. (1999), Sample size in factor analysis, Psychological Methods, 4, 84-99.
Osborne, J. W. and Costello, A. B. (2004), Sample size and subject to item ratio in principal components analysis, Practical Assessment, Research & Evaluation, 9 (11). http://pareonline.net/getvn.asp?v=9&n=11
Velicer, W. F. and Fava, J. L. (1998), Effects of variable and subject sampling on factor pattern recovery, Psychological Methods, 3, 231-251.

Acknowledgement

The authors wish to recognise the contributions made in the production of this note by Yip, L.P.B., Barima, O., Tuuli, M.M. and Koh, T.Y. of Dept REC, HKU.

Appendix 1: Journals searched and search terms

Journals searched:
Construction Management & Economics (16 articles)
Journal of Construction Engineering & Management (5 articles)
Journal of Professional Issues in Engineering Education & Practice (1 article)
Journal of Management in Engineering (2 articles)
Engineering, Construction, & Architectural Management (6 articles)
International Journal of Service Industry Management (1 article)
Structural Survey (2 articles)
International Journal of Quality & Reliability Management (1 article)
Journal of Property Investment & Finance (1 article)
Engineering, Construction & Architectural Management (1 article)
HKIE Transactions (1 article)
(Total 31 articles)

Keywords: factor analysis, principal components analysis, construction.
Years searched: 2000 to 2005.