FACTOR ANALYSIS NASC




Factor Analysis A data reduction technique designed to represent a wide range of attributes on a smaller number of dimensions. The aim is to identify groups of variables that are relatively homogeneous; such groups of related variables are called factors.

Purposes The main applications of factor analytic techniques are: (1) to reduce the number of variables and (2) to detect structure in the relationships between variables, that is to classify variables.

Conceptual Model for Factor Analysis with a Simple Structure (Factor 1, Factor 2, Factor 3): e.g., 12 test items might actually tap only 3 underlying factors.

Conceptual Model for Factor Analysis (with cross-loadings)

Common Factor Model It is suggested that X1, X2, and X3 are functions of two underlying factors, F1 and F2. Each X variable is assumed to be linearly related to the two factors, as in the following model: X1 = β11 F1 + β12 F2 + e1; X2 = β21 F1 + β22 F2 + e2; X3 = β31 F1 + β32 F2 + e3. The error terms e1, e2, and e3 indicate that the hypothesized relationships are not exact. In the vocabulary of factor analysis, the parameters βij are referred to as factor loadings. For example, β12 is the factor loading of variable X1 on factor F2.
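The model above can be simulated numerically. This is an illustrative sketch, not part of the original text: the loading values, error standard deviations, and sample size are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large sample so empirical moments are close to theory

# Hypothetical factor loadings: B[i, j] = loading of X_i on F_j
B = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.7, 0.0]])
sigma = np.array([0.4, 0.5, 0.6])  # std devs of the error terms

F = rng.standard_normal((n, 2))          # standardized, independent factors
e = rng.standard_normal((n, 3)) * sigma  # independent errors
X = F @ B.T + e                          # X_i = beta_i1 F_1 + beta_i2 F_2 + e_i

# Empirical covariance of X1, X2 should be near beta11*beta21 + beta12*beta22
emp = np.cov(X[:, 0], X[:, 1])[0, 1]
theory = B[0] @ B[1]
print(emp, theory)  # theory = 0.1*0.8 + 0.9*0.2 = 0.26
```

With 100,000 simulated cases, the empirical covariance lands very close to the model-implied value, which is the covariance identity derived in the next slide.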

Expected Structure of Loadings It is expected that the loadings have roughly the structure shown in the table:

          Loading on F1 (βi1)   Loading on F2 (βi2)
    X1            +                     0
    X2            0                     +
    X3            0                     +

Of course, the zeros in the table are not expected to be exactly equal to zero. By '0' we mean approximately equal to zero, and by '+' a positive number substantially different from zero.

Model Assumptions A1: The error terms ei are independent of one another, with E(ei) = 0 and Var(ei) = σi². A2: The unobservable factors are independent of one another, and the factors and error terms are independent. As for the factor means and variances, the assumption is that the factors are standardized: E(Fj) = 0 and Var(Fj) = 1. This is an assumption made for convenience; since the factors are unobservable, we might as well think of them as measured in standardized form.

Implications of Assumptions From the model, the variance of Xi can be expressed as Var(Xi) = βi1² Var(F1) + βi2² Var(F2) + Var(ei) = βi1² + βi2² + σi². We see that the variance of Xi consists of two parts: (βi1² + βi2²) and σi². The first part is called the communality of the variable: the part of Var(Xi) explained by the common factors F1 and F2. The second part is called the specific variance of the variable: the part of Var(Xi) that cannot be explained by the common factors. The covariance of any two observed variables Xi and Xj can be expressed as Cov(Xi, Xj) = βi1 βj1 Var(F1) + βi2 βj2 Var(F2) = βi1 βj1 + βi2 βj2.
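These identities reduce to a few lines of arithmetic. The loadings and error variances below are hypothetical, chosen only to illustrate the decomposition:

```python
# Hypothetical loadings (beta_i1, beta_i2) and error variances sigma_i^2
B = [(0.1, 0.9), (0.8, 0.2), (0.7, 0.0)]
sigma2 = [0.16, 0.25, 0.36]

for (b1, b2), s2 in zip(B, sigma2):
    communality = b1**2 + b2**2   # part of Var(X_i) explained by F1, F2
    total_var = communality + s2  # Var(X_i) = communality + specific variance
    print(f"communality={communality:.2f}, specific={s2:.2f}, total={total_var:.2f}")

# Model-implied covariance of X1 and X2: beta11*beta21 + beta12*beta22
cov_12 = B[0][0]*B[1][0] + B[0][1]*B[1][1]
print(cov_12)  # 0.1*0.8 + 0.9*0.2 = 0.26
```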

History of Factor Analysis Invented by Spearman (1904). Usage was long hampered by the onerousness of hand calculation. Since the advent of computers, usage has thrived, especially to develop: theory, e.g., determining the structure of personality; and practice, e.g., development of tens of thousands of psychological screening and measurement tests.

Assumption Testing: Factorability It is important to check the factorability of the correlation matrix (i.e., how suitable are the data for factor analysis?): check the correlation matrix for sizeable correlations; check the diagonals of the anti-image matrix; check measures of sampling adequacy (MSAs): Bartlett's test, KMO.

Rule of thumb: Measures of Sampling Adequacy Are there several correlations over .3? Are the diagonals of the anti-image matrix > .5? Is Bartlett's test significant? Is KMO > .5?

Assumption Testing: Factorability (correlation and partial correlation) Medium effort, reasonably accurate. Examine the diagonals of the anti-image correlation matrix to assess the sampling adequacy of each variable. Variables with diagonal anti-image correlations of less than .5 should be excluded from the analysis: they lack sufficient correlation with the other variables.

Assumption Testing: Factorability (Bartlett's test and the KMO measure) Sampling adequacy predicts whether the data collected are likely to "factor well", based on correlations and partial correlations; it is measured by the Kaiser-Meyer-Olkin (KMO) statistic. Quickest method, but least reliable. Global diagnostic indicators: the correlation matrix is factorable if Bartlett's test of sphericity is significant (null hypothesis: no correlation among the variables, i.e., an identity R matrix) and/or the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is > .5.
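Both diagnostics can be computed directly from the correlation matrix. Below is a minimal sketch using the standard formulas; the example correlation matrix and sample size are invented for illustration:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 is that R is an identity matrix."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

def kmo(R):
    """KMO = sum r^2 / (sum r^2 + sum q^2), q = anti-image (partial) corrs."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    Q = -inv / d                    # partial correlations
    np.fill_diagonal(Q, 0.0)
    R0 = R - np.eye(R.shape[0])     # off-diagonal correlations
    return (R0**2).sum() / ((R0**2).sum() + (Q**2).sum())

R = np.array([[1.0, 0.60, 0.50],
              [0.60, 1.0, 0.55],
              [0.50, 0.55, 1.0]])
stat, pval = bartlett_sphericity(R, n=100)
print(stat, pval, kmo(R))
```

For this invented matrix Bartlett's test is highly significant and KMO is around .7, so the matrix would be judged factorable under the rules of thumb above.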

Communalities The proportion of variance in each variable that can be explained by the factors; also called the explained variation due to the factors. Communalities range between 0 and 1. High communalities (> .5) show that the factors extracted explain most of the variance in the variables being analysed; low communalities (< .5) mean there is considerable variance left unexplained by the factors extracted.

Eigenvalues EV = sum of squared loadings for each factor; it measures the overall strength of the relationship between a factor and the variables. Successive EVs have lower values. Eigenvalues over 1 are considered stable.

Explained Variance A good factor solution is one that explains the most variance with the fewest factors. Realistically, be happy with 50-75% of the variance explained.

Example: interpreting the communality

    Variable   Variance  Loading on F1  Loading on F2  Communality  % explained
    Finance     1.0000       .0299          .9995         0.9999       99.99
    Marketing   1.0000       .9941         -.0815         0.9949       99.49
    Policy      1.0000       .9961          .0514         0.9949       99.49
    Overall     3.0000      1.9815         1.0083         2.9898       99.66

(% explained = 100 x Communality / Variance; in the Overall row, the entry under each factor is the sum of squared loadings on that factor.) The loadings on F1 are relatively large for marketing and policy but close to zero for finance; conversely, the loadings on F2 are relatively large for finance but low for marketing and policy. This solution supports the expected structure: F1 can be interpreted as a marketing/policy factor and F2 as a finance factor.

Assessment of the First Solution Based on R The communalities show that the factor model explains nearly 100%, 99.5%, and 99.5%, respectively, of the observed variance of the finance, marketing and policy grades. Overall, the two factors explain 99.66% of the sum of all observed variances. The sum of squared loadings on F1 can be interpreted as the contribution of F1, and that on F2 as the contribution of F2, in explaining the sum of the observed variances. In our example F1 explains about 1.9815/3 or 66.1%, and F2 about 1.0083/3 or 33.6%, of the sum of the observed variances. Theoretically, the sum of squared loadings 1.9815 is the largest eigenvalue of R, and the loadings on F1 constitute the corresponding eigenvector (scaled by the square root of the eigenvalue); the sum of squared loadings 1.0083 is the second largest eigenvalue of R, and the loadings on F2 constitute the corresponding eigenvector.
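The arithmetic in this example can be reproduced directly from the loading matrix in the table (small last-digit differences are rounding in the original figures):

```python
import numpy as np

# Loadings from the finance/marketing/policy example above
L = np.array([[0.0299,  0.9995],   # Finance
              [0.9941, -0.0815],   # Marketing
              [0.9961,  0.0514]])  # Policy

communalities = (L**2).sum(axis=1)   # row sums of squared loadings
factor_contrib = (L**2).sum(axis=0)  # column sums = eigenvalues of R

print(np.round(communalities, 4))        # ~ [0.9999, 0.9949, 0.9949]
print(np.round(factor_contrib, 4))       # close to the table's 1.9815, 1.0083
print(np.round(100 * factor_contrib / 3, 1))  # ~ [66.0, 33.6] percent
```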

How Many Factors? A subjective process. Seek to explain maximum variance using the fewest factors, considering: 1. Theory: what is predicted/expected? 2. Eigenvalues > 1 (Kaiser's criterion). 3. Scree plot: where does it drop off? 4. Factors must be meaningfully interpretable and make theoretical sense.
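Kaiser's criterion can be illustrated on a small made-up correlation matrix with two blocks of related variables; the criterion recovers the two blocks:

```python
import numpy as np

# Hypothetical 6x6 correlation matrix: two blocks of 3 variables,
# r = .6 within a block and r = .1 between blocks
A = np.full((3, 3), 0.6)
np.fill_diagonal(A, 1.0)
B = np.full((3, 3), 0.1)
R = np.block([[A, B], [B, A]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
print(np.round(eigvals, 2))   # [2.5, 1.9, 0.4, 0.4, 0.4, 0.4]
print((eigvals > 1).sum())    # Kaiser's criterion retains 2 factors
```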

Cattell & Jaspers (1967) suggest that the number of factors be taken as the number of eigenvalues immediately before the straight-line portion of the scree plot begins.

Scree Plot A bar graph of the eigenvalues, depicting the amount of variance explained by each factor. Look for the point where additional factors fail to add appreciably to the cumulative explained variance. The 1st factor explains the most variance; the last factor explains the least.

Factor Rotation Factor loadings are not unique: there exist infinitely many sets of factor loadings yielding the same theoretical dispersion matrix. The process of obtaining a new set of loadings that meets some specific objective is called factor rotation. Rotations may be orthogonal (e.g., Varimax) or oblique (e.g., Oblimin).
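For illustration, here is a compact sketch of the standard varimax algorithm in NumPy (not from the original text; the starting loadings are invented). Because the rotation matrix is orthogonal, each variable's communality is left unchanged:

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Rotate loading matrix L by the varimax criterion (SVD-based updates)."""
    n, k = L.shape
    T = np.eye(k)   # accumulated orthogonal rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / n))
        T = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return L @ T

L = np.array([[0.7, 0.5], [0.6, 0.6], [0.2, 0.8]])
Lrot = varimax(L)
# Communalities are invariant under orthogonal rotation:
print(np.round((L**2).sum(axis=1), 4))     # [0.74, 0.72, 0.68]
print(np.round((Lrot**2).sum(axis=1), 4))  # [0.74, 0.72, 0.68]
```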

Factor loading stages In practice, FA is carried out in two stages. In the first stage, one set of loadings is estimated; these loadings may not agree with prior expectations, or may not lend themselves to a reasonable interpretation. In the second stage, the first set of factor loadings is "rotated" in an effort to arrive at another set that is more consistent with prior expectations or more easily interpretable. Variables with cross-loadings may be omitted from further analysis.

How do I eliminate items? A subjective process, but consider: the size of the main loading (min = .5); the size of cross-loadings (max = .3?); eliminate 1 variable at a time, then re-run, before deciding which (if any) items to eliminate next; the number of items already in the factor (more items in a factor -> greater reliability; minimum = 3, maximum = unlimited).
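These rules of thumb can be sketched as a small helper that flags candidate items from a loading matrix; the thresholds, item names, and loadings below are illustrative only, and in keeping with the one-at-a-time rule only one flagged item would actually be dropped per run:

```python
import numpy as np

def flag_items(loadings, names, min_main=0.5, max_cross=0.3):
    """Flag items whose main loading is weak or whose cross-loading is high."""
    flagged = []
    for name, row in zip(names, np.abs(np.asarray(loadings))):
        main = row.max()
        cross = np.sort(row)[-2] if len(row) > 1 else 0.0  # 2nd largest |loading|
        if main < min_main or cross > max_cross:
            flagged.append(name)
    return flagged

names = ["expensive", "exciting", "reliable", "trendy"]
L = [[0.82, 0.10],
     [0.75, 0.35],   # cross-loads on the second factor
     [0.12, 0.71],
     [0.40, 0.44]]   # weak main loading
print(flag_items(L, names))  # ['exciting', 'trendy']
```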

Factor Analysis: an example Suppose that an automobile company asked a large number of questions about different vehicles. Consider how the different items (features) might be more parsimoniously represented by just a few constructs (factors). Ideally, interval data (e.g., ratings on a k-point scale) on consumers' perceptions of a number of features are required.

Cumulative percent of variance explained. We are looking for an eigenvalue above 1.0.

Items: Expensive, Exciting, Luxury, Distinctive, Not Conservative, Not Family, Not Basic, Appeals to Others, Attractive Looking, Trend Setting, Reliable, Latest Features, Trust

What shall these components be called? Expensive, Exciting, Luxury, Distinctive, Not Conservative, Not Family, Not Basic, Appeals to Others, Attractive Looking, Trend Setting, Reliable, Latest Features, Trust

EXCLUSIVE: Expensive, Exciting, Luxury, Distinctive, Not Conservative, Not Family, Not Basic. TRENDY: Appeals to Others, Attractive Looking, Trend Setting. RELIABLE: Reliable, Latest Features, Trust.

Calculate Component Scores (summated score) EXCLUSIVE = (Expensive + Exciting + Luxury + Distinctive - Conservative - Family - Basic)/7; TRENDY = (Appeals to Others + Attractive Looking + Trend Setting)/3; RELIABLE = (Reliable + Latest Features + Trust)/3. (The reverse-worded items Conservative, Family, and Basic enter with a minus sign.)
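A summated scale is just an average of its item ratings, with reverse-worded items subtracted. The ratings below are hypothetical values for a single vehicle, invented to show the arithmetic:

```python
# Hypothetical ratings for one vehicle on a 1-9 scale
ratings = {"expensive": 8, "exciting": 7, "luxury": 8, "distinctive": 7,
           "conservative": 2, "family": 3, "basic": 2,
           "appeals": 6, "attractive": 7, "trend_setting": 6,
           "reliable": 8, "latest_features": 7, "trust": 8}

exclusive = (ratings["expensive"] + ratings["exciting"] + ratings["luxury"]
             + ratings["distinctive"] - ratings["conservative"]
             - ratings["family"] - ratings["basic"]) / 7
trendy = (ratings["appeals"] + ratings["attractive"]
          + ratings["trend_setting"]) / 3
reliable = (ratings["reliable"] + ratings["latest_features"]
            + ratings["trust"]) / 3
print(exclusive, trendy, reliable)  # 23/7, 19/3, 23/3
```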

               Exclusive   Trendy   Reliable
    Beetle        1.4       6.7       6.9
    Hummer        3.9       6.2       6.7
    Lotus         4.1       7.3       6.7
    Minivan      -1.67      4.83      6.5
    Pick-Up      -0.43      4.93      6.3

The vehicles differ little on the Reliable dimension.


Practical session: using SPSS Step 1: Open the data file, for example, Example.SAV. Step 2: Click on, sequentially: Analyze, Data Reduction, Factor. Step 3: Move the three variables (X1, X2 & X3) from the Source box to the Variables box.

Step 4: Click on Descriptives. Activate Coefficients, Significance levels, KMO and Bartlett's test of sphericity, and Anti-image; click on Continue. This will produce the correlation matrix, significance of correlations, sampling adequacy, and the test of sphericity. Step 5: Click on Extraction. Activate Correlation Matrix, Unrotated factor solution, and Eigenvalues greater than 1; click on Continue. This will produce loadings from the correlation matrix; the number of factors retained equals the number of eigenvalues greater than 1.

Step 6: Click on Rotation. Activate Varimax and Rotated Solution; click on Continue. Step 7: Click on OK. SPSS will produce 8 output tables, titled: 1. Correlation Matrix, 2. KMO & Bartlett's Test, 3. Anti-image Matrices, 4. Communalities, 5. Total Variance Explained, 6. Component Matrix, 7. Rotated Component Matrix, 8. Component Transformation Matrix.

Composite Factor Values Frequently, FA is not an end in itself but an intermediate step on the way to further analysis of the data. In such cases we may require composite values for each factor, based on the original/standardized data. The composite values are generated through three techniques: Surrogate variables (a surrogate variable for a factor is the single variable with the highest factor loading); Summated scales (the values of the several variables defining a factor are summed, and their total or average score is used); Factor scores (computer-generated scores, available under Scores in the main FA procedure; there are three methods: Regression, Bartlett, and Anderson-Rubin).
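As an illustration of regression-method factor scores, a common textbook formula computes them from standardized data as Z R⁻¹ L. This sketch is not from the original text: the data are simulated from a one-factor model, and the loadings are extracted from the largest eigenvalue of R.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulate four indicators of one common factor (illustrative values)
f = rng.standard_normal(n)
X = np.column_stack([0.8 * f + 0.6 * rng.standard_normal(n) for _ in range(4)])
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized data

R = np.corrcoef(Z, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)         # ascending eigenvalue order
# One-factor loadings: leading eigenvector scaled by sqrt(largest eigenvalue)
L = eigvec[:, [-1]] * np.sqrt(eigval[-1])

scores = Z @ np.linalg.inv(R) @ L          # regression-method factor scores
print(scores.shape)                        # (500, 1)
```

Up to an arbitrary sign, these scores track the simulated factor closely, which is the point of using them in downstream analyses.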

Advantages & Disadvantages of the Techniques (source: Hair et al.)

Surrogate variables
- Advantages: simple to administer and interpret.
- Disadvantages: do not represent all facets of a factor; prone to measurement error.

Factor scores
- Advantages: represent all variables through their loadings; best method for complete data reduction; orthogonal by default.
- Disadvantages: interpretation more difficult because all variables contribute through their loadings.

Summated scales
- Advantages: a compromise between the surrogate variable and factor score options; reduce measurement error; represent multiple facets of a concept.
- Disadvantages: include only the variables that load highly on the factor and exclude those with little or marginal impact; not necessarily orthogonal; require extensive analysis of reliability and validity.

Judging Practical Significance of FA In interpreting factors, a decision must be made regarding the factor loadings. A factor loading is the correlation of the variable with the factor; the squared loading is the amount of the variable's total variation accounted for by the factor. Thus a 0.3 loading translates to 9 per cent explanation, and a 0.5 loading denotes that 25% of the variation is accounted for by the factor. The loading must exceed 0.7 for the factor to account for 50% of the variation of the variable. Thus, the larger the absolute size of the factor loading, the more important the loading is in interpreting the factor matrix. Using practical significance as the criterion, we can assess loadings as follows: loadings in the range of ±0.3 to ±0.4 are considered to meet the minimal level for interpretation of structure; loadings of absolute value 0.5 or greater are considered practically significant; loadings of absolute value 0.7 or greater are considered indicative of well-defined structure and are the goal of any FA.

Some Relations Among Output Values A number of relations exist among the outputs, which help us to understand and interpret them better. The major relations, when the input matrix is a p x p correlation matrix, are the following:
1. Sum of all eigenvalues = p = total variance of the p standardized variables.
2. Sum of squared factor loadings for the jth factor = λj = jth largest eigenvalue.
3. λj = amount of variance the jth factor explains.
4. λj / p = proportion of variance explained by the jth factor.
5. Sum of squared factor loadings for the ith variable = ith communality.
6. ith communality = proportion of the variance of the ith standardized variable explained by the common factor model.
7. The (i, j)th factor loading is the correlation between the ith variable and the jth factor.
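These relations can be verified numerically from any correlation matrix; the 3-variable matrix below is arbitrary. Note that when all p factors are retained, every communality equals 1:

```python
import numpy as np

R = np.array([[1.0, 0.60, 0.50],
              [0.60, 1.0, 0.55],
              [0.50, 0.55, 1.0]])
p = R.shape[0]

eigval, eigvec = np.linalg.eigh(R)            # ascending order
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]

# Loadings: eigenvectors scaled by the square roots of the eigenvalues
L = eigvec * np.sqrt(eigval)

print(eigval.sum())        # relation 1: equals p
print((L**2).sum(axis=0))  # relation 2: column sums equal the eigenvalues
print((L**2).sum(axis=1))  # relation 5: communalities (all 1.0 here,
                           # since all p factors are retained)
```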