Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016


2 Agenda
Brief History and Introductory Example
Factor Model
Factor Equation
Estimation of Loadings and Communalities
Properties of the Model
Rotation
Factor Scores
Tutorial, guidelines and rules of thumb

3 Factor Analysis
The aim of Factor Analysis is to find hidden (latent) variables that explain the correlations among the observed variables.
Examples of latent variables: a firm's image; the sales aptitude of salespersons; the performance of a firm; resistance to high-tech innovations.

4 When to use FA?
Exploratory / descriptive analysis: learning the internal structure of the dataset.
  What are the main dimensions of the data?
  Helps to visualize multivariate data in lower-dimensional pictures.
Data reduction technique: if you have a high-dimensional dataset, is there a way to capture the essential information using a smaller number of variables?
  Can also be used as a preprocessing step for techniques that are sensitive to multicollinearity (e.g. regression).

5 Two Approaches
Exploratory Factor Analysis
  A factor structure that explains the correlations among the variables is searched for without any a priori theory.
  What are the underlying processes that could have produced the correlations among the variables?
  Note: there are no readily available criteria against which to test the solution.
Confirmatory Factor Analysis
  The factor structure is assumed to be known or hypothesized a priori.
  Are the correlations among the variables consistent with the hypothesized factor structure?
  Often performed through structural equation modeling.

6 Part I: Basic concepts and examples

7 History and Example
Originally developed by Spearman (1904) to explain student performance in various courses.
Suppose the students' test scores (M: Mathematics, P: Physics, C: Chemistry, E: English, H: History, F: French) depend on
  the student's general intelligence I, and
  the student's aptitude for a given course.

8 Illustration of Factor Analysis
[Figure: path diagram. The latent variable (common factor) Intelligence points to the observed variables Math., Physics, Chem., English, History and French; each observed variable also has its own unique factor A_M, A_P, A_C, A_E, A_H, A_F.]

9 Example: 1-factor model
For example, as follows:
$M = 0.80\,I + A_m$
$P = 0.70\,I + A_p$
$C = 0.90\,I + A_c$
$E = 0.60\,I + A_e$
$H = 0.50\,I + A_h$
$F = 0.65\,I + A_f$
$A_m, A_p, A_c, A_e, A_h$ and $A_f$ stand for special aptitudes (specific factors).
The coefficients 0.80, 0.70, 0.90, 0.60, 0.50 and 0.65 are called factor loadings.
The variables M, P, C, E, H and F are indicators (measures) of the common factor I.

10 Assumptions
The means of the variables (indicators), the common factor I and the unique factors are zero.
The variances of the variables (indicators) and of the common factor I are one.
The correlations between the common factor I and the unique factors are zero.
The correlations among the unique factors are zero.

11 From Assumptions
Variances of variables (e.g. M):
$\mathrm{Var}(M) = \mathrm{Var}(0.8\,I + A_m) = 0.8^2\,\mathrm{Var}(I) + \mathrm{Var}(A_m) = 0.64 + \mathrm{Var}(A_m)$
Covariances of variables (e.g. M and H):
$\mathrm{Cov}(M, H) = \mathrm{Cov}(0.8\,I + A_m,\ 0.5\,I + A_h) = 0.8 \cdot 0.5\,\mathrm{Cov}(I, I) + 0.8\,\mathrm{Cov}(I, A_h) + 0.5\,\mathrm{Cov}(A_m, I) + \mathrm{Cov}(A_m, A_h) = 0.8 \cdot 0.5 = 0.40$
Generally, the variables are standardized in factor analysis, so that $\mathrm{Var}(M) = 1$ and covariances equal correlations.

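12 Variance Decomposition
The total variance of any indicator is decomposed into two components:
  variance in common with I (the communality of the indicator), and
  variance due to the specific factor (the unique variance).
Example: $\mathrm{Var}(M) = 0.64 + \mathrm{Var}(A_m)$, so the communality of M is 0.64 and, for standardized M, the unique variance is $\mathrm{Var}(A_m) = 0.36$.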

13 Variance decomposition (cont.)
[Figure, source: Hair et al. (2010). Labels: diagonal value (unity (1) or communality); total variance, split into common variance and specific and error variance; variance extracted; variance not used.]
Hair et al. (2010): Multivariate Data Analysis, Pearson Education

14 Example: 2-Factor Model
[Figure: path diagram of the two-factor model, with labels for the factor loadings, the common factors and the specific factors.]

15 Example: 2-Factor Model
$M = 0.80\,Q + 0.20\,V + A_m$
$P = 0.70\,Q + 0.30\,V + A_p$
$C = 0.60\,Q + 0.30\,V + A_c$
$E = 0.20\,Q + 0.80\,V + A_e$
$H = 0.15\,Q + 0.82\,V + A_h$
$F = 0.25\,Q + 0.85\,V + A_f$

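16 Covariances for 2-Factor Model
Variances of variables (e.g. M):
$\mathrm{Var}(M) = \mathrm{Var}(0.8\,Q + 0.2\,V + A_m) = 0.8^2\,\mathrm{Var}(Q) + 0.2^2\,\mathrm{Var}(V) + \mathrm{Var}(A_m) = 0.68 + \mathrm{Var}(A_m)$
Covariances of variables (e.g. M and H):
$\mathrm{Cov}(M, H) = \mathrm{Cov}(0.8\,Q + 0.2\,V + A_m,\ 0.15\,Q + 0.82\,V + A_h) = 0.8 \cdot 0.15\,\mathrm{Cov}(Q, Q) + 0.2 \cdot 0.82\,\mathrm{Cov}(V, V) = 0.12 + 0.164 = 0.284$
(The cross terms vanish because Q, V and the specific factors are mutually uncorrelated.)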
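The variance and covariance calculations for the 2-factor example can be checked numerically. The following is a minimal sketch, assuming standardized indicators (so that each uniqueness is one minus the communality); the variable names are illustrative and not part of the original slides.

```python
import numpy as np

# Loadings from the 2-factor example: rows M, P, C, E, H, F; columns Q, V.
loadings = np.array([
    [0.80, 0.20],
    [0.70, 0.30],
    [0.60, 0.30],
    [0.20, 0.80],
    [0.15, 0.82],
    [0.25, 0.85],
])

# Communality of each indicator: sum of its squared loadings.
communalities = np.sum(loadings**2, axis=1)

# Assumption: indicators are standardized, so uniqueness = 1 - communality.
uniquenesses = 1.0 - communalities

# Correlation matrix implied by the model: Lambda Lambda^T + Psi.
implied_corr = loadings @ loadings.T + np.diag(uniquenesses)

print(np.round(communalities, 4))
print(np.round(implied_corr, 3))
```

The entry in row M, column H of `implied_corr` corresponds to Cov(M, H) from the slide, and the first communality corresponds to the common part of Var(M).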

17 Part II: Factor Model

18 Objectives of Factor Analysis
To identify the smallest number of common factors that best explain the correlations among variables
To estimate loadings and communalities
To identify, via rotation, the most plausible factor structure
To estimate factor scores, when desired

19 Some Theory: Factor Model
Consider a p-dimensional random vector $x \sim (\mu, \Sigma)$.
An m-factor model: $x = \Lambda f + \varepsilon + \mu$,
where $\Lambda$ is a $p \times m$ matrix of factor loadings, and $f$ ($m \times 1$) and $\varepsilon$ ($p \times 1$) are random vectors.
The elements of the vector $f$ are the common factors and the elements of $\varepsilon$ are the unique factors.

20 Factor Equation
Factor equation: $\Sigma = \Lambda\Lambda^T + \Psi$, where
  $\Sigma$ is the covariance matrix of the variables x,
  $\Lambda$ is the loading matrix,
  $\Psi$ is a diagonal matrix containing the unique variances.

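21 Factor Equation (cont.)
The communalities: the diagonal elements of $\Sigma - \Psi = \Lambda\Lambda^T$
The covariances (correlations) between the variables and the factors are given by:
$E\big((x - \mu) f^T\big) = E\big((\Lambda f + \varepsilon) f^T\big) = \Lambda E(f f^T) + E(\varepsilon f^T) = \Lambda$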

22 Interpretation of the Common Factors
Loadings: the covariances (correlations, for standardized variables) between the variables and the factors
Eigenvalues: the variance explained by each factor

23 Factor Indeterminacy
Factor indeterminacy due to rotation
Factor indeterminacy due to the communality estimation problem

24 Factor Indeterminacy (cont.)
Factor indeterminacy due to rotation. Consider
$M = 0.667\,Q - 0.484\,V + A_m$
$P = 0.680\,Q - 0.343\,V + A_p$
$C = 0.615\,Q - 0.267\,V + A_c$
$E = 0.741\,Q + 0.361\,V + A_e$
$H = 0.725\,Q + 0.412\,V + A_h$
$F = 0.812\,Q + 0.355\,V + A_f$
These alternative loadings provide the same total communalities and uniquenesses as the previously presented 2-factor solution.
Even the correlation matrices are identical. (Note that the two common factors are assumed to be uncorrelated.)
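The claim that the alternative loadings reproduce the same communalities and correlations can be illustrated with an orthogonal rotation. This is a sketch only: the rotation angle of 50 degrees is an inferred assumption (it is not stated on the slides) that happens to reproduce the alternative loadings to three decimals.

```python
import numpy as np

# Loadings from the earlier 2-factor example (rows M, P, C, E, H, F; columns Q, V).
loadings = np.array([
    [0.80, 0.20],
    [0.70, 0.30],
    [0.60, 0.30],
    [0.20, 0.80],
    [0.15, 0.82],
    [0.25, 0.85],
])

# Assumed rotation angle: 50 degrees is inferred, not given in the source.
theta = np.deg2rad(50.0)
C = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

# Rotated loadings Lambda* = Lambda C.
rotated = loadings @ C

# An orthogonal C leaves Lambda Lambda^T, and hence the communalities, unchanged.
same_common_part = np.allclose(loadings @ loadings.T, rotated @ rotated.T)

print(np.round(rotated, 3))
print(same_common_part)
```

Because $\Lambda C (\Lambda C)^T = \Lambda C C^T \Lambda^T = \Lambda\Lambda^T$ for any orthogonal C, the data alone cannot distinguish between the two sets of loadings.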

25 Factor Indeterminacy (cont.)
Factor indeterminacy due to the communality estimation problem:
  to estimate the loadings, the communalities are needed, and
  to estimate the communalities, the loadings are needed (!)

26 Factor Analysis Techniques (Tabachnick and Fidell: Using Multivariate Statistics)
Principal Component Factoring (PCF)
  The initial estimates of the communalities for all variables are equal to one (= Principal Component Analysis).
Principal Axis Factoring (PAF)
  Principal components analyze total variance, whereas FA analyzes covariance (communality).
  An attempt is made to estimate the communalities:
    explain each variable with the other variables and use the squared multiple correlation (coefficient of multiple determination) as an initial estimate of the communality;
    find the communalities through an iterative process.
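The iterative process described for principal axis factoring can be sketched as follows. This is an illustrative implementation under stated assumptions, not the routine of any particular statistics package: the function name, the stopping rule and the tolerance are hypothetical choices, and the input is the correlation matrix implied by the earlier 1-factor example.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, max_iter=500, tol=1e-10):
    """Iterated principal axis factoring on a correlation matrix R (a sketch)."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations, 1 - 1 / diag(R^-1).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
        loadings = eigvecs[:, top] * scale       # loadings from the top eigenpairs
        h2_new = np.sum(loadings**2, axis=1)     # updated communalities
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return loadings, h2

# Correlation matrix implied by the 1-factor example (M, P, C, E, H, F).
lam = np.array([0.80, 0.70, 0.90, 0.60, 0.50, 0.65])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

loadings, h2 = principal_axis_factoring(R, n_factors=1)
print(np.round(np.abs(loadings[:, 0]), 3))
print(np.round(h2, 3))
```

Setting the diagonal to one and skipping the iteration would correspond to principal component factoring. The sign of an eigenvector is arbitrary, so the loadings are compared in absolute value.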

27 Image Factor Extraction
Uses the correlation matrix of predicted variables, where each variable is predicted from the others via multiple regression.
A compromise between PCA and principal axis factoring:
  like PCA, it provides a mathematically unique solution because there are fixed values in the positive diagonal;
  like PAF, the values in the diagonal are communalities with unique and error variability excluded.
Loadings represent covariances between variables and factors rather than correlations.
Maximum Likelihood Factor Extraction
Population estimates for the factor loadings are calculated which have the greatest probability of yielding a sample with the observed correlation matrix.

28 Unweighted Least Squares Factoring
Minimizes the squared differences between the observed and reproduced correlation matrices.
Only off-diagonal differences are considered; communalities are derived from the solution rather than estimated as a part of it.
A special case of principal factors, where communalities are estimated after the solution.
Generalized (Weighted) Least Squares Factoring
Variables that have substantial shared variance with other variables get higher weights than variables with large unique variance.
Alpha Factoring
Interest is in discovering which common factors are found consistently when repeated samples of variables are taken from a population of variables.

29 Summary of extraction procedures
Technique | Goal of analysis | Special features
Principal components | Maximize variance extracted by orthogonal components | Mathematically determined, solution mixes common, unique, and error variance into components
Principal factors | Maximize variance extracted by orthogonal factors | Estimates communalities to attempt to eliminate unique and error variance from variables
Image factoring | Provides an empirical factor analysis | Uses variances based on multiple regression of a variable with other variables as communalities
Source: Tabachnick and Fidell: Using Multivariate Statistics

30 Summary of extraction procedures (cont.)
Technique | Goal of analysis | Special features
Maximum likelihood factoring | Estimate factor loadings for population that maximize the likelihood of sampling the observed correlation matrix | Has significance test for factors; useful for confirmatory factor analysis
Alpha factoring | Maximize the generalizability of orthogonal factors |
Unweighted least squares | Minimize squared residual correlations |
Generalized least squares | Weights variables by shared variance before minimizing squared residual correlations |
Source: Tabachnick and Fidell: Using Multivariate Statistics

31 Part III: Rotations

32 Two Classes of Rotational Approaches
Orthogonal = axes are maintained at 90 degrees.
Oblique = axes are not maintained at 90 degrees.

33 Orthogonal Factor Rotation
[Figure, source: Hair et al. (2010): the variables (labelled V) plotted against the axes Unrotated Factor I and Unrotated Factor II, together with the orthogonally rotated axes Rotated Factor I and Rotated Factor II.]

34 Oblique Factor Rotation
[Figure, source: Hair et al. (2010): the variables (labelled V) plotted against the axes Unrotated Factor I and Unrotated Factor II, together with the axes for Orthogonal Rotation: Factor I and II and for Oblique Rotation: Factor I and II.]

35 Orthogonal Rotation
Identify an orthogonal transformation matrix C such that
$\Lambda^* = \Lambda C$ and $\Sigma = \Lambda^*\Lambda^{*T} + \Psi$, where $C^T C = I$.
Remember the connection to the factor indeterminacy problem (!)

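36 Varimax Rotation
Find a factor structure in which each variable loads highly on one and only one factor (i.e. simplify the columns of the loading matrix).
That is, for any given factor j,
$V_j = \frac{1}{p}\sum_{i=1}^{p}\lambda_{ij}^4 - \Big(\frac{1}{p}\sum_{i=1}^{p}\lambda_{ij}^2\Big)^2$
is the variance of the communalities (squared loadings) of the variables within factor j.
Total variance: $V = \sum_{j=1}^{m} V_j$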

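37 Varimax Rotation (cont.)
Find the orthogonal matrix C that maximizes V, the total variance of the squared loadings within factors, which is equivalent to maximizing
$pV = \sum_{j=1}^{m}\sum_{i=1}^{p}\lambda_{ij}^4 - \frac{1}{p}\sum_{j=1}^{m}\Big(\sum_{i=1}^{p}\lambda_{ij}^2\Big)^2$
subject to the constraint that the communality of each variable remains the same.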
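A varimax rotation can be sketched with the widely used SVD-based iteration. The version below is raw varimax without Kaiser normalization, which is a simplifying assumption; the function names, iteration limit and tolerance are illustrative. The starting loadings are the alternative 2-factor loadings from the factor indeterminacy example.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings (V)."""
    return float(np.sum(np.var(np.asarray(loadings, dtype=float) ** 2, axis=0)))

def varimax(loadings, max_iter=100, tol=1e-10):
    """Raw varimax (no Kaiser normalization) via the common SVD-based iteration."""
    L = np.asarray(loadings, dtype=float)
    p, m = L.shape
    C = np.eye(m)
    objective_old = 0.0
    for _ in range(max_iter):
        LC = L @ C
        # Gradient-like matrix of the varimax criterion with respect to C.
        G = L.T @ (LC**3 - LC * (np.sum(LC**2, axis=0) / p))
        U, s, Vt = np.linalg.svd(G)
        C = U @ Vt                      # orthogonal matrix closest to G
        objective = float(np.sum(s))
        if objective_old > 0 and objective < objective_old * (1 + tol):
            break
        objective_old = objective
    return L @ C, C

# Alternative loadings from the factor indeterminacy example (M, P, C, E, H, F).
start = np.array([
    [0.667, -0.484],
    [0.680, -0.343],
    [0.615, -0.267],
    [0.741,  0.361],
    [0.725,  0.412],
    [0.812,  0.355],
])

rotated, C = varimax(start)
print(np.round(rotated, 3))
print(varimax_criterion(start), varimax_criterion(rotated))
```

The rotated solution may differ from a textbook solution by column order or sign, since the criterion is invariant to both.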

38 Quartimax Rotation
Purpose: to simplify the rows of the loading matrix, i.e. to obtain a pattern of loadings such that
  all the variables have a fairly high loading on one factor, and
  each variable has a high loading on one other factor and near-zero loadings on the remaining factors.
The quartimax rotation is most appropriate in the presence of a general factor.

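39 Quartimax Rotation (cont.)
For any variable i, the variance of the communalities (i.e. of the squared loadings) across the m factors is given by
$Q_i = \frac{1}{m}\sum_{j=1}^{m}\lambda_{ij}^4 - \Big(\frac{1}{m}\sum_{j=1}^{m}\lambda_{ij}^2\Big)^2$
Then the total variance of all the variables is
$Q = \sum_{i=1}^{p} Q_i$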

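40 Quartimax Rotation (cont.)
Quartimax rotation is obtained by finding the orthogonal matrix C that maximizes Q, the total variance of the squared loadings within variables.
Because the communality $\sum_{j}\lambda_{ij}^2$ of each variable is unchanged by an orthogonal rotation, this problem can be reduced to the following form:
maximize $\sum_{i=1}^{p}\sum_{j=1}^{m}\lambda_{ij}^4$
subject to the condition that the communality of each variable remains the same.
Varimax is often preferred over quartimax, since it leads to a cleaner separation of factors and tends to be more invariant when a different subset of variables is analyzed.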

41 Oblique Rotations
The factors are allowed to be correlated: oblique rotations offer a continuous range of correlations between factors.
The degree of correlation between factors is governed by the parameter delta (δ):
  δ = 0: solutions are fairly highly correlated
  δ < 0: solutions are increasingly orthogonal; at about δ = -4 the solution is orthogonal
  δ ≈ 1: leads to very highly correlated solutions
Note: although δ affects the size of the correlation, the maximum correlation at a given value of δ depends on the dataset.

42 Commonly Used Oblique Rotations
Promax: orthogonal factors are rotated to an oblique position (the orthogonal loadings are raised to powers, usually 2, 4 or 6, to drive small and moderate loadings to zero while larger loadings are reduced).
Direct Oblimin: simplifies factors by minimizing the sum of cross-products of squared loadings in the pattern matrix.
  Values of δ > 0 produce highly correlated factors → careful consideration is needed when deciding the number of factors (!)
  Helps to cope with situations encountered in practice?
Note: factor loadings obtained after oblique rotations no longer represent correlations between factors and observed variables.

43 Terminology in Oblique Rotations
Factor correlation matrix = correlations between factors (standardized factor scores) after rotation
Pattern matrix = regression-like weights representing the unique contribution of each factor to the variance in the variable (comparable to the loading matrix when the factors are orthogonal)
Structure matrix = correlations between variables and correlated factors (given by the product of the pattern matrix and the factor correlation matrix)
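The relation between the three matrices can be illustrated with a small numerical sketch. The pattern matrix and the factor correlation of 0.40 below are hypothetical values chosen only for illustration; they do not come from the slides.

```python
import numpy as np

# Hypothetical pattern matrix for 4 variables and 2 oblique factors.
pattern = np.array([
    [0.80, 0.05],
    [0.70, 0.10],
    [0.10, 0.75],
    [0.00, 0.85],
])

# Hypothetical factor correlation matrix (the two factors correlate at 0.40).
phi = np.array([
    [1.0, 0.4],
    [0.4, 1.0],
])

# Structure matrix: correlations between variables and the correlated factors.
structure = pattern @ phi

# With correlated factors the communalities are the diagonal of P Phi P^T.
communalities = np.diag(pattern @ phi @ pattern.T)

print(np.round(structure, 3))
print(np.round(communalities, 4))
```

When the factor correlation matrix is the identity, the structure matrix equals the pattern matrix, which is why the two are not distinguished for orthogonal rotations.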

44 Methods for Obtaining Factor Scores
Thomson's (1951) regression estimates
  Assume the factor scores to be random.
  The assumption is appropriate when we are interested in the general structure (different samples consisting of different individuals).
Bartlett's estimates
  Assume the factor scores to be deterministic.
  Assume normality and that loadings and uniquenesses are known.
Anderson-Rubin estimates
No clear favorite method: each has its advantages and disadvantages.

45 Factor Scores via Multiple Regression
Estimate: $E\{f_{ij}\} = \beta_{1j} x_{i1} + \dots + \beta_{pj} x_{ip}$
In matrix form: $E\{F\} = XB$, and for standardized variables: $F = ZB$
Hence $(n-1)^{-1} Z^T F = (n-1)^{-1} Z^T Z B \;\Rightarrow\; \Lambda = RB \;\Rightarrow\; B = R^{-1}\Lambda$,
since $(n-1)^{-1} Z^T F = \Lambda$ and $(n-1)^{-1} Z^T Z = R$.
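The regression estimate $B = R^{-1}\Lambda$ and the resulting scores $F = ZB$ can be sketched as follows. The data are simulated purely for illustration, and the loadings are treated as known (an assumption made to keep the sketch short; in practice they would be estimated from R first).

```python
import numpy as np

rng = np.random.default_rng(0)

# Loadings of the 2-factor example, treated here as known (an assumption).
Lam = np.array([
    [0.80, 0.20],
    [0.70, 0.30],
    [0.60, 0.30],
    [0.20, 0.80],
    [0.15, 0.82],
    [0.25, 0.85],
])

# Simulated data: n cases generated from the factor model.
n = 200
f = rng.standard_normal((n, 2))
unique_sd = np.sqrt(1.0 - np.sum(Lam**2, axis=1))
X = f @ Lam.T + rng.standard_normal((n, 6)) * unique_sd

# Standardize the variables and form the sample correlation matrix R.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (n - 1)

# Factor score coefficients B = R^{-1} Lambda and estimated scores F = Z B.
B = np.linalg.solve(R, Lam)
F = Z @ B

print(F.shape)
print(np.round(B, 3))
```

Using `np.linalg.solve` avoids forming $R^{-1}$ explicitly, which is usually preferable numerically.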

46 FA in Practice
Analysis of HBAT's consumer survey results
Form groups of 1 to 3 people
Tutorial

47 Conceptual Issues
The basic assumption is that an underlying structure exists in the set of variables.
The presence of correlated variables and detected factors does not guarantee relevance, even if the statistical requirements are met.
Ensuring conceptual validity remains the responsibility of the researcher.
Remember: do not mix dependent and independent variables in a single factor analysis if the objective is to study dependence relationships using the derived factors.

48 Conceptual Issues (cont.)
Ensure that the sample is homogeneous with respect to the underlying factor structure.
If the sample has multiple internal groups with unique characteristics, it may be inappropriate to apply factor analysis to the pooled data.
If different groups are expected, a separate factor analysis should be performed for each group.
Compare the group-specific results to those of the combined sample.

49 Sample Size and Missing Data
Correlations estimated from small samples tend to be less reliable.
  The minimum sample size should be 50 observations.
  The sample must have more observations than variables.
  Strive to maximize the number of observations per variable (the desired ratio is 5:1).
  Recommendations by Comrey and Lee (1992): sample size 100 = poor, 200 = fair, 300 = good, 500 = very good.
Missing values:
  If cases are missing values in a nonrandom pattern, or if the sample size is too small, estimation is needed.
  Beware of using estimation procedures (e.g. regression) that are likely to overfit the data and cause correlations to be too high.

50 Factorability of Correlation Matrix
A factorable correlation matrix should include several sizeable correlations (this can be checked with e.g. Bartlett's test of sphericity).
If no correlation exceeds 0.30, the use of FA is questionable.
Warning: high bivariate correlations do not necessarily ensure the existence of factors → examine partial correlations or anti-image correlations (negatives of partial correlations).
If factors are present, high bivariate correlations become very low partial correlations.
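Bartlett's test of sphericity, mentioned above, compares the correlation matrix with an identity matrix. The sketch below computes the usual chi-square approximation, $-\big(n - 1 - (2p+5)/6\big)\ln|R|$ with $p(p-1)/2$ degrees of freedom. The function name and the sample size of 100 are hypothetical; the correlation matrix is the one implied by the earlier 2-factor example.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's sphericity test."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)             # log of the determinant of R
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

# Correlation matrix implied by the 2-factor example (M, P, C, E, H, F).
loadings = np.array([
    [0.80, 0.20],
    [0.70, 0.30],
    [0.60, 0.30],
    [0.20, 0.80],
    [0.15, 0.82],
    [0.25, 0.85],
])
R = loadings @ loadings.T
np.fill_diagonal(R, 1.0)

# Hypothetical sample size, chosen only for illustration.
statistic, dof = bartlett_sphericity(R, n=100)
print(statistic, dof)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom suggests that the correlations are not all zero. If SciPy is available, `scipy.stats.chi2.sf(statistic, dof)` gives the corresponding p-value.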

51 Partial Correlation
The partial correlation between variables X and Y given a set of n controlling variables $Z = \{Z_1, \dots, Z_n\}$ is the correlation coefficient $\rho_{XY \cdot Z}$, in which the relatedness due to the controlling variables has been removed.
In practice, the partial correlation is computed as the bivariate correlation between the residuals from the linear regressions X ~ Z and Y ~ Z.
[Figure: Venn diagram of the variances of X, Y and Z with regions a, b, c, d, where $\rho^2_{XY \cdot Z} = a / (a + d)$.]
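The residual-based computation described above can be sketched as follows, together with the equivalent route through the inverse correlation matrix. The simulated data and all function names are illustrative assumptions.

```python
import numpy as np

def partial_corr_from_residuals(x, y, Z):
    """Correlation of the residuals of x ~ Z and y ~ Z (with intercept)."""
    design = np.column_stack([np.ones(len(x)), Z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

def partial_corr_matrix(R):
    """Partial correlation of each pair given all other variables, from R^-1."""
    P = np.linalg.inv(np.asarray(R, dtype=float))
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    return partial

# Simulated data: x and y are related only through the two controls in z.
rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal((n, 2))
x = z @ np.array([0.7, 0.3]) + 0.5 * rng.standard_normal(n)
y = z @ np.array([0.4, 0.6]) + 0.5 * rng.standard_normal(n)

R = np.corrcoef(np.column_stack([x, y, z]), rowvar=False)

r_xy = float(R[0, 1])                               # bivariate correlation
r_xy_given_z = partial_corr_from_residuals(x, y, z)  # residual-based partial
r_xy_from_inverse = float(partial_corr_matrix(R)[0, 1])

print(r_xy, r_xy_given_z, r_xy_from_inverse)
```

In this simulated case the bivariate correlation is sizeable while the partial correlation is close to zero, which is the pattern expected when common factors are present. The negatives of the off-diagonal entries of `partial_corr_matrix(R)` are the anti-image correlations.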

52 Geometrical Interpretation of Partial Correlation
[Figure, source: Wikipedia: residuals from regressions.]

53 Other Practical Issues
Normality
  When FA is used in an exploratory manner to summarize relationships, distributional assumptions are not required.
  Normality enhances the solution (but is not necessary).
Linearity
  Multivariate normality implies linear relationships between pairs of variables.
  The analysis is degraded when linearity fails (note: correlation measures linear relationship).
Absence of multicollinearity and singularity
  Some degree of multicollinearity is desirable, but extreme multicollinearity or singularity is an issue (check for eigenvalues close to zero or a zero determinant of the correlation matrix).

54 Outliers Among Cases and Variables
Screening for outliers among cases
  The factor solution may be sensitive to outlying cases.
Screening for outliers among variables
  A variable with a low squared multiple correlation with all other variables and a low correlation with all important factors is an outlier among the variables.
  Outlying variables are often ignored in the current FA, or the researcher may consider adding more related variables in a further study.
Note: factors defined by just one or two variables are not stable (or "real"). If the variance accounted for by such a factor is high enough, it may be interpreted with caution or ignored.

55 Choosing and Evaluating a Solution
Number and nature of factors
  How many reliable and interpretable factors are there in the data set?
  What is the meaning of the factors? How are they interpreted?
Importance of solutions / factors
  How much variance in a dataset is accounted for by the factors?
  Which factors account for the most variance?
Testing theory in FA
  How well does the obtained solution fit an expected factor solution?
Estimating scores on factors
  How do the subjects score on the factors?

56 Appendix

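57 Preliminary Considerations
Assume that $x = (x_1, \dots, x_p)^T$ is a vector of p random variables.
Population mean: $\mu = E(x)$
Population covariance: $\Sigma = E\big((x - \mu)(x - \mu)^T\big)$
Correlation between the i-th and j-th variable: $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$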

58 Preliminary Considerations
Variance of a linear combination of p variables: $\mathrm{Var}(a^T x) = a^T \Sigma a$
Generalized variance: $|\Sigma|$
Total variation: $\mathrm{tr}\,\Sigma$

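59 Preliminary Considerations
Illustration: Combining Uncorrelated Variables
$\begin{pmatrix} a_1 & a_2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = a_1^2 + a_2^2 = 1$, because $r_{12} = 0$
[Figure: two standardized variables with $s_1^2 = 1$, $s_2^2 = 1$ and $r_{12} = 0$, combined with weights $a_1$, $a_2$ satisfying $a_1^2 + a_2^2 = 1$.]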

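60 Preliminary Considerations
Illustration: Combining Correlated Variables
$\begin{pmatrix} a_1 & a_2 \end{pmatrix}\begin{pmatrix} 1 & r_{12} \\ r_{21} & 1 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = a_1^2 + a_2^2 + 2 a_1 a_2 r_{12} > 1$, when $r_{12} > 0$ (for weights of the same sign)
[Figure: two standardized variables with $s_1^2 = 1$, $s_2^2 = 1$ and $r_{12} > 0$, combined with weights $a_1$, $a_2$ satisfying $a_1^2 + a_2^2 = 1$.]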
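The two illustrations can be checked with a short numerical sketch. The equal weights and the correlation of 0.5 are hypothetical values chosen for illustration, and the function name is not from the slides.

```python
import numpy as np

def combination_variance(a, r12):
    """Variance a^T R a of a weighted sum of two standardized variables."""
    R = np.array([[1.0, r12],
                  [r12, 1.0]])
    return float(a @ R @ a)

# Equal weights normalized so that a1^2 + a2^2 = 1.
a = np.array([1.0, 1.0]) / np.sqrt(2.0)

var_uncorrelated = combination_variance(a, 0.0)   # a1^2 + a2^2
var_correlated = combination_variance(a, 0.5)     # adds 2 * a1 * a2 * r12

print(var_uncorrelated, var_correlated)
```

With positively correlated variables the combination has more variance than either variable alone, which is the property that principal components exploit.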

61 Some Theory: Factor Model
Consider a p-dimensional random vector $x \sim (\mu, \Sigma)$.
An m-factor model: $x = \Lambda f + \varepsilon + \mu$,
where $\Lambda$ is a $p \times m$ matrix of factor loadings, and $f$ ($m \times 1$) and $\varepsilon$ ($p \times 1$) are random vectors.
The elements of the vector $f$ are the common factors and the elements of $\varepsilon$ are the unique factors.

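62 Assumptions
$E(f) = 0$ and $\mathrm{Cov}(f) = I$
$E(\varepsilon) = 0$ and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
$\mathrm{Cov}(f, \varepsilon) = 0$
$\mathrm{Cov}(\varepsilon) = \Psi = \mathrm{diag}(\psi_{11}, \psi_{22}, \dots, \psi_{pp})$
Thus
$\Sigma = E\big((x - \mu)(x - \mu)^T\big) = E\big((\Lambda f + \varepsilon)(\Lambda f + \varepsilon)^T\big)$
$= E\big(\Lambda f (\Lambda f)^T\big) + E(\varepsilon\varepsilon^T) + E(\Lambda f \varepsilon^T) + E\big(\varepsilon (\Lambda f)^T\big)$
$= \Lambda E(f f^T)\Lambda^T + E(\varepsilon\varepsilon^T) + \Lambda E(f \varepsilon^T) + E(\varepsilon f^T)\Lambda^T$
$= \Lambda\Lambda^T + \Psi$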

63 Notation
Label | Name | Size | Description
Λ | Factor loading matrix (or pattern matrix in oblique methods) | p x m | Matrix of regression-like weights used to estimate the unique contribution of each factor to the variance in a variable
x | Vector of variables | p x 1 | Observed random variables
Σ | Covariance or correlation matrix | p x p | Covariances or correlations between variables
µ | Expected values of variables | p x 1 | Expected values of observed random variables
f | Common factors | m x 1 | Vector of common factors
ε | Unique factors | p x 1 | Vector of variable-specific unique factors
Ψ | Covariance of unique factors | p x p | Covariance matrix for unique factors
C | Rotation matrix | m x m | Transformation matrix to produce the rotated loading matrix

64 Factor Equation
Factor equation: $\Sigma = \Lambda\Lambda^T + \Psi$, where
  $\Sigma$ is the covariance matrix of the variables x,
  $\Lambda$ is the loading matrix,
  $\Psi$ is a diagonal matrix containing the unique variances.

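65 Factor Equation (cont.)
The communalities: the diagonal elements of $\Sigma - \Psi = \Lambda\Lambda^T$
The covariances (correlations) between the variables and the factors are given by:
$E\big((x - \mu) f^T\big) = E\big((\Lambda f + \varepsilon) f^T\big) = \Lambda E(f f^T) + E(\varepsilon f^T) = \Lambda$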

66 Solving the Factor Equation
How to solve $\Sigma - \Psi = \Lambda\Lambda^T$?
Use the Spectral Decomposition Theorem: any symmetric $p \times p$ matrix A can be written as $A = \Gamma\Theta\Gamma^T$, where $\Theta$ is a diagonal matrix of the eigenvalues of A and $\Gamma$ is an orthogonal matrix whose columns are the standardized eigenvectors.

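67 Therefore
Provided $\Psi$ is known, we may write $\Sigma - \Psi = \Gamma\Theta\Gamma^T = (\Gamma\Theta^{1/2})(\Theta^{1/2}\Gamma^T)$.
Assume that the first k eigenvalues are positive, $\theta_i > 0$, $i = 1, 2, \dots, k$; then the i-th column of $\Lambda$ may be written as $\lambda_i = \theta_i^{1/2}\gamma_i$.
Thus $\Lambda = \Gamma_1\Theta_1^{1/2}$, where $\Gamma_1$ ($p \times k$) holds the first k eigenvectors and $\Theta_1$ is the $k \times k$ diagonal matrix of the corresponding eigenvalues.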
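The spectral-decomposition solution can be sketched numerically. The example assumes that $\Psi$ is known exactly, as on the slide, and builds $\Sigma$ from the loadings of the earlier 2-factor example; the variable names are illustrative.

```python
import numpy as np

# Loadings of the 2-factor example, used to build a Sigma with known Psi.
true_loadings = np.array([
    [0.80, 0.20],
    [0.70, 0.30],
    [0.60, 0.30],
    [0.20, 0.80],
    [0.15, 0.82],
    [0.25, 0.85],
])
psi = np.diag(1.0 - np.sum(true_loadings**2, axis=1))
sigma = true_loadings @ true_loadings.T + psi

# Spectral decomposition of the symmetric matrix Sigma - Psi.
eigvals, eigvecs = np.linalg.eigh(sigma - psi)
order = np.argsort(eigvals)[::-1]                # eigenvalues in descending order

k = 2                                            # number of positive eigenvalues
theta_1 = eigvals[order[:k]]
gamma_1 = eigvecs[:, order[:k]]

# Lambda = Gamma_1 Theta_1^{1/2}: scale each eigenvector by sqrt(eigenvalue).
recovered = gamma_1 * np.sqrt(theta_1)

print(np.round(theta_1, 4))
print(np.round(recovered, 3))
```

The recovered loadings reproduce $\Sigma - \Psi$ but generally differ from `true_loadings` by an orthogonal rotation, which is the rotational indeterminacy discussed earlier.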

68 Thank you!