Factor Analysis - Introduction


1 Factor Analysis - Introduction

Factor analysis is used to draw inferences about unobservable quantities, such as intelligence, musical ability, patriotism, or consumer attitudes, that cannot be measured directly. The goal of factor analysis is to describe the correlations among $p$ measured traits in terms of variation in a few underlying, unobservable factors. Changes across subjects in the values of one or more unobserved factors can affect an entire subset of measured traits and cause them to be highly correlated.

2 Factor Analysis - Example

A marketing firm wishes to determine how consumers choose to patronize certain stores. Customers at various stores were asked to complete a survey with about $p = 80$ questions. Marketing researchers postulate that consumer choices are based on a few underlying factors, such as friendliness of personnel, level of customer service, store atmosphere, product assortment, product quality, and general price level. A factor analysis would use the correlations among responses to the 80 questions to determine whether the questions can be grouped into six sub-groups that reflect variation in the six postulated factors.

3 Orthogonal Factor Model

$\mathbf{X} = (X_1, X_2, \ldots, X_p)'$ is a $p$-dimensional vector of observable traits distributed with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. The factor model postulates that $\mathbf{X}$ can be written as a linear combination of a set of $m$ common factors $F_1, F_2, \ldots, F_m$ and $p$ additional unique factors $\epsilon_1, \epsilon_2, \ldots, \epsilon_p$, so that

$$\begin{aligned}
X_1 - \mu_1 &= \ell_{11}F_1 + \ell_{12}F_2 + \cdots + \ell_{1m}F_m + \epsilon_1 \\
X_2 - \mu_2 &= \ell_{21}F_1 + \ell_{22}F_2 + \cdots + \ell_{2m}F_m + \epsilon_2 \\
&\;\;\vdots \\
X_p - \mu_p &= \ell_{p1}F_1 + \ell_{p2}F_2 + \cdots + \ell_{pm}F_m + \epsilon_p
\end{aligned}$$

where $\ell_{ij}$ is called a factor loading: the loading of the $i$-th trait on the $j$-th factor.

4 Orthogonal Factor Model

In matrix notation:

$$(\mathbf{X} - \boldsymbol{\mu})_{p \times 1} = \mathbf{L}_{p \times m}\,\mathbf{F}_{m \times 1} + \boldsymbol{\epsilon}_{p \times 1},$$

where $\mathbf{L}$ is the matrix of factor loadings and $\mathbf{F}$ is the vector of values for the $m$ unobservable common factors. Notice that the model looks very much like an ordinary linear model. Since we do not observe anything on the right-hand side, however, we cannot do anything with this model unless we impose some more structure. The orthogonal factor model assumes that

$$E(\mathbf{F}) = \mathbf{0}, \quad \mathrm{Var}(\mathbf{F}) = E(\mathbf{FF}') = \mathbf{I}, \quad E(\boldsymbol{\epsilon}) = \mathbf{0}, \quad \mathrm{Var}(\boldsymbol{\epsilon}) = E(\boldsymbol{\epsilon}\boldsymbol{\epsilon}') = \boldsymbol{\Psi} = \mathrm{diag}\{\psi_i\}, \; i = 1, \ldots, p,$$

and that $\mathbf{F}$ and $\boldsymbol{\epsilon}$ are independent, so that $\mathrm{Cov}(\mathbf{F}, \boldsymbol{\epsilon}) = \mathbf{0}$.
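To make the model concrete, the following R sketch simulates data from an orthogonal factor model with hypothetical dimensions ($p = 5$, $m = 2$) and made-up loadings, and checks that the sample covariance matrix approaches $\mathbf{LL}' + \boldsymbol{\Psi}$:

set.seed(1)
p <- 5; m <- 2; n <- 10000
L <- matrix(c(0.8, 0.7, 0.6, 0.1, 0.2,           # made-up loadings, not from
              0.1, 0.2, 0.1, 0.8, 0.7), p, m)    #   any example in these notes
Psi <- diag(c(0.3, 0.4, 0.5, 0.3, 0.4))          # specific variances
F   <- matrix(rnorm(n * m), n, m)                # common factors, Var(F) = I
eps <- matrix(rnorm(n * p), n, p) %*% sqrt(Psi)  # unique factors, Var = Psi
X   <- F %*% t(L) + eps                          # mu = 0 for simplicity
round(cov(X) - (L %*% t(L) + Psi), 2)            # should be close to zero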

5 Orthogonal Factor Model

Assuming that the variances of the factors are all one is not a restriction, as it can be achieved by properly scaling the factor loadings. Assuming that the common factors are uncorrelated and the unique factors are uncorrelated are the defining restrictions of the orthogonal factor model. The assumptions of the orthogonal factor model have implications for the structure of $\boldsymbol{\Sigma}$. If

$$(\mathbf{X} - \boldsymbol{\mu})_{p \times 1} = \mathbf{L}_{p \times m}\,\mathbf{F}_{m \times 1} + \boldsymbol{\epsilon}_{p \times 1},$$

then it follows that

$$(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})' = (\mathbf{LF} + \boldsymbol{\epsilon})(\mathbf{LF} + \boldsymbol{\epsilon})' = (\mathbf{LF} + \boldsymbol{\epsilon})\left((\mathbf{LF})' + \boldsymbol{\epsilon}'\right) = \mathbf{LFF}'\mathbf{L}' + \boldsymbol{\epsilon}\mathbf{F}'\mathbf{L}' + \mathbf{LF}\boldsymbol{\epsilon}' + \boldsymbol{\epsilon}\boldsymbol{\epsilon}'.$$

6 Orthogonal Factor Model

Taking expectations of both sides of the equation, we find that

$$\boldsymbol{\Sigma} = E(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})' = \mathbf{L}E(\mathbf{FF}')\mathbf{L}' + E(\boldsymbol{\epsilon}\mathbf{F}')\mathbf{L}' + \mathbf{L}E(\mathbf{F}\boldsymbol{\epsilon}') + E(\boldsymbol{\epsilon}\boldsymbol{\epsilon}') = \mathbf{LL}' + \boldsymbol{\Psi},$$

since $E(\mathbf{FF}') = \mathrm{Var}(\mathbf{F}) = \mathbf{I}$ and $E(\boldsymbol{\epsilon}\mathbf{F}') = \mathrm{Cov}(\boldsymbol{\epsilon}, \mathbf{F}) = \mathbf{0}$. Also,

$$(\mathbf{X} - \boldsymbol{\mu})\mathbf{F}' = \mathbf{LFF}' + \boldsymbol{\epsilon}\mathbf{F}',$$

so

$$\mathrm{Cov}(\mathbf{X}, \mathbf{F}) = E[(\mathbf{X} - \boldsymbol{\mu})\mathbf{F}'] = \mathbf{L}E(\mathbf{FF}') + E(\boldsymbol{\epsilon}\mathbf{F}') = \mathbf{L}.$$

7 Orthogonal Factor Model

Under the orthogonal factor model, therefore,

$$\mathrm{Var}(X_i) = \sigma_{ii} = \ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2 + \psi_i$$
$$\mathrm{Cov}(X_i, X_k) = \sigma_{ik} = \ell_{i1}\ell_{k1} + \ell_{i2}\ell_{k2} + \cdots + \ell_{im}\ell_{km}.$$

The portion of the variance $\sigma_{ii}$ that is contributed by the $m$ common factors is called the communality, and the portion that is not explained by the common factors is called the uniqueness (or the specific variance):

$$\sigma_{ii} = \underbrace{\ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2}_{h_i^2} + \psi_i = h_i^2 + \psi_i.$$
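A minimal sketch of this decomposition in R, reusing the hypothetical L and Psi from the simulation above:

h2  <- rowSums(L^2)                  # communalities h_i^2
psi <- diag(L %*% t(L) + Psi) - h2   # uniquenesses recovered from Sigma
cbind(communality = h2, uniqueness = psi)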

8 Orthogonal Factor Model

The model assumes that the $p(p+1)/2$ variances and covariances of $\mathbf{X}$ can be reproduced from the $pm$ factor loadings and the variances of the $p$ unique factors, a total of $pm + p$ parameters. Factor analysis works best in situations where $m$ is small relative to $p$. If, for example, $p = 12$ and $m = 2$, the 78 elements of $\boldsymbol{\Sigma}$ can be reproduced from $(12)(2) + 12 = 36$ parameters in the factor model. Not all covariance matrices can be factored as $\mathbf{LL}' + \boldsymbol{\Psi}$.

9 Example: No Proper Solution

Let $p = 3$ and $m = 1$, and suppose that the covariance matrix of $X_1, X_2, X_3$ is

$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.90 & 0.70 \\ 0.90 & 1 & 0.40 \\ 0.70 & 0.40 & 1 \end{pmatrix}.$$

The orthogonal factor model requires that $\boldsymbol{\Sigma} = \mathbf{LL}' + \boldsymbol{\Psi}$. Under the one-factor model assumption, we get:

$$1 = \ell_{11}^2 + \psi_1, \qquad 0.90 = \ell_{11}\ell_{21}, \qquad 0.70 = \ell_{11}\ell_{31},$$
$$1 = \ell_{21}^2 + \psi_2, \qquad 0.40 = \ell_{21}\ell_{31}, \qquad 1 = \ell_{31}^2 + \psi_3.$$

10 Example: No Proper Solution

From the equations above, we see that $\ell_{21} = (0.4/0.7)\,\ell_{11}$. Since $0.90 = \ell_{11}\ell_{21}$, substituting for $\ell_{21}$ implies that $\ell_{11}^2 = 0.90\,(0.7/0.4) = 1.575$, or $\ell_{11} = \pm 1.255$. Here is where the problem starts. Since (by assumption) $\mathrm{Var}(F_1) = 1$ and also $\mathrm{Var}(X_1) = 1$, and since therefore $\mathrm{Cov}(X_1, F_1) = \mathrm{Corr}(X_1, F_1) = \ell_{11}$, we notice that we have a solution inconsistent with the assumptions of the model, because a correlation cannot be larger than 1 or smaller than $-1$.

11 Example: No Proper Solution

Further,

$$1 = \ell_{11}^2 + \psi_1 \quad \Rightarrow \quad \psi_1 = 1 - \ell_{11}^2 = 1 - 1.575 = -0.575,$$

which cannot be true because $\psi_1 = \mathrm{Var}(\epsilon_1)$ must be nonnegative. Thus, for $m = 1$ we get a numerical solution that is not consistent with the model or with the interpretation of its parameters.
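The arithmetic of this example is easy to verify numerically; a short R sketch using the covariance entries above:

Sigma <- matrix(c(1.0, 0.9, 0.7,
                  0.9, 1.0, 0.4,
                  0.7, 0.4, 1.0), nrow = 3)
l11  <- sqrt(Sigma[1,2] * Sigma[1,3] / Sigma[2,3])  # 1.255: an impossible correlation
psi1 <- 1 - l11^2                                   # -0.575: an impossible variance
c(l11 = l11, psi1 = psi1)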

12 Rotation of Factor Loadings

When $m > 1$, there is no unique set of loadings, and thus there is ambiguity associated with the factor model. Consider any $m \times m$ orthogonal matrix $\mathbf{T}$ such that $\mathbf{TT}' = \mathbf{T}'\mathbf{T} = \mathbf{I}$. We can rewrite our model:

$$\mathbf{X} - \boldsymbol{\mu} = \mathbf{LF} + \boldsymbol{\epsilon} = \mathbf{LTT}'\mathbf{F} + \boldsymbol{\epsilon} = \mathbf{L}^*\mathbf{F}^* + \boldsymbol{\epsilon},$$

with $\mathbf{L}^* = \mathbf{LT}$ and $\mathbf{F}^* = \mathbf{T}'\mathbf{F}$.

13 Rotation of Factor Loadings

Since $E(\mathbf{F}^*) = \mathbf{T}'E(\mathbf{F}) = \mathbf{0}$ and $\mathrm{Var}(\mathbf{F}^*) = \mathbf{T}'\mathrm{Var}(\mathbf{F})\mathbf{T} = \mathbf{T}'\mathbf{T} = \mathbf{I}$, it is impossible to distinguish between the loadings $\mathbf{L}$ and $\mathbf{L}^*$ from just a set of data, even though in general they will be different. Notice that the two sets of loadings generate the same covariance matrix $\boldsymbol{\Sigma}$:

$$\boldsymbol{\Sigma} = \mathbf{LL}' + \boldsymbol{\Psi} = \mathbf{LTT}'\mathbf{L}' + \boldsymbol{\Psi} = \mathbf{L}^*\mathbf{L}^{*\prime} + \boldsymbol{\Psi}.$$
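This invariance is easy to demonstrate numerically. A sketch using the hypothetical L from the earlier simulation and an arbitrary 2 x 2 rotation:

theta <- pi / 6
Tmat  <- matrix(c(cos(theta), -sin(theta),
                  sin(theta),  cos(theta)), 2, 2)  # orthogonal: Tmat %*% t(Tmat) = I
Lstar <- L %*% Tmat
max(abs(L %*% t(L) - Lstar %*% t(Lstar)))          # ~0: same implied covariance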

14 Rotation of Factor Loadings

How do we resolve this ambiguity? Typically, we first obtain the matrix of loadings (recognizing that it is not unique) and then rotate it by multiplying by an orthogonal matrix, chosen according to some desired criterion. For example, a varimax rotation of the factor loadings results in a set of loadings with maximum variability. Other criteria for arriving at a unique set of loadings have also been proposed. We consider rotations in more detail in a little while.

15 Estimation in Orthogonal Factor Models

We begin with a sample of size $n$ of $p$-dimensional vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ and (based on our knowledge of the problem) choose a small number $m$ of factors. For the chosen $m$, we want to estimate the factor loading matrix $\mathbf{L}$ and the specific variances in the model

$$\boldsymbol{\Sigma} = \mathbf{LL}' + \boldsymbol{\Psi}.$$

We use $\mathbf{S}$ as an estimator of $\boldsymbol{\Sigma}$ and first investigate whether the correlations among the $p$ variables are large enough to justify the analysis. If $r_{ij} \approx 0$ for all $i \neq j$, the unique factors will dominate, and we will not be able to identify common underlying factors.

16 Estimation in Orthogonal Factor Models

Common estimation methods are:

- The principal component method
- The iterative principal factor method
- Maximum likelihood estimation (assumes normality)

The last two methods focus on using variation in common factors to describe correlations among measured traits. Principal component analysis gives more attention to variances. Estimated factor loadings from any of these methods can be rotated, as explained later, to facilitate interpretation of results.

17 The Principal Component Method

Let $(\lambda_i, \mathbf{e}_i)$ denote the eigenvalues and eigenvectors of $\boldsymbol{\Sigma}$, and recall the spectral decomposition, which establishes that

$$\boldsymbol{\Sigma} = \lambda_1\mathbf{e}_1\mathbf{e}_1' + \lambda_2\mathbf{e}_2\mathbf{e}_2' + \cdots + \lambda_p\mathbf{e}_p\mathbf{e}_p'.$$

Use $\mathbf{L}$ to denote the $p \times p$ matrix with columns equal to $\sqrt{\lambda_i}\,\mathbf{e}_i$, $i = 1, \ldots, p$. Then the spectral decomposition of $\boldsymbol{\Sigma}$ is given by

$$\boldsymbol{\Sigma} = \mathbf{LL}' + \mathbf{0} = \mathbf{LL}',$$

and corresponds to a factor model in which there are as many factors as variables ($m = p$) and where the specific variances $\psi_i = 0$. The loadings on the $j$-th factor are just the coefficients of the $j$-th principal component multiplied by $\sqrt{\lambda_j}$.

18 The Principal Component Method

The principal component solution just described is not interesting, because we have as many factors as we have variables. We really want $m < p$, so that we can explain the covariance structure in the measurements using just a small number of underlying factors. If the last $p - m$ eigenvalues are small, we can ignore the last $p - m$ terms in the spectral decomposition and write

$$\boldsymbol{\Sigma} \approx \mathbf{L}_{p \times m}\mathbf{L}_{m \times p}'.$$

19 The Principal Component Method

The communality of the $i$-th observed variable is the amount of its variance that can be attributed to variation in the $m$ factors:

$$h_i^2 = \sum_{j=1}^m \ell_{ij}^2, \qquad i = 1, 2, \ldots, p.$$

The variances $\psi_i$ of the specific factors can then be taken to be the diagonal elements of the difference matrix $\boldsymbol{\Sigma} - \mathbf{LL}'$, where $\mathbf{L}$ is $p \times m$. That is,

$$\psi_i = \sigma_{ii} - \sum_{j=1}^m \ell_{ij}^2, \qquad i = 1, \ldots, p, \quad \text{or} \quad \boldsymbol{\Psi} = \mathrm{diag}\left(\boldsymbol{\Sigma} - \mathbf{L}_{p \times m}\mathbf{L}_{m \times p}'\right).$$

20 The Principal Component Method

Note that using $m < p$ factors produces an approximation $\mathbf{L}_{p \times m}\mathbf{L}_{m \times p}' + \boldsymbol{\Psi}$ to $\boldsymbol{\Sigma}$ that exactly reproduces the variances of the $p$ measured traits but only approximates the covariances (or correlations). If variables are measured on very different scales, we work with the standardized variables, as we did when extracting PCs. This is equivalent to modeling the correlation matrix $\mathbf{P}$ rather than the covariance matrix $\boldsymbol{\Sigma}$.

21 Principal Component Estimation

To implement the PC method, we use $\mathbf{S}$ to estimate $\boldsymbol{\Sigma}$ (or $\mathbf{R}$ if the observations are standardized) and use $\bar{\mathbf{x}}$ to estimate $\boldsymbol{\mu}$. The principal component estimate of the loading matrix for the $m$-factor model is

$$\tilde{\mathbf{L}} = \left[\sqrt{\hat{\lambda}_1}\,\hat{\mathbf{e}}_1 \;\; \sqrt{\hat{\lambda}_2}\,\hat{\mathbf{e}}_2 \;\; \cdots \;\; \sqrt{\hat{\lambda}_m}\,\hat{\mathbf{e}}_m\right],$$

where $(\hat{\lambda}_i, \hat{\mathbf{e}}_i)$ are the eigenvalues and eigenvectors of $\mathbf{S}$ (or of $\mathbf{R}$ if the observations are standardized).
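A minimal implementation of this estimator in R for a correlation matrix; pc_loadings is our own helper name, not a library function:

pc_loadings <- function(R, m) {
  e <- eigen(R)
  L <- e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(e$values[1:m]), m)
  psi <- diag(R) - rowSums(L^2)          # specific variances
  list(loadings = L, uniquenesses = psi)
}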

22 Principal Component Estimation

The estimated specific variances are given by the diagonal elements of $\mathbf{S} - \tilde{\mathbf{L}}\tilde{\mathbf{L}}'$, so

$$\tilde{\boldsymbol{\Psi}} = \mathrm{diag}(\tilde{\psi}_1, \tilde{\psi}_2, \ldots, \tilde{\psi}_p), \qquad \tilde{\psi}_i = s_{ii} - \sum_{j=1}^m \tilde{\ell}_{ij}^2.$$

Communalities are estimated as

$$\tilde{h}_i^2 = \tilde{\ell}_{i1}^2 + \tilde{\ell}_{i2}^2 + \cdots + \tilde{\ell}_{im}^2.$$

If variables are standardized, then we substitute $\mathbf{R}$ for $\mathbf{S}$ and 1 for each $s_{ii}$.

23 Principal Component Estimation

In many applications of factor analysis, $m$, the number of factors, is decided prior to the analysis. If we do not know $m$, we can try to determine the best $m$ by looking at the results from fitting the model with different values of $m$. Examine how well the off-diagonal elements of $\mathbf{S}$ (or $\mathbf{R}$) are reproduced by the fitted model $\tilde{\mathbf{L}}\tilde{\mathbf{L}}' + \tilde{\boldsymbol{\Psi}}$; by the definition of $\tilde{\psi}_i$, the diagonal elements of $\mathbf{S}$ are reproduced exactly, but the off-diagonal elements are not. The chosen $m$ is appropriate if the residual matrix $\mathbf{S} - (\tilde{\mathbf{L}}\tilde{\mathbf{L}}' + \tilde{\boldsymbol{\Psi}})$ has small off-diagonal elements.
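Continuing the sketch above, the residual matrix for a candidate m can be inspected as follows, again using the hypothetical pc_loadings helper and a sample correlation matrix R:

fit   <- pc_loadings(R, m = 2)
resid <- R - (fit$loadings %*% t(fit$loadings) + diag(fit$uniquenesses))
max(abs(resid[upper.tri(resid)]))   # largest off-diagonal residual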

24 Principal Component Estimation

Another approach to deciding on $m$ is to examine the contribution of each potential factor to the total variance. The contribution of the $k$-th factor to the sample variance of the $i$-th trait, $s_{ii}$, is estimated as $\tilde{\ell}_{ik}^2$. The contribution of the $k$-th factor to the total sample variance $s_{11} + s_{22} + \cdots + s_{pp}$ is estimated as

$$\tilde{\ell}_{1k}^2 + \tilde{\ell}_{2k}^2 + \cdots + \tilde{\ell}_{pk}^2 = \left(\sqrt{\hat{\lambda}_k}\,\hat{\mathbf{e}}_k\right)'\left(\sqrt{\hat{\lambda}_k}\,\hat{\mathbf{e}}_k\right) = \hat{\lambda}_k.$$

25 Principal Component Estimation

As in the case of PCs, in general,

$$\left(\begin{array}{c}\text{Proportion of total sample}\\ \text{variance due to the } j\text{-th factor}\end{array}\right) = \frac{\hat{\lambda}_j}{s_{11} + s_{22} + \cdots + s_{pp}},$$

or $\hat{\lambda}_j / p$ if factors are extracted from $\mathbf{R}$. Thus, a reasonable number of factors is indicated by the minimum number of PCs that explain a suitably large proportion of the total variance.

26 Strategy for PC Factor Analysis

- First, center the observations (and perhaps standardize them).
- If $m$ is determined by subject-matter knowledge, fit the $m$-factor model by extracting the $p$ PCs from $\mathbf{S}$ or from $\mathbf{R}$, then constructing the $p \times m$ matrix of loadings $\tilde{\mathbf{L}}$ by keeping the PCs associated with the largest $m$ eigenvalues of $\mathbf{S}$ (or $\mathbf{R}$).
- If $m$ is not known a priori, examine the estimated eigenvalues to determine the number of factors that account for a suitably large proportion of the total variance, and examine the off-diagonal elements of the residual matrix.

27 Example: Stock Price Data

The data are weekly gains in stock prices over 100 consecutive weeks for five companies: Allied Chemical, Du Pont, Union Carbide, Exxon, and Texaco. Note that the first three are chemical companies and the last two are oil companies. The data are first standardized, and the sample correlation matrix $\mathbf{R}$ is computed. We fit an $m = 2$ orthogonal factor model.

28 Example: Stock Price Data

The sample correlation matrix $\mathbf{R}$ of the five standardized return series is computed, along with its eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_5$ and the eigenvectors $\hat{\mathbf{e}}_1$ and $\hat{\mathbf{e}}_2$ associated with the two largest eigenvalues.

29 Example: Stock Price Data

Recall that the method of principal components yields factor loadings $\tilde{\ell}_{ij} = \sqrt{\hat{\lambda}_j}\,\hat{e}_{ij}$. For each of the five stocks (Allied Chemical, Du Pont, Union Carbide, Exxon, Texaco), this gives loadings $\tilde{\ell}_{i1}$ and $\tilde{\ell}_{i2}$ on the two factors and specific variances $\tilde{\psi}_i = 1 - \tilde{h}_i^2$. The proportions of total variance accounted for by the first and second factors are $\hat{\lambda}_1/p$ and $\hat{\lambda}_2/p$, respectively.

30 Example: Stock Price Data

The first factor appears to be a market-wide effect on weekly stock price gains, whereas the second factor reflects industry-specific effects on chemical and oil stock price returns. The residual matrix $\mathbf{R} - (\tilde{\mathbf{L}}\tilde{\mathbf{L}}' + \tilde{\boldsymbol{\Psi}})$ is then examined. By construction, the diagonal elements of the residual matrix are zero. Are the off-diagonal elements small enough?

31 Example: Stock Price Data

In this example, most of the residuals appear to be small, with the exception of the $\{4, 5\}$ element and perhaps also the $\{1, 2\}$, $\{1, 3\}$, and $\{2, 3\}$ elements. Since the $\{4, 5\}$ element of the residual matrix is negative, we know that $\tilde{\mathbf{L}}\tilde{\mathbf{L}}'$ produces a correlation between Exxon and Texaco that is larger than the observed value. When the off-diagonal elements of the residual matrix are not small, we might consider changing the number $m$ of factors to see whether we can better reproduce the correlations between the variables. In this example, we would probably not change the model.

32 SAS code: Stock Price Data

/* This program is posted as stocks.sas. It performs
   factor analyses for the weekly stock return data from
   Johnson & Wichern. The data are posted as stocks.dat */

data set1;
  infile "c:\stat501\data\stocks.dat";
  input x1-x5;
  label x1 = "ALLIED CHEM"
        x2 = "DUPONT"
        x3 = "UNION CARBIDE"
        x4 = "EXXON"
        x5 = "TEXACO";
run;

33

/* Compute principal components */

proc factor data=set1 method=prin scree nfactors=2
            simple corr ev res msa nplot=2
            out=scorepc outstat=facpc;
  var x1-x5;
run;

The resulting Factor Pattern table lists the loadings of each variable (x1 ALLIED CHEM, x2 DUPONT, x3 UNION CARBIDE, x4 EXXON, x5 TEXACO) on Factor1 and Factor2.

34 R code: Stock Price Data

# This code creates scatter plot matrices and does
# factor analysis for 100 consecutive weeks of gains
# in prices for five stocks. This code is posted as
#
#     stocks.r
#
# The data are posted as stocks.dat
#
# There is one line of data for each week and the
# weekly gains are represented as
#   x1 = ALLIED CHEMICAL
#   x2 = DUPONT
#   x3 = UNION CARBIDE
#   x4 = EXXON
#   x5 = TEXACO

35

stocks <- read.table("c:/stat501/data/stocks.dat", header=F,
                     col.names=c("x1", "x2", "x3", "x4", "x5"))

# Print the first six rows of the data file
stocks[1:6, ]   # (output: first six rows of x1-x5)

36

# Create a scatter plot matrix of the standardized data
stockss <- scale(stocks, center=T, scale=T)
pairs(stockss, labels=c("Allied", "Dupont", "Carbide", "Exxon", "Texaco"),
      panel=function(x, y){panel.smooth(x, y)
                           abline(lsfit(x, y), lty=2)})

37

[Scatter plot matrix of the standardized weekly gains for Allied, Dupont, Carbide, Exxon, and Texaco]

38

# Compute principal components from the correlation matrix
s.cor <- var(stockss)
s.cor   # (output: 5 x 5 correlation matrix of x1-x5)

s.pc <- prcomp(stocks, scale=T, center=T)

# List component coefficients

39

s.pc$rotation   # (output: coefficients of PC1-PC5 for x1-x5)

40

# Estimate proportion of variation explained by each
# principal component, and cumulative proportions
s <- var(s.pc$x)
pvar <- round(diag(s)/sum(diag(s)), digits=6)
pvar    # (output: proportions for PC1-PC5)

cpvar <- round(cumsum(diag(s))/sum(diag(s)), digits=6)
cpvar   # (output: cumulative proportions for PC1-PC5)

41

# Plot component scores
par(fin=c(5,5))
plot(s.pc$x[,1], s.pc$x[,2], xlab="PC1: Overall Market",
     ylab="PC2: Oil vs. Chemical", type="p")

42

[Scatter plot of component scores: PC1 (Overall Market) on the horizontal axis, PC2 (Oil vs. Chemical) on the vertical axis]

43 Principal Factor Method

The principal factor method for estimating factor loadings can be viewed as an iterative modification of the principal component method that allows greater focus on explaining correlations among observed traits. The principal factor approach begins with a guess about the communalities and then iteratively updates those estimates until some convergence criterion is satisfied. (Try several different sets of starting values.) The principal factor method chooses factor loadings that more closely reproduce correlations. Principal components are more heavily influenced by accounting for variances and will explain a higher proportion of the total variance.

44 Principal Factor Method

Consider the estimated model for the correlation matrix,

$$\mathbf{R} \approx \mathbf{L}^*(\mathbf{L}^*)' + \boldsymbol{\Psi}^*.$$

The estimated loading matrix should provide a good approximation to all of the correlations and to part of the variances:

$$\mathbf{L}^*(\mathbf{L}^*)' \approx \mathbf{R} - \boldsymbol{\Psi}^* = \begin{pmatrix} (h_1^*)^2 & r_{12} & r_{13} & \cdots & r_{1p} \\ r_{21} & (h_2^*)^2 & r_{23} & \cdots & r_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & r_{p3} & \cdots & (h_p^*)^2 \end{pmatrix},$$

where $(h_i^*)^2 = 1 - \psi_i^*$ is the estimated communality for the $i$-th variable. Find a factor loading matrix $\mathbf{L}^*$ so that $\mathbf{L}^*(\mathbf{L}^*)'$ is a good approximation to $\mathbf{R} - \boldsymbol{\Psi}^*$, rather than trying to make $\mathbf{L}^*(\mathbf{L}^*)'$ a good approximation to $\mathbf{R}$.

45 Principal Factor Method

Start by obtaining initial estimates of the communalities, $(h_1^*)^2, (h_2^*)^2, \ldots, (h_p^*)^2$. The estimated loading matrix should then provide a good approximation to the reduced correlation matrix $\mathbf{R} - \boldsymbol{\Psi}^*$ displayed above, which has $(h_i^*)^2 = 1 - \psi_i^*$ on the diagonal. Use the spectral decomposition of $\mathbf{R} - \boldsymbol{\Psi}^*$ to find that approximation.

46 Principal Factor Method

Use the initial estimates to compute the reduced correlation matrix $\mathbf{R} - \boldsymbol{\Psi}^*$. The estimated loading matrix is obtained from the eigenvalues and eigenvectors of this matrix as

$$\mathbf{L}^* = \left[\sqrt{\lambda_1^*}\,\mathbf{e}_1^* \;\; \cdots \;\; \sqrt{\lambda_m^*}\,\mathbf{e}_m^*\right].$$

Then update the specific variances:

$$\psi_i^* = 1 - \sum_{j=1}^m \lambda_j^*\,(e_{ij}^*)^2.$$

47 Principal Factor Method

Use the updated communalities to re-evaluate the reduced correlation matrix $\mathbf{R} - \boldsymbol{\Psi}^*$ and compute a new estimate of the loading matrix,

$$\mathbf{L}^* = \left[\sqrt{\lambda_1^*}\,\mathbf{e}_1^* \;\; \cdots \;\; \sqrt{\lambda_m^*}\,\mathbf{e}_m^*\right].$$

Repeat this process until it converges.
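A compact sketch of this iteration in R, assuming a correlation matrix R and using squared multiple correlations as starting communalities; this mirrors, but is not identical to, SAS's method=prinit with priors smc:

principal_factor <- function(R, m, tol = 1e-6, max_iter = 100) {
  h2 <- 1 - 1 / diag(solve(R))       # initial communalities: SMCs
  for (iter in 1:max_iter) {
    Rr <- R
    diag(Rr) <- h2                   # reduced correlation matrix R - Psi
    e <- eigen(Rr)
    L <- e$vectors[, 1:m, drop = FALSE] %*%
         diag(sqrt(pmax(e$values[1:m], 0)), m)  # guard negative eigenvalues
    h2_new <- rowSums(L^2)
    if (max(abs(h2_new - h2)) < tol) break
    h2 <- h2_new
  }
  list(loadings = L, communalities = h2_new, uniquenesses = 1 - h2_new)
}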

48 Principal Factor Method

Note that $\mathbf{R} - \boldsymbol{\Psi}^*$ is generally not positive definite, and some eigenvalues can be negative. The results are sensitive to the choice of the number of factors $m$. If $m$ is too large, some communalities can be larger than one, which would imply that variation in factor values accounts for more than 100 percent of the variation in the measured traits. SAS has two options to deal with this:

- HEYWOOD: Set any estimated communality larger than one equal to one and continue iterations with the remaining variables.
- ULTRAHEYWOOD: Continue iterations with all of the variables and hope that the iterations eventually bring you back into the allowable parameter space.

49 Stock Price Example

The estimated loadings from the principal factor method give, for each of the five stocks, loadings $\ell_{i1}^*$ and $\ell_{i2}^*$ on the two factors, specific variances $\psi_i^* = 1 - (h_i^*)^2$, and communalities $(h_i^*)^2$. The proportion of total variance accounted for by the first factor is $\lambda_1^*/p = 0.45$, with a smaller proportion for the second factor.

50 Stock Price Example

The interpretations are similar. The first factor appears to be a market-wide effect on weekly stock price gains, whereas the second factor reflects industry-specific effects on chemical and oil stock price returns. The loadings tend to be smaller than those obtained from principal component analysis. Why?

51 Stock Price Example

The residual matrix $\mathbf{R} - [\mathbf{L}^*(\mathbf{L}^*)' + \boldsymbol{\Psi}^*]$ is computed as before. Since the principal factor method gives greater emphasis to describing correlations, the off-diagonal elements of this residual matrix tend to be smaller than the corresponding elements for principal component estimation.

52

/* SAS code for the principal factor method */

proc factor data=set1 method=prinit scree nfactors=2
            simple corr ev res msa nplot=2
            out=scorepf outstat=facpf;
  var x1-x5;
  priors smc;
run;

To obtain the initial communality for the $i$-th variable, smc in the priors statement uses the $R^2$ value for the regression of the $i$-th variable on the other $p - 1$ variables in the analysis.

53 Maximum Likelihood Estimation

To implement the ML method, we need additional assumptions about the distributions of the $p$-dimensional vector $\mathbf{X}_j$ and the $m$-dimensional vector $\mathbf{F}_j$:

$$\mathbf{X}_j \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad \mathbf{F}_j \sim N_m(\mathbf{0}, \mathbf{I}_m), \qquad \boldsymbol{\epsilon}_j \sim N_p(\mathbf{0}, \boldsymbol{\Psi}_{p \times p}),$$

where $\mathbf{X}_j - \boldsymbol{\mu} = \mathbf{L}\mathbf{F}_j + \boldsymbol{\epsilon}_j$, $\boldsymbol{\Sigma} = \mathbf{LL}' + \boldsymbol{\Psi}$, $\mathbf{F}_j$ is independent of $\boldsymbol{\epsilon}_j$, and $\boldsymbol{\Psi}$ is a diagonal matrix. Because the $\mathbf{L}$ that maximizes the likelihood for this model is not unique, we need another restriction that leads to a unique maximum of the likelihood function:

$$\mathbf{L}'\boldsymbol{\Psi}^{-1}\mathbf{L} = \boldsymbol{\Delta},$$

where $\boldsymbol{\Delta}$ is a diagonal matrix.

54 Maximum Likelihood Estimation

Maximizing the likelihood function with respect to $(\boldsymbol{\mu}, \mathbf{L}, \boldsymbol{\Psi})$ is not easy, because the likelihood depends on the parameters in a very non-linear fashion. Recall that $(\mathbf{L}, \boldsymbol{\Psi})$ enter the likelihood through $\boldsymbol{\Sigma}$, which in turn enters the likelihood as an inverse matrix in a quadratic form and also through its determinant. Efficient numerical algorithms exist to maximize the likelihood iteratively and obtain ML estimates

$$\hat{\mathbf{L}}_{p \times m} = \{\hat{\ell}_{ij}\}, \qquad \hat{\boldsymbol{\Psi}} = \mathrm{diag}\{\hat{\psi}_i\}, \qquad \hat{\boldsymbol{\mu}} = \bar{\mathbf{x}}.$$

55 Maximum Likelihood Estimation

The MLEs of the communalities for the $p$ variables (with $m$ factors) are

$$\hat{h}_i^2 = \sum_{j=1}^m \hat{\ell}_{ij}^2, \qquad i = 1, \ldots, p,$$

and the proportion of the total variance accounted for by the $j$-th factor is given by

$$\frac{\sum_{i=1}^p \hat{\ell}_{ij}^2}{\mathrm{trace}(\mathbf{S})}$$

if using $\mathbf{S}$, or the same numerator divided by $p$ if using $\mathbf{R}$.

56 Maximum Likelihood Estimation: Stock Prices

The ML results give, for each of the five stocks (Allied Chemical, Du Pont, Union Carbide, Exxon, Texaco), the loadings on factors 1 and 2 and the specific variances, together with the proportion of variance explained by each factor.

57 Maximum Likelihood Estimation: Stock Prices

The interpretation of the two factors remains more or less the same (although Exxon hardly loads on the second factor now). What has improved considerably is the residual matrix $\mathbf{R} - (\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}})$, since most of its entries are now essentially zero.

58 Likelihood Ratio Test for the Number of Factors

We wish to test whether the $m$-factor model appropriately describes the covariances among the $p$ variables. We test

$$H_0: \boldsymbol{\Sigma}_{p \times p} = \mathbf{L}_{p \times m}\mathbf{L}_{m \times p}' + \boldsymbol{\Psi}_{p \times p}$$

versus

$$H_a: \boldsymbol{\Sigma} \text{ is any positive definite matrix}.$$

A likelihood ratio test of $H_0$ can be constructed as the ratio of the maximum of the likelihood function under $H_0$ to the maximum of the likelihood function under no restrictions (i.e., under $H_a$).

59 Likelihood Ratio Test for the Number of Factors

Under $H_a$, the multivariate normal likelihood is maximized at $\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}}$ and

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'.$$

Under $H_0$, the multivariate normal likelihood is maximized at $\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}}$ and $\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}$, the MLE of $\boldsymbol{\Sigma}$ for the orthogonal factor model with $m$ factors.

60 Likelihood Ratio Test for the Number of Factors

It is easy to show that the maximized likelihood under $H_0$ is proportional to

$$\left|\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}\right|^{-n/2} \exp\left(-\frac{n}{2}\,\mathrm{tr}\!\left[(\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}})^{-1}\hat{\boldsymbol{\Sigma}}\right]\right).$$

Then the log-likelihood ratio statistic for testing $H_0$ is

$$-2\ln\Lambda = n\ln\left(\frac{|\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}|}{|\hat{\boldsymbol{\Sigma}}|}\right) + n\left[\mathrm{tr}\!\left((\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}})^{-1}\hat{\boldsymbol{\Sigma}}\right) - p\right].$$

Since the second term in the expression above can be shown to be zero, the test statistic simplifies to

$$-2\ln\Lambda = n\ln\left(\frac{|\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}|}{|\hat{\boldsymbol{\Sigma}}|}\right).$$

61 Likelihood Ratio Test for the Number of Factors

The degrees of freedom for the large-sample chi-square approximation to this test statistic are

$$\frac{1}{2}\left[(p - m)^2 - p - m\right].$$

Where does this come from? Under $H_a$ we estimate $p$ means and $p(p+1)/2$ elements of $\boldsymbol{\Sigma}$. Under $H_0$, we estimate $p$ means, $p$ diagonal elements of $\boldsymbol{\Psi}$, and $mp$ elements of $\mathbf{L}$. An additional set of $m(m-1)/2$ restrictions is imposed on the $mp$ elements of $\mathbf{L}$ by the identifiability restriction $\mathbf{L}'\boldsymbol{\Psi}^{-1}\mathbf{L} = \boldsymbol{\Delta}$.

62 Likelihood Ratio Test for the Number of Factors

Putting it all together, we have

$$df = \left[p + \frac{p(p+1)}{2}\right] - \left[p + pm + p - \frac{m(m-1)}{2}\right] = \frac{1}{2}\left[(p - m)^2 - p - m\right]$$

degrees of freedom. Bartlett showed that the $\chi^2$ approximation to the sampling distribution of $-2\ln\Lambda$ can be improved if the test statistic is computed as

$$-2\ln\Lambda = \left(n - 1 - \frac{2p + 4m + 5}{6}\right)\ln\frac{|\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}|}{|\hat{\boldsymbol{\Sigma}}|}.$$

63 Likelihood Ratio Test for the Number of Factors

We reject $H_0$ at level $\alpha$ if

$$-2\ln\Lambda = \left(n - 1 - \frac{2p + 4m + 5}{6}\right)\ln\frac{|\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}|}{|\hat{\boldsymbol{\Sigma}}|} > \chi^2_{df,\alpha},$$

with $df = \frac{1}{2}[(p - m)^2 - p - m]$, for large $n$ and large $n - p$. To have $df > 0$, we must have

$$m < \frac{1}{2}\left(2p + 1 - \sqrt{8p + 1}\right).$$

If the data are not a random sample from a multivariate normal distribution, this test tends to indicate the need for too many factors.
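A sketch of the Bartlett-corrected statistic in R; factor_lrt is our own helper name, taking the ML estimates and the unrestricted estimate of Sigma (or R):

factor_lrt <- function(Lhat, Psihat, Sigmahat, n) {
  p <- nrow(Lhat); m <- ncol(Lhat)
  stat <- (n - 1 - (2 * p + 4 * m + 5) / 6) *
          log(det(Lhat %*% t(Lhat) + Psihat) / det(Sigmahat))
  df <- ((p - m)^2 - p - m) / 2
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}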

64 Stock Price Data

We carry out the log-likelihood ratio test at level $\alpha = 0.05$ to see whether the two-factor model is appropriate for these data. We estimated the factor loadings and the unique factors using $\mathbf{R}$ rather than $\mathbf{S}$, but the exact same test statistic (with $\mathbf{R}$ in place of $\hat{\boldsymbol{\Sigma}}$) can be used for the test. Since we have evaluated $\hat{\mathbf{L}}$, $\hat{\boldsymbol{\Psi}}$, and $\mathbf{R}$, we can easily compute

$$\frac{|\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}|}{|\mathbf{R}|} = 1.0065.$$

65 Stock Price Data

Then the test statistic is

$$\left[100 - 1 - \frac{2(5) + 4(2) + 5}{6}\right]\ln(1.0065) = 0.58.$$

Since $0.58 < \chi^2_{1,0.05} = 3.84$, we fail to reject $H_0$ and conclude that the two-factor model adequately describes the correlations among the five stock prices (p-value = 0.4482). Another way to say it is that the data do not contradict the two-factor model.

66 Stock Price Data

We can also test the null hypothesis that one factor is sufficient, using

$$\left[100 - 1 - \frac{2(5) + 4(1) + 5}{6}\right]\ln\frac{|\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}|}{|\hat{\boldsymbol{\Sigma}}|}$$

with 5 degrees of freedom. Since the resulting p-value is very small, we reject the null hypothesis and conclude that a one-factor model is not sufficient to describe the correlations among the weekly returns for the five stocks.

67 Maximum Likelihood Estimation: Stock Prices

We implemented the ML method for the stock price example. The same SAS program can be used, but now with method=ml instead of prinit in the proc factor statement.

proc factor data=set1 method=ml scree nfactors=2
            simple corr ev res msa nplot=2
            out=scorepf outstat=facpf;
  var x1-x5;
  priors smc;
run;

68 Initial Factor Method: Maximum Likelihood

The SAS output reports the Prior Communality Estimates (SMC) for x1-x5 and the Preliminary Eigenvalues (total, average, differences, proportions, and cumulative proportions), together with the note that 2 factors will be retained by the NFACTOR criterion.

69

The iteration history lists, for each iteration, the criterion, ridge value, maximum change, and current communalities, ending with: Convergence criterion satisfied.

70

Significance Tests Based on 100 Observations

Test                               DF   Chi-Square   Pr > ChiSq
H0: No common factors                                   <.0001
HA: At least one common factor
H0: 2 Factors are sufficient
HA: More factors are needed

The output also reports the Chi-Square without Bartlett's Correction, Akaike's Information Criterion, Schwarz's Bayesian Criterion, Tucker and Lewis's Reliability Coefficient, and the squared canonical correlations for Factor1 and Factor2.

71

The output continues with the Eigenvalues of the Weighted Reduced Correlation Matrix (total, average, differences, proportions, cumulative proportions) and the Factor Pattern, listing the Factor1 and Factor2 loadings for x5 TEXACO, x2 DUPONT, x1 ALLIED CHEM, x3 UNION CARBIDE, and x4 EXXON.

72

Variance Explained by Each Factor: weighted and unweighted values for Factor1 and Factor2.

Final Communality Estimates and Variable Weights: total communality (weighted and unweighted), plus the communality and weight for each of x1-x5.

73

Residual Correlations With Uniqueness on the Diagonal: a 5 x 5 table for x1 ALLIED CHEM, x2 DUPONT, x3 UNION CARBIDE, x4 EXXON, and x5 TEXACO, followed by the Root Mean Square Off-Diagonal Residuals, overall and by variable.

74 Maximum Likelihood Estimation with R

stocks.fac <- factanal(stocks, factors=2, method="mle",
                       scale=T, center=T)
stocks.fac

Call:
factanal(x = stocks, factors = 2, method = "mle", scale = T, center = T)

Uniquenesses: reported for x1 through x5.

75

Loadings: Factor1 and Factor2 loadings for x1-x5, followed by the SS loadings, Proportion Var, and Cumulative Var for each factor.

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.446

76 Measure of Sampling Adequacy

In constructing a survey or another type of instrument to examine factors such as political attitudes, social constructs, mental abilities, or consumer confidence that cannot be measured directly, it is common to include a set of questions or items that should vary together as the values of the factor vary across subjects. In a job attitude assessment, for example, you might include questions of the following type in the survey:

1. How well do you like your job?
2. How eager are you to go to work in the morning?
3. How professionally satisfied are you?
4. What is your level of frustration with assignment to meaningless tasks?

77 Measure of Sampling Adequacy

If the responses to these and other questions have strong positive or negative correlations, you may be able to identify a job satisfaction factor. In marketing, education, and behavioral research, the existence of highly correlated responses to a battery of questions is used to defend the notion that an important factor exists and to justify the use of the survey form or measurement instrument. Measures of sampling adequacy are based on correlations and partial correlations among responses to different questions (or items).

78 Measure of Sampling Adequacy

Kaiser (1970, Psychometrika) proposed a statistic called the Measure of Sampling Adequacy (MSA). The MSA compares the sizes of the pairwise correlations to the sizes of the partial correlations between all pairs of variables:

$$MSA = \frac{\sum_j \sum_{k \neq j} r_{jk}^2}{\sum_j \sum_{k \neq j} r_{jk}^2 + \sum_j \sum_{k \neq j} q_{jk}^2},$$

where $r_{jk}$ is the marginal sample correlation between variables $j$ and $k$, and $q_{jk}$ is the partial correlation between the two variables after accounting for all other variables in the set.

79 Measure of Sampling Adequacy

If the $r$'s are relatively large and the $q$'s are relatively small, the MSA tends to 1. This indicates that more than two variables are changing together as the values of the factor vary across subjects. Side note: if $p = 3$, for example, $q_{12}$ is given by

$$q_{12} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}.$$
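A compact way to compute the overall MSA in R, assuming a nonsingular sample correlation matrix R; the partial correlations are obtained from the inverse correlation matrix, as in the usual KMO computation (msa is our own helper name):

msa <- function(R) {
  Q <- -cov2cor(solve(R))   # partial correlations in the off-diagonal elements
  diag(Q) <- 0
  R0 <- R
  diag(R0) <- 0
  sum(R0^2) / (sum(R0^2) + sum(Q^2))
}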

80 Measure of Sampling Adequacy

The MSA can take values between 0 and 1, and Kaiser proposed the following guidelines:

MSA range    Interpretation
0.9 to 1.0   Marvelous data
0.8 to 0.9   Meritorious data
0.7 to 0.8   Middling
0.6 to 0.7   Mediocre data
0.5 to 0.6   Miserable
0.0 to 0.5   Unacceptable

SAS computes MSAs for each of the $p$ variables, along with the overall value:

Kaiser's Measure of Sampling Adequacy: Overall MSA, plus one MSA for each of x1-x5.

81 Tucker and Lewis Reliability Coefficient

Tucker and Lewis (1973), Psychometrika, 38. From the MLEs for $m$ factors, compute:

- $\hat{\mathbf{L}}_m$, the estimates of the factor loadings
- $\hat{\boldsymbol{\Psi}}_m = \mathrm{diag}(\mathbf{R} - \hat{\mathbf{L}}_m\hat{\mathbf{L}}_m')$
- $\mathbf{G}_m = \hat{\boldsymbol{\Psi}}_m^{-1/2}(\mathbf{R} - \hat{\mathbf{L}}_m\hat{\mathbf{L}}_m')\hat{\boldsymbol{\Psi}}_m^{-1/2}$

82 Tucker and Lewis Reliability Coefficient

The element $g_{m(ij)}$ of $\mathbf{G}_m$ is called the partial correlation between variables $X_i$ and $X_j$ controlling for the $m$ common factors. The sum of the squared partial correlations controlling for the $m$ common factors is

$$F_m = \sum_{i<j} g_{m(ij)}^2,$$

and the corresponding mean square is

$$M_m = \frac{F_m}{df_m},$$

where $df_m = \frac{1}{2}[(p - m)^2 - p - m]$ are the degrees of freedom for the likelihood ratio test of the null hypothesis that $m$ factors are sufficient.

83 Tucker and Lewis Reliability Coefficient

The mean square for the model with zero common factors is

$$M_0 = \frac{\sum_{i<j} r_{ij}^2}{p(p-1)/2}.$$

The reliability coefficient is

$$\hat{\rho}_m = \frac{M_0 - M_m}{M_0 - n_m^{-1}},$$

where $n_m = (n - 1) - \frac{2p + 5}{6} - \frac{2m}{3}$ is a Bartlett correction factor. For the weekly stock gains data, SAS reports Tucker and Lewis's Reliability Coefficient in the output.

84 Cronbach's Alpha

A set of observed variables $\mathbf{X} = (X_1, X_2, \ldots, X_p)$ that all measure the same latent trait should have high positive pairwise correlations. Define

$$\bar{r} = \frac{\frac{1}{p(p-1)/2}\sum_{i<j}\widehat{\mathrm{Cov}}(X_i, X_j)}{\frac{1}{p}\sum_{i=1}^p \widehat{\mathrm{Var}}(X_i)} = \frac{2\sum_{i<j} S_{ij}}{(p-1)\sum_{i=1}^p S_{ii}}.$$

Cronbach's alpha is

$$\alpha = \frac{p\,\bar{r}}{1 + (p-1)\bar{r}} = \frac{p}{p-1}\left[1 - \frac{\sum_i \mathrm{Var}(X_i)}{\mathrm{Var}\left(\sum_i X_i\right)}\right].$$

85 Cronbach's Alpha

For standardized variables, $\sum_{i=1}^p \mathrm{Var}(X_i) = p$. If $\rho_{ij} = 1$ for all $i \neq j$, then

$$\mathrm{Var}\left(\sum_i X_i\right) = \sum_i \mathrm{Var}(X_i) + 2\sum_{i<j}\sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)} = \left[\sum_i \sqrt{\mathrm{Var}(X_i)}\right]^2 = p^2.$$

86 Cronbach's Alpha

In the extreme case where all pairwise correlations are 1, we have

$$\frac{\sum_i \mathrm{Var}(X_i)}{\mathrm{Var}\left(\sum_i X_i\right)} = \frac{p}{p^2} = \frac{1}{p} \qquad \text{and} \qquad \alpha = \frac{p}{p-1}\left[1 - \frac{1}{p}\right] = 1.$$

When $\bar{r} = 0$, then $\alpha = 0$. Be sure that the scores for all items are oriented in the same direction, so that all correlations are positive.
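A direct R translation of the variance-ratio form of alpha, for a data matrix X with items in columns (cronbach_alpha is our own helper name):

cronbach_alpha <- function(X) {
  p <- ncol(X)
  (p / (p - 1)) * (1 - sum(apply(X, 2, var)) / var(rowSums(X)))
}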

87 Factor Rotation

As mentioned earlier, multiplying a matrix of factor loadings by any orthogonal matrix leads to the same approximation to the covariance (or correlation) matrix. This means that, mathematically, it does not matter whether we estimate the loadings as $\hat{\mathbf{L}}$ or as $\hat{\mathbf{L}}^* = \hat{\mathbf{L}}\mathbf{T}$, where $\mathbf{TT}' = \mathbf{T}'\mathbf{T} = \mathbf{I}$. The estimated residual matrix also remains unchanged:

$$\mathbf{S}_n - \hat{\mathbf{L}}\hat{\mathbf{L}}' - \hat{\boldsymbol{\Psi}} = \mathbf{S}_n - \hat{\mathbf{L}}\mathbf{TT}'\hat{\mathbf{L}}' - \hat{\boldsymbol{\Psi}} = \mathbf{S}_n - \hat{\mathbf{L}}^*\hat{\mathbf{L}}^{*\prime} - \hat{\boldsymbol{\Psi}}.$$

The specific variances $\hat{\psi}_i$, and therefore the communalities, also remain unchanged.

88 Factor Rotation

Since only the loadings change under rotation, we rotate factors to see whether we can better interpret the results. There are many choices for a rotation matrix $\mathbf{T}$; to choose one, we first establish a mathematical criterion and then find the $\mathbf{T}$ that best satisfies it. One possible objective is to have each of the $p$ variables load highly on only one factor and have moderate to negligible loadings on all other factors. It is not always possible to achieve this type of result. This objective can be pursued with a varimax rotation.

89 Varimax Rotation

Define $\tilde{\ell}_{ij}^* = \hat{\ell}_{ij}^*/\hat{h}_i$ as the scaled loading of the $i$-th variable on the $j$-th rotated factor. Compute the variance of the squared scaled loadings for the $j$-th rotated factor:

$$V_j = \frac{1}{p}\sum_{i=1}^p\left[(\tilde{\ell}_{ij}^*)^2 - \frac{1}{p}\sum_{k=1}^p(\tilde{\ell}_{kj}^*)^2\right]^2 = \frac{1}{p}\sum_{i=1}^p(\tilde{\ell}_{ij}^*)^4 - \frac{1}{p^2}\left[\sum_{k=1}^p(\tilde{\ell}_{kj}^*)^2\right]^2.$$

The varimax procedure finds the orthogonal transformation of the loading matrix that maximizes the sum of these variances across all $m$ rotated factors. After rotation, each of the $p$ variables should load highly on at most one of the rotated factors.
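In R, this criterion is available through stats::varimax; a sketch, assuming the stocks data frame from the earlier code:

fit <- factanal(stocks, factors = 2, rotation = "none")
vm  <- varimax(loadings(fit))   # Kaiser-normalized varimax by default
vm$loadings                     # rotated loadings
vm$rotmat                       # the orthogonal rotation matrix T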

90 Maximum Likelihood Estimation: Stock Prices

Varimax rotation of the $m = 2$ factor solution: the output lists, for each of the five stocks, the MLE loadings on factors 1 and 2 beside the rotated loadings on factors 1 and 2, together with the proportion of variance explained by each factor.

91 Quartimax Rotation

The varimax rotation will destroy an overall factor. The quartimax rotation tries to:

1. Preserve an overall factor, such that each of the $p$ variables has a high loading on that factor.
2. Create other factors such that each of the $p$ variables has a high loading on at most one other factor.

92 Quartimax Rotation

Define $\tilde{\ell}_{ij}^* = \hat{\ell}_{ij}^*/\hat{h}_i$ as the scaled loading of the $i$-th variable on the $j$-th rotated factor. Compute the variance of the squared scaled loadings for the $i$-th variable:

$$W_i = \frac{1}{m}\sum_{j=1}^m\left[(\tilde{\ell}_{ij}^*)^2 - \frac{1}{m}\sum_{k=1}^m(\tilde{\ell}_{ik}^*)^2\right]^2 = \frac{1}{m}\sum_{j=1}^m(\tilde{\ell}_{ij}^*)^4 - \frac{1}{m^2}\left[\sum_{k=1}^m(\tilde{\ell}_{ik}^*)^2\right]^2.$$

The quartimax procedure finds the orthogonal transformation of the loading matrix that maximizes the sum of these variances across all $p$ variables.

93 Maximum Likelihood Estimation: Stock Prices

Quartimax rotation of the $m = 2$ factor solution: as before, the output lists the MLE loadings beside the quartimax-rotated loadings for the five stocks, with the proportion of variance for each factor.

94 PROMAX Transformation

The varimax and quartimax rotations produce uncorrelated factors. PROMAX is a non-orthogonal (oblique) transformation that:

1. is not a rotation, and
2. can produce correlated factors.

95 PROMAX Transformation

1. First perform a varimax rotation to obtain loadings $\mathbf{L}^*$.
2. Construct another $p \times m$ matrix $\mathbf{Q}$ such that

$$q_{ij} = \begin{cases} |\ell_{ij}^*|^{k-1}\,\ell_{ij}^* & \text{for } \ell_{ij}^* \neq 0 \\ 0 & \text{for } \ell_{ij}^* = 0, \end{cases}$$

where $k > 1$ is selected by trial and error, usually $k < 5$.
3. Find a matrix $\mathbf{U}$ such that each column of $\mathbf{L}^*\mathbf{U}$ is close to the corresponding column of $\mathbf{Q}$. Choose the $j$-th column of $\mathbf{U}$ to minimize $(\mathbf{q}_j - \mathbf{L}^*\mathbf{u}_j)'(\mathbf{q}_j - \mathbf{L}^*\mathbf{u}_j)$. This yields

$$\mathbf{U} = \left[(\mathbf{L}^*)'(\mathbf{L}^*)\right]^{-1}(\mathbf{L}^*)'\mathbf{Q}.$$

96 PROMAX Transformation

4. Rescale $\mathbf{U}$ so that the transformed factors have unit variance: compute $\mathbf{D}^2 = \mathrm{diag}[(\mathbf{U}'\mathbf{U})^{-1}]$ and $\mathbf{M} = \mathbf{UD}$.
5. The PROMAX factors are obtained from

$$\mathbf{L}^*\mathbf{F}^* = \mathbf{L}^*\mathbf{M}\mathbf{M}^{-1}\mathbf{F}^* = \mathbf{L}_P\mathbf{F}_P.$$

The PROMAX transformation yields factors with loadings $\mathbf{L}_P = \mathbf{L}^*\mathbf{M}$.
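A sketch of these steps using base R's promax, which implements this Procrustean construction; here we assume promax's rotmat plays the role of M, so the factor correlations follow the $(\mathbf{M}'\mathbf{M})^{-1}$ formula on the next slide:

pm <- promax(vm$loadings, m = 3)   # power k = 3, matching the SAS run below
pm$loadings                        # oblique PROMAX loadings L_P
solve(crossprod(pm$rotmat))        # factor correlations, (M'M)^{-1}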

97 PROMAX Transformation

The correlation matrix of the transformed factors is $(\mathbf{M}'\mathbf{M})^{-1}$. Derivation:

$$\mathrm{Var}(\mathbf{F}_P) = \mathrm{Var}(\mathbf{M}^{-1}\mathbf{F}^*) = \mathbf{M}^{-1}\mathrm{Var}(\mathbf{F}^*)(\mathbf{M}^{-1})' = \mathbf{M}^{-1}\mathbf{I}(\mathbf{M}^{-1})' = (\mathbf{M}'\mathbf{M})^{-1}.$$

98 Maximum Likelihood Estimation: Stock Prices

PROMAX transformation of the $m = 2$ factor solution: the output lists the MLE loadings beside the PROMAX loadings for the five stocks, along with the estimated correlation between the two factors.

99

/* Varimax Rotation */

proc factor data=set1 method=ml nfactors=2 res msa
            mineigen=0 maxiter=75 conv=.0001
            outstat=facml2 rotate=varimax reorder;
  var x1-x5;
  priors smc;
run;

proc factor data=set1 method=prin nfactors=3 mineigen=0
            res msa maxiter=75 conv=.0001
            outstat=facml3 rotate=varimax reorder;
  var x1-x5;
  priors smc;
run;

100

The FACTOR Procedure
Rotation Method: Varimax

The output shows the Orthogonal Transformation Matrix and the Rotated Factor Pattern, listing the Factor1 and Factor2 loadings for x2 DUPONT, x3 UNION CARBIDE, x1 ALLIED CHEM, x5 TEXACO, and x4 EXXON.

101

Variance Explained by Each Factor: weighted and unweighted values for Factor1 and Factor2.

Final Communality Estimates and Variable Weights: total communality (weighted and unweighted), plus the communality and weight for each of x1-x5.

102

/* Apply other rotations to the two factor solution */

proc factor data=facml2 nfactors=2 conv=.0001
            rotate=quartimax reorder;
  var x1-x5;
  priors smc;
run;

103

The FACTOR Procedure
Rotation Method: Quartimax

The output shows the Orthogonal Transformation Matrix and the Rotated Factor Pattern, listing the Factor1 and Factor2 loadings for x2 DUPONT, x3 UNION CARBIDE, x1 ALLIED CHEM, x4 EXXON, and x5 TEXACO.

104

Variance Explained by Each Factor: weighted and unweighted values for Factor1 and Factor2.

Final Communality Estimates and Variable Weights: total communality (weighted and unweighted), plus the communality and weight for each of x1-x5.

105

proc factor data=facml2 nfactors=2 conv=.0001
            rotate=promax reorder score;
  var x1-x5;
  priors smc;
run;

106

The FACTOR Procedure
Prerotation Method: Varimax

The output shows the Orthogonal Transformation Matrix and the Rotated Factor Pattern for x2 DUPONT, x3 UNION CARBIDE, x1 ALLIED CHEM, x5 TEXACO, and x4 EXXON.

107

Variance Explained by Each Factor: weighted and unweighted values for Factor1 and Factor2.

Final Communality Estimates and Variable Weights: total communality (weighted and unweighted), plus the communality and weight for each of x1-x5.

108

The FACTOR Procedure
Rotation Method: Promax (power = 3)

The output shows the Target Matrix for the Procrustean Transformation (Factor1 and Factor2 values for x2 DUPONT, x3 UNION CARBIDE, x1 ALLIED CHEM, x5 TEXACO, x4 EXXON) and the Procrustean Transformation Matrix.

109

The output continues with the Normalized Oblique Transformation Matrix and the Inter-Factor Correlations between Factor1 and Factor2.

110

Rotated Factor Pattern (Standardized Regression Coefficients): Factor1 and Factor2 coefficients for x2 DUPONT, x3 UNION CARBIDE, x1 ALLIED CHEM, x5 TEXACO, and x4 EXXON.

111

Reference Structure (Semipartial Correlations): Factor1 and Factor2 values for x2 DUPONT, x3 UNION CARBIDE, x1 ALLIED CHEM, x5 TEXACO, and x4 EXXON, followed by the Variance Explained by Each Factor Eliminating Other Factors (weighted and unweighted).

112

Factor Structure (Correlations): Factor1 and Factor2 correlations for x2 DUPONT, x3 UNION CARBIDE, x1 ALLIED CHEM, x5 TEXACO, and x4 EXXON, followed by the Variance Explained by Each Factor Ignoring Other Factors (weighted and unweighted).

113

Final Communality Estimates and Variable Weights: total communality (weighted and unweighted), plus the communality and weight for each of x1-x5.

114

Scoring Coefficients Estimated by Regression: the Squared Multiple Correlations of the Variables with Each Factor (Factor1, Factor2) and the Standardized Scoring Coefficients for x2 DUPONT, x3 UNION CARBIDE, x1 ALLIED CHEM, x5 TEXACO, and x4 EXXON.

115 R code posted in stocks.r

# Compute maximum likelihood estimates for factors
stocks.fac <- factanal(stocks, factors=2, method="mle",
                       scale=T, center=T)
stocks.fac

Call:
factanal(x = stocks, factors = 2, method = "mle", scale = T, center = T)

Uniquenesses: reported for x1 through x5.

116

Loadings: Factor1 and Factor2 loadings for x1-x5, followed by the SS loadings, Proportion Var, and Cumulative Var for each factor.

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.446

117

# Apply the varimax rotation (this was the default)
stocks.fac <- factanal(stocks, factors=2, rotation="varimax",
                       method="mle", scores="regression")
stocks.fac

Call:
factanal(x = stocks, factors = 2, scores = "regression",
         rotation = "varimax", method = "mle")

Uniquenesses: reported for x1 through x5.

118

Loadings: Factor1 and Factor2 loadings for x1-x5, followed by the SS loadings, Proportion Var, and Cumulative Var for each factor.

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.446

119

# Compute the residual matrix
pred <- stocks.fac$loadings %*% t(stocks.fac$loadings) +
        diag(stocks.fac$uniqueness)
resid <- s.cor - pred
resid   # (output: a 5 x 5 residual matrix with entries of order 1e-3 or smaller)

120

# List factor scores
stocks.fac$scores   # (output: Factor1 and Factor2 scores for all 100 weeks)

121

# You could use the following code in Splus to apply
# the quartimax rotation, but this is not available in R
#
# stocks.fac <- factanal(stocks, factors=2, rotation="quartimax",
#                        method="mle", scores="regression")
# stocks.fac

# Apply the Promax rotation
promax(stocks.fac$loadings, m=3)

122

Loadings: Factor1 and Factor2 loadings for x1-x5, followed by the SS loadings, Proportion Var, and Cumulative Var for each factor.

$rotmat: the 2 x 2 rotation matrix.

123

# You could try to apply the promax rotation with the
# factanal function. It selects the power as m=4

stocks.fac <- factanal(stocks, factors=2, rotation="promax",
                       method="mle", scores="regression")
stocks.fac

Call:
factanal(x = stocks, factors = 2, scores = "regression",
         rotation = "promax", method = "mle")

Uniquenesses: reported for x1 through x5.

124

Loadings: Factor1 and Factor2 loadings for x1-x5, followed by the SS loadings, Proportion Var, and Cumulative Var for each factor.

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.446

125 Example: Test Scores

The sample correlation matrix $\mathbf{R}$ of $p = 6$ test scores collected from $n = 220$ students was computed, and an $m = 2$ factor model was fitted to this correlation matrix using ML.

126 Example: Test Scores

Estimated factor loadings and communalities: the table lists, for each test (Gaelic, English, History, Arithmetic, Algebra, Geometry), the loadings on factor 1 and factor 2 and the communality $\hat{h}_i^2$.

127 Example: Test Scores

All variables load highly on the first factor; we call that a general intelligence factor. Half of the loadings on the second factor are positive and half are negative. The positive loadings correspond to the verbal scores and the negative loadings to the math scores. Correlations between scores on verbal tests and math tests tend to be lower than correlations among scores on math tests or among scores on verbal tests, so this is a math-versus-verbal factor. We plot the six pairs of loadings $(\hat{\ell}_{i1}, \hat{\ell}_{i2})$ on the original coordinate system and also on a rotated set of coordinates chosen so that one axis goes through the loadings $(\hat{\ell}_{41}, \hat{\ell}_{42})$ of the fourth variable on the two factors.

128 Factor Rotation for Test Scores

[Plot of the six factor loadings in the original and rotated coordinate systems]

129 Varimax Rotation for Test Scores

Loadings for the rotated factors using the varimax criterion are tabulated for each test (Gaelic, English, History, Arithmetic, Algebra, Geometry), with loadings on $F_1^*$ and $F_2^*$ and communalities $\hat{h}_i^2$. $F_1^*$ is primarily a mathematics ability factor and $F_2^*$ is a verbal ability factor.

130 PROMAX Rotation for Test Scores

After the PROMAX transformation, $F_1^*$ is more clearly a mathematics ability factor and $F_2^*$ is a verbal ability factor; the loadings and communalities are tabulated as before. These factors are correlated. Code is posted as Lawley.sas and Lawley.R.

131 Factor Scores

Sometimes we require estimates of the factor values for each respondent in the sample. For $j = 1, \ldots, n$, the estimated $m$-dimensional vector of factor scores is

$$\hat{\mathbf{f}}_j = \text{estimate of the values } \mathbf{f}_j \text{ attained by } \mathbf{F}_j.$$

To estimate $\mathbf{f}_j$, we act as if $\hat{\mathbf{L}}$ (or $\mathbf{L}^*$) and $\hat{\boldsymbol{\Psi}}$ (or $\boldsymbol{\Psi}^*$) were the true values. Recall the model $(\mathbf{X}_j - \boldsymbol{\mu}) = \mathbf{L}\mathbf{F}_j + \boldsymbol{\epsilon}_j$.
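One common choice is the regression (Thomson) method, the same method requested by scores="regression" in the factanal calls above. Treating the estimates as true values and using $\mathrm{Cov}(\mathbf{X}, \mathbf{F}) = \mathbf{L}$, it computes, for each subject,

$$\hat{\mathbf{f}}_j = \hat{\mathbf{L}}'\left(\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}\right)^{-1}(\mathbf{x}_j - \bar{\mathbf{x}}).$$

A minimal R sketch (factor_scores is our own helper name):

factor_scores <- function(X, L, Psi) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # rows are x_j - xbar
  Sigma_hat <- L %*% t(L) + Psi                 # implied covariance matrix
  Xc %*% solve(Sigma_hat) %*% L                 # n x m matrix of scores f_hat_j
}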


More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

Introduction to Principal Component Analysis: Stock Market Values

Introduction to Principal Component Analysis: Stock Market Values Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Exploratory Factor Analysis: rotation. Psychology 588: Covariance structure and factor models

Exploratory Factor Analysis: rotation. Psychology 588: Covariance structure and factor models Exploratory Factor Analysis: rotation Psychology 588: Covariance structure and factor models Rotational indeterminacy Given an initial (orthogonal) solution (i.e., Φ = I), there exist infinite pairs of

More information

Topic 10: Factor Analysis

Topic 10: Factor Analysis Topic 10: Factor Analysis Introduction Factor analysis is a statistical method used to describe variability among observed variables in terms of a potentially lower number of unobserved variables called

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Partial Least Squares (PLS) Regression.

Partial Least Squares (PLS) Regression. Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Statistics for Business Decision Making

Statistics for Business Decision Making Statistics for Business Decision Making Faculty of Economics University of Siena 1 / 62 You should be able to: ˆ Summarize and uncover any patterns in a set of multivariate data using the (FM) ˆ Apply

More information

Solution to Homework 2

Solution to Homework 2 Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

5.2 Customers Types for Grocery Shopping Scenario

5.2 Customers Types for Grocery Shopping Scenario ------------------------------------------------------------------------------------------------------- CHAPTER 5: RESULTS AND ANALYSIS -------------------------------------------------------------------------------------------------------

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances?

Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances? 1 Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances? André Beauducel 1 & Norbert Hilger University of Bonn,

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

More information

Multivariate Analysis

Multivariate Analysis Table Of Contents Multivariate Analysis... 1 Overview... 1 Principal Components... 2 Factor Analysis... 5 Cluster Observations... 12 Cluster Variables... 17 Cluster K-Means... 20 Discriminant Analysis...

More information

Research Methodology: Tools

Research Methodology: Tools MSc Business Administration Research Methodology: Tools Applied Data Analysis (with SPSS) Lecture 02: Item Analysis / Scale Analysis / Factor Analysis February 2014 Prof. Dr. Jürg Schwarz Lic. phil. Heidi

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

More information

Factor Rotations in Factor Analyses.

Factor Rotations in Factor Analyses. Factor Rotations in Factor Analyses. Hervé Abdi 1 The University of Texas at Dallas Introduction The different methods of factor analysis first extract a set a factors from a data set. These factors are

More information

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information

Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

More information

[1] Diagonal factorization

[1] Diagonal factorization 8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:

More information

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013 Notes on Orthogonal and Symmetric Matrices MENU, Winter 201 These notes summarize the main properties and uses of orthogonal and symmetric matrices. We covered quite a bit of material regarding these topics,

More information

INTRODUCTION TO MULTIPLE CORRELATION

INTRODUCTION TO MULTIPLE CORRELATION CHAPTER 13 INTRODUCTION TO MULTIPLE CORRELATION Chapter 12 introduced you to the concept of partialling and how partialling could assist you in better interpreting the relationship between two primary

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS Linear systems Ax = b occur widely in applied mathematics They occur as direct formulations of real world problems; but more often, they occur as a part of the numerical analysis

More information

Notes on Applied Linear Regression

Notes on Applied Linear Regression Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Factor Analysis and Structural equation modelling

Factor Analysis and Structural equation modelling Factor Analysis and Structural equation modelling Herman Adèr Previously: Department Clinical Epidemiology and Biostatistics, VU University medical center, Amsterdam Stavanger July 4 13, 2006 Herman Adèr

More information

Linear Programming. March 14, 2014

Linear Programming. March 14, 2014 Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance

More information

Notes on Determinant

Notes on Determinant ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

More information