Introduction

Principal components: explain many variables using a few new variables. Not many assumptions attached.

Exploratory factor analysis

Exploratory factor analysis: similar idea, but based on a model. Idea: the variance of each variable depends on common factors (shared among variables) plus a specific factor for that variable. Aim: identify the common factors and relate them to the original variables. Want the variance of the specific factors ("error") to be small. Extra: use rotation to get clearer answers.

Examples

One aim of factor analysis: identify unobservable characteristics (e.g. attitudes, beliefs, perceptions). Observe measurable variables, try to relate them to the unobservables.

Example: give children several tests of different kinds:

  reading comprehension, spelling, sentence completion
  addition and subtraction, counting

Hope to find factors picking out the first 3 tests ("verbal ability") and the last 2 ("numerical ability").

Another: perceptions of luxury cars. Ask potential customers to rate many luxury cars on many features (style, reliability, performance, ...), and look for factors relating the features. Might get one factor based on reliability, fuel economy, maintenance, quality, durability: call it "sensible". Then another based on luxury, style, performance: call it, e.g., "appealing". Aim: pick out common features of the variables.
How it works

Measure p variables X_1, ..., X_p. Assume standardized: E(X_i) = 0, var(X_i) = 1 for all i.

One common factor to start: observed X_i depends on a common factor ξ ("xi") plus a specific factor δ_i. Write X_i = λ_i ξ + δ_i, like regression except that ξ and δ_i are not observable. Assume ξ is also standardized, and ξ, δ_i independent.

Take variances:

  var(X_i) = var(λ_i ξ + δ_i) = λ_i² var(ξ) + var(δ_i) = λ_i² + var(δ_i).

But this is 1. Interpret λ_i² as the proportion of variation in X_i explained by the common factor (like R²). Called the communality of X_i. The rest of the variance is down to the specific factor. Write θ_ii² = var(δ_i); the communality is then 1 − θ_ii². Communality near 1: X_i is a near-perfect measure of ξ. Near 0: X_i has nothing to do with the common factor. Want communalities that are not too small.

Two common factors

In principal components, cannot always use only one component. Likewise, in factor analysis, may need 2 or more common factors. Assume ξ_1, ξ_2 independent, standardized. Write

  X_i = λ_i1 ξ_1 + λ_i2 ξ_2 + δ_i;  var(X_i) = λ_i1² + λ_i2² + θ_ii² = 1.

The specific variances θ_ii² only appear with X_i, so they only affect the variances (not the covariances) of the X_i. Communality is now 1 − θ_ii² = λ_i1² + λ_i2². Hope that one of the λ_ij is reasonably large (e.g. if factor 1 is verbal ability and factor 2 mathematical, that each test has something to do with one of these).

Finding a solution: principal factor analysis

Specific factors only affect variances, not covariances. So if we knew the θ_ii² = var(δ_i), they would only affect the diagonal of the variance-covariance matrix. Also, since we standardized the X_i, the variance-covariance matrix is the same as the correlation matrix R. So work with the matrix R, but with the θ_ii² subtracted off the diagonal. Then can solve the problem using the same ideas as principal components: find eigenvalues, then use eigenvectors as factors. (Not the only solution, but it will work.)
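One principal-factor step can be sketched in Python with numpy (a sketch of the idea above, using the correlation matrix of the five psychological tests from the example that follows):

```python
import numpy as np

# Correlation matrix for the five psychological tests
# (para, sent, word, add, dots), as given in the example.
R = np.array([
    [1.000, 0.722, 0.714, 0.203, 0.095],
    [0.722, 1.000, 0.685, 0.246, 0.181],
    [0.714, 0.685, 1.000, 0.170, 0.113],
    [0.203, 0.246, 0.170, 1.000, 0.585],
    [0.095, 0.181, 0.113, 0.585, 1.000],
])

theta2 = np.full(5, 0.4)       # guessed specific variances theta_ii^2
M = R - np.diag(theta2)        # R with theta_ii^2 subtracted off the diagonal

# Eigenvalues/vectors, sorted largest first (eigh returns ascending order).
evals, evecs = np.linalg.eigh(M)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
print(np.round(evals, 4))  # the first two eigenvalues dominate
```

The first two eigenvalues come out near 2.1875 and 1.0217, matching the values computed in IML below.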
Example: correlation matrix for the psychological test data, page 132. Read it into an IML matrix R. Suppose the specific factor variances are all θ_ii² = 0.4 (actually the value used in the text). Then the following lines subtract them from the matrix diagonal and get the eigenvalues and eigenvectors:

theta={0.40, 0.40, 0.40, 0.40, 0.40};
m=r-diag(theta);
print m;
call eigen(eval,evec,m);
print eval;
print evec;

The eigenvalues are (as in the text):

EVAL
 2.1874699  1.0217177  0.0151911  -0.0889    -0.135479

and the eigenvectors (different from the text):

EVEC
 0.5344977  -0.244921  -0.114361  -0.098775   0.7946642
 0.5424158  -0.164139   0.0596937 -0.660478  -0.488927
 0.5233988  -0.247046   0.1439605  0.7379749 -0.315738
 0.2971094   0.6267765  -0.707376  0.0957808 -0.096555
 0.2405763   0.6776369   0.6798921 -0.015231  0.1429894

Not all the eigenvalues are positive. Use those that look meaningfully positive (the first 2). The eigenvectors give factor loadings as in principal components. The first factor is mostly the first 3 tests (para, sent, word); the second factor the last 2 tests (add, dots). (Verbal and numerical skills.)

But... were the communalities correct? Now 1 − θ_ii² = λ_i1² + λ_i2², so use the current estimates of the λ_ij: the λ_ij² are the j-th eigenvalue times the squared eigenvector coefficients, added up over the factors. Thus:

 1 − θ_11² = 2.187(0.5345)² + 1.022(−0.2449)² = 0.686
 1 − θ_22² = 2.187(0.5424)² + 1.022(−0.1641)² = 0.671
 1 − θ_33² = 2.187(0.5234)² + 1.022(−0.2470)² = 0.661
 1 − θ_44² = 2.187(0.2971)² + 1.022(0.6268)² = 0.595
 1 − θ_55² = 2.187(0.2406)² + 1.022(0.6776)² = 0.596

The guessed communalities were probably not correct, since they were just guesses. So go back and repeat the process with θ_ii² values 0.314, 0.329, 0.339, 0.405, 0.404.
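The communality update above can be checked in Python (a sketch with numpy; the loadings are √eigenvalue times the eigenvector entries, so the squared loadings are eigenvalue times squared entries):

```python
import numpy as np

# Correlation matrix for the five tests (para, sent, word, add, dots).
R = np.array([
    [1.000, 0.722, 0.714, 0.203, 0.095],
    [0.722, 1.000, 0.685, 0.246, 0.181],
    [0.714, 0.685, 1.000, 0.170, 0.113],
    [0.203, 0.246, 0.170, 1.000, 0.585],
    [0.095, 0.181, 0.113, 0.585, 1.000],
])

M = R - np.diag(np.full(5, 0.4))      # guessed theta_ii^2 = 0.4
evals, evecs = np.linalg.eigh(M)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Loadings from the two meaningfully positive eigenvalues:
# lambda_ij = sqrt(eval_j) * evec_ij, so lambda_ij^2 = eval_j * evec_ij^2.
loadings = evecs[:, :2] * np.sqrt(evals[:2])

# Updated communalities 1 - theta_ii^2 = lambda_i1^2 + lambda_i2^2.
h2 = (loadings ** 2).sum(axis=1)
print(np.round(h2, 3))  # close to 0.686, 0.671, 0.661, 0.595, 0.596
```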
The updated values of 1 − θ_ii² are 0.720, 0.694, 0.679, 0.592, 0.594. The eigenvalues and eigenvectors change slightly:

EVAL
 2.2496119  1.0281502  0.0164244  -0.024233  -0.060954

EVEC
 0.5426375  -0.237376  -0.259844  -0.008339   0.7626329
 0.5462135  -0.148169   0.0678567 -0.706554  -0.419373
 0.5257002  -0.2348     0.2792392  0.6869246 -0.344483
 0.2819185   0.6335701  -0.665154  0.156708  -0.228307
 0.2266201   0.6820239   0.638336  -0.065489  0.2678157

Go back and repeat the process with 1 minus these. Continue until there is no change in the eigenvalues, eigenvectors, or 1 − θ_ii². Final answers (basically as in the text):

EVAL
 2.2823815  1.0287372  0.0249511  -0.000785  -0.024268

EVEC
 0.552499   -0.238245  -0.539249   0.1568417  0.5679746
 0.5463672  -0.137636   0.2246969 -0.778567  -0.160886
 0.5229386  -0.223968   0.4089454  0.6033197 -0.380976
 0.2731226   0.6337509  -0.51972   0.0334822 -0.502527
 0.2194698   0.6873819   0.4705058  0.0641756  0.5038302

Using SAS's PROC FACTOR

Of course, we only did the above to show the method. In practice, use a canned procedure. SAS can use the original data or the correlation matrix. For the latter, type the matrix into a file like this, with the variable name at the front of each line:

para 1     0.722 0.714 0.203 0.095
sent 0.722 1     0.685 0.246 0.181
word 0.714 0.685 1     0.170 0.113
add  0.203 0.246 0.170 1     0.585
dots 0.095 0.181 0.113 0.585 1

then read the file in a special way, with the same variable names:

data rmat(type=corr);
  _type_='corr';
  infile "rex2.dat";
  input _name_ $ para sent word add dots;

The following does one step of the cycle, starting from 1 − θ_ii² = 0.6:

proc factor;
  priors .6 .6 .6 .6 .6;

and produces this output:
Prior Communality Estimates: NUMERIC
  PARA      SENT      WORD      ADD       DOTS
  0.600000  0.600000  0.600000  0.600000  0.600000

Eigenvalues of the Reduced Correlation Matrix: Total = 3  Average = 0.6

                  1        2        3        4        5
  Eigenvalue   2.1875   1.0217   0.0152  -0.0889  -0.1355
  Difference   1.1658   1.0065   0.1041   0.0466
  Proportion   0.7292   0.3406   0.0051  -0.0296  -0.0452
  Cumulative   0.7292   1.0697   1.0748   1.0452   1.0000

This has the same eigenvalues as we calculated. To do iterated principal factor analysis, change the code to contain

proc factor method=prinit;

which gives this (edited):

  Iter  Change    Communalities
   1    0.086223  0.68622  0.67111  0.66161  0.59448  0.59577
   2    0.034169  0.72039  0.69372  0.67868  0.59118  0.59373
   3    0.015227  0.73562  0.70041  0.68186  0.58915  0.59295
   4    0.007750  0.74337  0.70202  0.68116  0.58780  0.59285
   5    0.004443  0.74781  0.70210  0.67971  0.58682  0.59312
   6    0.002776  0.75059  0.70180  0.67839  0.58602  0.59358
   7    0.001828  0.75242  0.70147  0.67738  0.58532  0.59413
   8    0.001239  0.75365  0.70119  0.67664  0.58467  0.59474
   9    0.000853  0.75451  0.70097  0.67612  0.58405  0.59537

  Convergence criterion satisfied.

Eigenvalues of the Reduced Correlation Matrix: Total = 3.3108913  Average = 0.66217826

                  1        2        3        4        5
  Eigenvalue   2.2823   1.0287   0.0248  -0.0005  -0.0245

Initial Factor Method: Iterated Principal Factor Analysis

  Factor Pattern
  PARA   0.83438  -0.24149
  SENT   0.82551  -0.13968
  WORD   0.79021  -0.22734
  ADD    0.41275   0.64318
  DOTS   0.33150   0.69676

  Variance explained by each factor
  2.282343  1.028673

  Final Communality Estimates: Total = 3.311016
  PARA      SENT      WORD      ADD       DOTS
  0.754508  0.700974  0.676119  0.584049  0.595367

This is the same (numerically) as the text and our previous calculation.

Mathematics of the common factor model

With c factors, have a model like X_i = λ_i1 ξ_1 + ... + λ_ic ξ_c + δ_i for each i, i = 1, 2, ..., p. Easier to write in matrix terms:

  X = Ξ Λ_c′ + Δ.

Assumptions:

1. Common factors ξ uncorrelated, variance 1: Ξ′Ξ/(n−1) = I.
2. Specific factors δ uncorrelated, with variances θ_ii²: Θ² = Δ′Δ/(n−1) diagonal.
3. Common and specific factors uncorrelated: Ξ′Δ = 0.

Since the X_i are standardized, the correlation matrix is R = X′X/(n−1). Substitute for X, and remove any terms that are 0 by assumption. Finally

  R − Θ² = Λ_c Λ_c′.

Invariance under rotation

In principal components, choose each component to maximize variance (while being uncorrelated with the previous components). But here there is no such restriction: any matrix Λ_c satisfying the equation is OK. Consider an orthogonal matrix T, representing a rotation. Let Λ_c* = Λ_c T. Then

  Λ_c* Λ_c*′ = Λ_c T T′ Λ_c′ = Λ_c Λ_c′,

because T T′ = I. That is, for any matrix of factor loadings solving the problem, any rotated version of it also solves the problem. In principal components, it is difficult to interpret a medium-sized component loading. Idea: find a rotation method that makes the factors easy to interpret.

Kaiser's varimax rotation

Want the factor loadings (elements of the rotated Λ_c) to be close to 0 or 1. Then each factor clearly depends (or clearly does not depend) on each variable. Varimax: find the rotation that maximizes the sum of the column variances of the squared loadings. Maximizing variances drives values towards the extremes. In SAS, change the FACTOR line to read

proc factor method=prinit rotate=varimax;

Can eliminate the PRIORS line. Results:

Rotation Method: Varimax

Orthogonal Transformation Matrix
          1         2
  1   0.93037   0.36663
  2  -0.36663   0.93037

Rotated Factor Pattern
  PARA   0.86556   0.08098
  SENT   0.81899   0.17284
  WORD   0.81804   0.07868
  ADD    0.14966   0.73801
  DOTS   0.05112   0.78274

Variance explained by each factor
  2.114135  1.199955
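Both the invariance argument and varimax itself can be checked numerically. Below is a sketch varimax implementation (the standard SVD-based algorithm, not SAS's code) applied to the unrotated factor pattern above; because T is orthogonal, Λ Λ′ and the communalities are unchanged by the rotation:

```python
import numpy as np

# Unrotated factor pattern from the iterated principal factor solution
# (rows: PARA, SENT, WORD, ADD, DOTS; columns: factors 1-2).
L = np.array([
    [0.83438, -0.24149],
    [0.82551, -0.13968],
    [0.79021, -0.22734],
    [0.41275,  0.64318],
    [0.33150,  0.69676],
])

def varimax(L, tol=1e-8, max_iter=100):
    """Find an orthogonal T maximizing the sum of column variances
    of the squared loadings (Kaiser's varimax criterion)."""
    p, k = L.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        B = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(B)
        T = u @ vt
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return L @ T, T

L_rot, T = varimax(L)
# Rotation leaves Lambda Lambda' (hence the communalities) unchanged:
print(np.allclose(L_rot @ L_rot.T, L @ L.T))  # True
```

The rotated pattern and variance-explained values come out close to the SAS output (up to possible sign flips or column swaps, which are also valid solutions).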
Factor 1 now clearly depends on the three verbal tests (paragraph comprehension, sentence completion, word meaning); factor 2 is now clearly based on the numerical tests (addition, counting dots). The variance explained by the 1st factor is no longer the largest possible, but the 2 factors together still explain the same amount of variance.

Quartimax rotation

Similar idea to varimax, but now maximize the total row variance. Makes each variable load on as few factors as possible. (In varimax, a variable could still appear in several different factors.) In the example data set, the results are very similar (because each variable loaded on only one factor anyway in varimax).

Factor scores

In principal components, we obtained component scores: values for each observation representing where that observation falls on each component. Provided a way to plot multidimensional data in 2 dimensions. Used the component loadings to make linear combinations of the original variables. Same idea in factor analysis, but a difficulty: we don't know the specific factors exactly. So estimate the factor scores by assuming the specific factors δ_i = 0:

  Ξ̂ = X R⁻¹ Λ_c.

Λ_c depends on the rotation; therefore the factor scores do too.

Saving and plotting factor scores

Factor scores depend on the original observations, so need the original data, not just the correlation matrix. The data in file stock_returns.dat are weekly rates of return for 5 stocks on the NYSE: Allied Chemical, du Pont, Union Carbide, Exxon, Texaco, collected over 100 weeks. SAS FACTOR line:

proc factor method=prinit rotate=varimax priors=smc nfact...

Variations: the priors option specifies communalities that should be closer to the truth; nfactors says to get 2 factors; out says to create an output dataset with the factor scores.
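Since the stock-returns data file is not reproduced here, the following sketch simulates some standardized data with two underlying common factors (all loadings and dimensions here are made up for illustration) and applies the score formula Ξ̂ = X R⁻¹ Λ_c:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: n = 100 observations on p = 5 variables driven
# by c = 2 common factors (stand-in for the stock-returns data).
n, p = 100, 5
xi = rng.standard_normal((n, 2))                 # true common factors
lam = np.array([[0.7, 0.2], [0.8, 0.1], [0.7, 0.3],
                [0.2, 0.7], [0.1, 0.8]])         # made-up loadings
X = xi @ lam.T + 0.5 * rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize

R = np.corrcoef(X, rowvar=False)
# In practice, Lambda_c would come from iterated principal factor
# analysis plus rotation; here we reuse the true loadings as a stand-in.
L = lam

# Factor scores: one score per observation per factor.
scores = X @ np.linalg.solve(R, L)               # X R^{-1} Lambda_c
print(scores.shape)  # (100, 2)
```

Since the columns of X are centered, each column of scores averages to 0; high or low scores mark observations that sit far out on that factor, which is what the plot on the next slide exploits.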
Factor pattern before rotation:

  ALLCHEM    0.69735  -0.07271
  DUPONT     0.77456  -0.42988
  UNIONCAR   0.71597  -0.11548
  EXXON      0.61090   0.19638
  TEXACO     0.69357   0.49941

  Variance explained by each factor
  2.453094  0.491398

Factor 1 is basically an average of all the stocks; factor 2 contrasts Texaco with du Pont. Factor 1 explains most of the variance. Compare the pattern after rotation:

  ALLCHEM    0.57787   0.39706
  DUPONT     0.86833   0.17533
  UNIONCAR   0.61978   0.37659
  EXXON      0.33751   0.54576
  TEXACO     0.20384   0.83001

  Variance explained by each factor
  1.627511  1.316981

Factor 1 picks out the first 3 (chemical) companies; factor 2 picks out the last 2 (oil) companies. The explained variance is shared out more evenly between the factors.

Can print out the new dataset (including the factor scores) and plot the factor scores. (SAS by default uses the newest dataset.)

proc print;
proc plot;
  plot Factor1 * Factor2;

On the plot, pick out good/bad days for the chemical stocks (top/bottom) and good/bad days for the oil stocks (right/left). Example: day 13, with factor 1 score 2.64 and factor 2 score 0.24. A good day for the chemical companies: du Pont and Union Carbide made big gains. An average day for oil: small gains for both Exxon and Texaco. Day 20: factor 1 score 1.11, factor 2 score 2.78. No gain for the chemical companies; both Exxon and Texaco made solid gains.