Lecture 7: Principal component analysis (PCA)

- Rationale and use of PCA
- The underlying model (what is a principal component anyway?)
- Eigenvectors and eigenvalues of the sample covariance matrix revisited!
- PCA scores and loadings
- The use of and rationale for rotations
- Orthogonal and oblique rotations
- Component retention, significance, and reliability

What is PCA?
From a set of $p$ variables $X_1, X_2, \ldots, X_p$, we try to find ("extract") a set of ordered indices $Z_1, Z_2, \ldots, Z_p$ that are uncorrelated and ordered in terms of their variability: $\operatorname{Var}(Z_1) \ge \operatorname{Var}(Z_2) \ge \cdots \ge \operatorname{Var}(Z_p)$. Because the $Z_i$'s (principal components) are uncorrelated, they measure different dimensions in the data. The hope (sometimes faint) is that most of the variability in the original set of variables will be accounted for by $c < p$ components.

Why use PCA?
PCA is generally used to reduce the number of variables considered in subsequent analyses, i.e. to reduce the dimensionality of the data. Examples include:
- reducing the number of dependent variables in MANOVA, multivariate regression, correlation analysis, etc.;
- reducing the number of independent variables (predictors) in regression analysis.
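The following is a minimal NumPy sketch of the idea above. It assumes the components are extracted from the eigenvectors of the sample covariance matrix, which is the usual approach and is revisited later in this lecture. The data and the names `extract_components`, `X` and `Z` are hypothetical. The sketch checks that the extracted indices are uncorrelated and ordered by variance. It also reports the share of the total variability captured by the first $c = 1$ component.

```python
import numpy as np

def extract_components(X):
    """Eigen-decompose the sample covariance matrix of X (n x p).

    Returns the eigenvalues in decreasing order and a p x p matrix A whose
    i-th row holds the coefficients a_i1, ..., a_ip of Z_i.
    """
    C = np.cov(X, rowvar=False)           # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # so that Var(Z_1) >= Var(Z_2) >= ...
    return eigvals[order], eigvecs[:, order].T

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations of p = 3 correlated variables
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing

eigvals, A = extract_components(X)
Z = (X - X.mean(axis=0)) @ A.T            # one column of scores per component Z_i
cov_Z = np.cov(Z, rowvar=False)           # off-diagonal entries are ~0

c = 1
share = eigvals[:c].sum() / eigvals.sum() # variability accounted for by c < p components
print(np.round(cov_Z, 4))
print(f"first {c} component(s) account for {share:.1%} of the total variance")
```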
Estimating principal components
The first principal component is obtained by fitting (i.e. estimating the coefficients of) the linear function $Z_1 = \sum_{j=1}^{p} a_{1j} X_j$ which maximizes $\operatorname{Var}(Z_1)$, subject to $\sum_{j=1}^{p} a_{1j}^2 = 1$.
The second principal component is obtained by fitting (i.e. estimating the coefficients of) the function $Z_2 = \sum_{j=1}^{p} a_{2j} X_j$ which maximizes $\operatorname{Var}(Z_2)$, subject to $\sum_{j=1}^{p} a_{2j}^2 = 1$, $\operatorname{Cov}(Z_1, Z_2) = 0$ and $\operatorname{Var}(Z_2) \le \operatorname{Var}(Z_1)$.

Estimating principal components (cont'd)
The third principal component is obtained by fitting (i.e. estimating the coefficients of) the function $Z_3 = \sum_{j=1}^{p} a_{3j} X_j$ which maximizes $\operatorname{Var}(Z_3)$, subject to $\sum_{j=1}^{p} a_{3j}^2 = 1$ as well as the additional constraints $\operatorname{Cov}(Z_1, Z_2) = 0$, $\operatorname{Cov}(Z_1, Z_3) = 0$, $\operatorname{Cov}(Z_2, Z_3) = 0$ and $\operatorname{Var}(Z_3) \le \operatorname{Var}(Z_2) \le \operatorname{Var}(Z_1)$.

Estimating principal components (cont'd)
The coefficients of each principal component can be estimated through several different methods (e.g. least-squares estimation, maximum likelihood estimation, iterated principal axis, etc.). The extracted principal components may differ depending on the method of estimation.
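The derivation below links the constrained maximization above to the eigenvectors used later in the lecture. It is a sketch of the standard Lagrange-multiplier argument for the usual eigen-decomposition solution, not something taken from the lecture itself. It uses $a_1 = (a_{11}, \ldots, a_{1p})^\top$ and the sample covariance matrix $C$.

```latex
\operatorname{Var}(Z_1) = a_1^\top C a_1,
\qquad \text{maximize subject to } a_1^\top a_1 = \sum_{j=1}^{p} a_{1j}^2 = 1

\mathcal{L}(a_1, \lambda) = a_1^\top C a_1 - \lambda\,(a_1^\top a_1 - 1)

\frac{\partial \mathcal{L}}{\partial a_1} = 2 C a_1 - 2 \lambda a_1 = 0
\;\Longrightarrow\; C a_1 = \lambda a_1

\operatorname{Var}(Z_1) = a_1^\top C a_1 = \lambda\, a_1^\top a_1 = \lambda
```

So $a_1$ must be an eigenvector of $C$, and choosing the one with the largest eigenvalue $\lambda_1$ maximizes $\operatorname{Var}(Z_1)$. Repeating the argument for $Z_2$ adds the constraint $\operatorname{Cov}(Z_1, Z_2) = a_1^\top C a_2 = \lambda_1 a_1^\top a_2 = 0$. That constraint leads to the eigenvector with the second-largest eigenvalue, and so on.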
The geometry of principal components
Principal components ($Z_i$) are linear functions of the original variables, and as such define hyperplanes in the $(p + 1)$-dimensional space of $Z$ and the original variables. Because the $Z_i$'s are uncorrelated, these planes meet at right angles.
[Figure: component planes in the space of $X_1$, $X_2$ and $Z$]

Multivariate variance: a geometric interpretation
Univariate variance is a measure of the volume occupied by sample points in one dimension. Multivariate variance involving $p$ variables is the volume occupied by sample points in a $p$-dimensional space.
[Figure: occupied volume in the $X_1$ vs $X_2$ plane for larger and smaller variance]

Multivariate variance: effects of correlations among variables
Correlations between pairs of variables reduce the volume occupied by sample points and hence reduce the multivariate variance.
[Figure: occupied volume in the $X_1$ vs $X_2$ plane for no correlation, positive correlation and negative correlation]
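The NumPy check below tests the claim that correlations shrink the occupied volume. Like the next slide, it takes the determinant of the covariance matrix as the measure of multivariate variance. `generalized_variance` is a hypothetical helper for two variables with standard deviations `s1`, `s2` and correlation `r`.

```python
import numpy as np

def generalized_variance(s1, s2, r):
    """Determinant of the 2 x 2 covariance matrix with SDs s1, s2 and correlation r."""
    C = np.array([[s1**2, r * s1 * s2],
                  [r * s1 * s2, s2**2]])
    return np.linalg.det(C)   # equals s1**2 * s2**2 * (1 - r**2)

for r in (0.0, 0.5, -0.5, 0.9):
    print(f"r = {r:+.1f}: generalized variance = {generalized_variance(1.0, 1.0, r):.4f}")
```

The sign of $r$ does not matter. Positive and negative correlations of the same strength reduce the volume by the same factor $1 - r^2$.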
|C| and the generalized multivariate variance
The determinant of the sample covariance matrix $C$ is a generalized multivariate variance. For two variables, consider a parallelogram whose sides are the individual standard deviations and whose angle is determined by the correlation between the variables. The squared area of this parallelogram equals the determinant of $C$. For example, with $s_1 = s_2 = 1$ and $r = 0.5$:
$C = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, $|C| = 1 - 0.25 = 3/4$; $r = 0.5 = \cos\theta$, so $\theta = 60^\circ$.
$\sin 60^\circ = h / s_2 = \sqrt{3}/2$, so $h = \sqrt{3}/2$.
Area $=$ base $\times$ height $= s_1 h = \sqrt{3}/2$, and Area$^2 = 3/4 = |C|$.

Eigenvalues and eigenvectors of C
Eigenvectors of the covariance matrix $C$ are orthogonal directed line segments that span the variation in the data. The corresponding (unsigned) eigenvalues are the lengths of these segments, so the product of the eigenvalues is the volume occupied by the data, i.e. the determinant of the covariance matrix.
[Figure: eigenvectors $\xi_1$, $\xi_2$ with lengths $\lambda_1$, $\lambda_2$ for no, positive and negative correlation between $X_1$ and $X_2$]

The geometry of principal components (cont'd)
The coefficients ($a_{ij}$) of the principal components ($Z_i$) define vectors in the space of coefficients. These vectors are the eigenvectors ($a_i$) of the sample covariance matrix $C$. The corresponding (unsigned) eigenvalues ($\lambda_i$) are the variances of each component, i.e. $\lambda_i = \operatorname{Var}(Z_i)$. The product of the eigenvalues is the volume occupied by the data, i.e. the determinant of the covariance matrix: $\prod_i \lambda_i = |C|$.
[Figure: coefficient vectors $a_1$, $a_2$ with eigenvalues $\lambda_1$, $\lambda_2$]
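The worked example can be checked numerically. This sketch assumes the two-variable case with $s_1 = s_2 = 1$ and $r = 0.5$. It compares the squared parallelogram area, the determinant $|C|$ and the product of the eigenvalues of $C$, which should all agree.

```python
import numpy as np

s1, s2, r = 1.0, 1.0, 0.5
C = np.array([[s1**2, r * s1 * s2],
              [r * s1 * s2, s2**2]])

theta = np.arccos(r)                 # cos(theta) = r, so theta = 60 degrees
height = s2 * np.sin(theta)          # h = s2 * sin(60 deg) = sqrt(3) / 2
area = s1 * height                   # base x height of the parallelogram

det_C = np.linalg.det(C)
eigvals = np.linalg.eigvalsh(C)      # 0.5 and 1.5 for this matrix

print(np.degrees(theta), area**2, det_C, np.prod(eigvals))
```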
Another important relationship!
The sum of the eigenvalues of the covariance matrix $C$ equals the sum of the diagonal elements of $C$, i.e. the trace of $C$. So the sum of the variances of the principal components equals the sum of the variances of the original variables:
$C = \begin{pmatrix} s_1^2 & c_{12} & \cdots & c_{1p} \\ c_{21} & s_2^2 & \cdots & c_{2p} \\ \vdots & & \ddots & \vdots \\ c_{p1} & c_{p2} & \cdots & s_p^2 \end{pmatrix}$, $\quad \sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} s_i^2 = \operatorname{Tr}(C)$.

Scale and the correlation matrix
Variables may be measured on different scales, and we want to eliminate scale effects. We therefore usually work with standardized values
$X'_{ik} = \dfrac{X_{ik} - \bar{X}_k}{s_k}$,
so that each variable is scaled to have zero mean and unit variance. The sample covariance matrix of standardized variables is the sample correlation matrix $R$, with elements $r_{ij} = c_{ij} / (s_i s_j)$:
$R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$.

Principal component scores
Because principal components are functions, we can plug in the values of each variable for each observation and calculate a PC score for each observation and each principal component. With two variables, the score of observation $i$ on component $j$ is
$S_{ij} = a_{j1} X_{i1} + a_{j2} X_{i2}$,
where $a_1 = (a_{11}, a_{12})$ and $a_2 = (a_{21}, a_{22})$ are the coefficient vectors of the two components.
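The three results above (the trace identity, standardization and scores) can be sketched in NumPy. The two variables and their scales are hypothetical. The coefficients are taken as the eigenvectors of $R$, ordered by decreasing eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: two variables measured on very different scales
height = rng.normal(170.0, 10.0, size=50)
weight = 70.0 + 0.2 * (height - 170.0) + rng.normal(0.0, 3.0, size=50)
X = np.column_stack([height, weight])

# Trace identity: sum of eigenvalues of C = sum of the variances = Tr(C)
C = np.cov(X, rowvar=False)
print(np.isclose(np.linalg.eigvalsh(C).sum(), np.trace(C)))

# Standardize: X'_ik = (X_ik - mean_k) / s_k; the covariance matrix becomes R
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.cov(Xs, rowvar=False)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))

# Scores: S_ij = sum_k a_jk X'_ik for every observation i and component j
lam, vecs = np.linalg.eigh(R)
lam, A = lam[::-1], vecs[:, ::-1].T   # rows of A are a_1, a_2 (largest eigenvalue first)
S = Xs @ A.T                          # n x 2 matrix of PC scores
print(S[:3])
```

The variances of the score columns equal the eigenvalues of $R$, and they sum to $\operatorname{Tr}(R) = p = 2$.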
Principal component loadings
Component loadings ($L_{ij}$) are the covariances (correlations for standardized values) of the original variables used in the PCA with the components. They are proportional to the component coefficients. For standardized variables,
$L_{ij} = \operatorname{Corr}(X_i, Z_j) = k_j a_{ji}$, with $k_j = \sqrt{\lambda_j}$,
where $a_{ji}$ is the coefficient of $X_i$ in $Z_j$. For each component, the squared loadings summed over all variables equal the variance of the component:
$\sum_{i=1}^{p} L_{ij}^2 = \lambda_j = \operatorname{Var}(Z_j)$.

More on loadings
Sometimes components have variables with similar loadings, which form a natural group. To assist in interpretation, we may want to choose another component frame that emphasizes these differences among groups.

Loadings
Variable       Z1      Z2
Height         0.85    0.37
Arm span       0.84    0.44
Lower leg      0.84    0.40
Forearm        0.8…    0.46
Weight         0.75    −…
Upper thigh    0.67    −…
Chest width    0.67    −0.4…
Chest girth    0.6…    −…
[Figure: factor plot of the loadings on FACTOR(1) vs FACTOR(2) for FOREARM, LOWERLEG, HEIGHT, WEIGHT, BITR and CHESTGIR]

Orthogonal rotations: varimax
Orthogonal (angle-preserving): the new (rotated) components are still uncorrelated.
Varimax: rotation done so that each component loads high on a small number of variables and low on the other variables (simplifies factors).
[Figure: factor plots of the unrotated and varimax-rotated loadings]
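The following NumPy sketch checks the loading relationships. It assumes standardized variables, so that loadings are correlations and $k_j = \sqrt{\lambda_j}$. The three correlated variables are simulated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: three correlated variables
X = rng.normal(size=(300, 3))
X[:, 1] += X[:, 0]
X[:, 2] += 0.5 * X[:, 0]
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
R = np.corrcoef(X, rowvar=False)

lam, vecs = np.linalg.eigh(R)
lam, vecs = lam[::-1], vecs[:, ::-1]    # column j holds a_j, largest eigenvalue first
L = vecs * np.sqrt(lam)                 # L_ij = sqrt(lambda_j) * a_ji

Z = Xs @ vecs                           # component scores
corr_XZ = np.array([[np.corrcoef(Xs[:, i], Z[:, j])[0, 1] for j in range(3)]
                    for i in range(3)])

print(np.allclose(L, corr_XZ))                   # loadings = Corr(X_i, Z_j)
print(np.allclose((L**2).sum(axis=0), lam))      # column sums of L^2 = lambda_j
```

With all components retained, each row of squared loadings also sums to 1, the variance of a standardized variable.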
Orthogonal rotations: quartimax
Orthogonal (angle-preserving): the new (rotated) components are still uncorrelated.
Quartimax: rotation done so that each variable loads mainly on one factor (simplifies variables).
[Figure: factor plots of the unrotated and quartimax-rotated loadings for FOREARM, LOWERLEG, HEIGHT, WEIGHT, BITR and CHESTGIR]

Orthogonal rotations: equamax
Orthogonal (angle-preserving): the new (rotated) components are still uncorrelated.
Equamax: combines varimax and quartimax. It jointly optimizes the number of variables that load highly on a factor and the number of factors needed to explain a variable.
[Figure: factor plots of the unrotated and equamax-rotated loadings]

Oblique rotations, e.g. oblimin
Oblique (non-angle-preserving): the new (rotated) components are now correlated. Most reasonable when significant intercorrelations among factors exist.
[Figure: factor plots of the unrotated and oblimin-rotated loadings]
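The orthogonal rotations above all belong to the orthomax family, sketched here with the widely used SVD-based iteration. This is an illustrative implementation, not the algorithm of any particular package. In `orthomax`, `gamma = 1` gives varimax, `gamma = 0` quartimax and `gamma = m/2` equamax (for $m$ retained components). The loading matrix is hypothetical. Many packages (for example R's `varimax`) apply Kaiser row normalization before rotating; this sketch does not.

```python
import numpy as np

def orthomax(L, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthogonally rotate a p x m loading matrix L.

    gamma = 1 -> varimax, gamma = 0 -> quartimax, gamma = m / 2 -> equamax.
    Returns the rotated loadings and the m x m orthogonal rotation matrix T.
    """
    p, m = L.shape
    T = np.eye(m)
    crit_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient of the orthomax criterion, mapped back to the unrotated frame
        G = L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                       # nearest orthogonal matrix to G
        crit = s.sum()
        if crit_old > 0 and crit < crit_old * (1 + tol):
            break                        # criterion has stopped increasing
        crit_old = crit
    return L @ T, T

# Hypothetical unrotated loadings on two components (p = 6 variables)
L = np.array([[0.85, 0.37], [0.84, 0.44], [0.84, 0.40],
              [0.75, -0.50], [0.67, -0.53], [0.62, -0.58]])

L_varimax, T_v = orthomax(L, gamma=1.0)
L_quartimax, T_q = orthomax(L, gamma=0.0)
L_equamax, T_e = orthomax(L, gamma=L.shape[1] / 2)
print(np.round(L_varimax, 2))
```

Because each rotation matrix is orthogonal, every variable's communality (row sum of squared loadings) is unchanged. Only the way that variance is shared among the components changes. With $m = 2$, equamax ($\gamma = 1$) coincides with varimax.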
The consequences of rotation
Unrotated components are (1) uncorrelated and (2) ordered in terms of decreasing variance (i.e. $\operatorname{Var}(Z_1) \ge \operatorname{Var}(Z_2) \ge \cdots$).
Orthogonally rotated components are (1) still uncorrelated, but (2) need not be ordered in terms of decreasing variance (e.g. for varimax rotation).
Obliquely rotated components are (1) correlated and (2) unordered (in general).

The rotated pattern matrix for obliquely rotated factors
The elements of the pattern matrix are analogous to standardized partial regression coefficients from a multiple regression of each variable on the factors. So each element quantifies the importance of the factor to the variable in question, once the effects of the other factors are controlled.

Rotated Pattern Matrix (OBLIMIN, Gamma = 0)
Variable        Factor 1   Factor 2
HEIGHT          0.909      …
(arm span)      0.957      …
FOREARM         0.953      …
LOWERLEG        …          …
WEIGHT          …          0.897
BITR            …          0.864
CHESTGIR        …          …
(chest width)   …          0.749

The rotated structure matrix for obliquely rotated factors
The elements of the rotated structure matrix are the simple correlations of the variable in question with the factor, i.e. the component loadings. For orthogonal factors, the factor pattern and factor structure matrices are identical.

Rotated Structure Matrix
Variable        Factor 1   Factor 2
HEIGHT          0.933      0.363
(arm span)      0.935      …
FOREARM         0.950      0.396
LOWERLEG        …          …
WEIGHT          …          …
BITR            …          0.787
CHESTGIR        …          0.860
(chest width)   …          0.843
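The relationship between the two matrices can be made concrete. For factors scaled to unit variance, the structure matrix equals the pattern matrix post-multiplied by the factor correlation matrix $\Phi$. When the factors are orthogonal, $\Phi$ is the identity and the two matrices coincide. The pattern matrix and factor correlation below are hypothetical.

```python
import numpy as np

# Hypothetical oblique pattern matrix (rows: variables, columns: factors)
P = np.array([[0.90,  0.05],
              [0.95, -0.05],
              [0.02,  0.88],
              [-0.08, 0.86]])

# Hypothetical correlation between the two obliquely rotated factors
Phi = np.array([[1.00, 0.45],
                [0.45, 1.00]])

S = P @ Phi             # structure matrix: simple correlations of variables with factors
S_orth = P @ np.eye(2)  # orthogonal factors: Phi = I, so structure = pattern
print(np.round(S, 3))
```

A variable with a near-zero pattern coefficient on a factor can still correlate noticeably with that factor, because the factors are themselves correlated.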
Which rotation is the best?
Object: find the rotation that achieves the simplest structure among component loadings, thereby making interpretation comparatively easy.
Thurstone's criteria, for $p$ variables and $m < p$ components: (1) each component should have at least $m$ near-zero loadings; (2) few components should have non-zero loadings on the same variable.

A final word on rotations
"You cannot say that any rotation is better than any other rotation from a statistical point of view: all rotations are equally good statistically. Therefore, the choice among different rotations must be based on non-statistical grounds" (SAS/STAT User's Guide, Vol. …, p. …776).

How many components to retain in subsequent analysis?
- Kaiser rule: retain only components with eigenvalues > 1.
- Scree test: plot eigenvalues against their ordinal numbers, and retain all components in the steep descent part of the curve.
- Retain as many factors as required to account for a specified amount of the total variance (e.g. 85%).
[Figure: scree plot of eigenvalue vs number of factors, with the Kaiser threshold marked]
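The Kaiser rule and the variance-share rule are easy to automate, while the scree test remains a visual judgment. This sketch uses a hypothetical set of eight eigenvalues from a correlation matrix, so they sum to 8. The helper names are illustrative.

```python
import numpy as np

def n_kaiser(eigvals):
    """Kaiser rule: number of eigenvalues greater than 1."""
    return int(np.sum(np.asarray(eigvals) > 1.0))

def n_for_variance(eigvals, target=0.85):
    """Smallest number of components whose cumulative share of the total variance reaches target."""
    ev = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cum_share = np.cumsum(ev) / ev.sum()
    return int(np.searchsorted(cum_share, target) + 1)

# Hypothetical eigenvalues of an 8 x 8 correlation matrix (they sum to 8)
eigvals = [4.67, 1.77, 0.48, 0.42, 0.23, 0.19, 0.14, 0.10]
print(n_kaiser(eigvals))          # 2
print(n_for_variance(eigvals))    # 3
```

The two rules disagree here (2 vs 3 components), which illustrates why several retention criteria are worth comparing.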
More on interpretation: the significance of loadings
Since loadings are correlation coefficients ($r$), we can test the null hypothesis that each correlation equals zero. But analytic estimates of standard errors are often too small, especially for rotated loadings. So, as a rule of thumb, use double the critical value to test significance. E.g., for $N = \ldots00$, $r(\alpha = \ldots) = \ldots$, so significant loadings are those greater than $2\,r(\alpha = \ldots)$.

Component reliability: rules of thumb
The absolute magnitude and number of loadings are crucial for determining reliability:
- components with at least 4 loadings > 0.60, or with at least 3 loadings > 0.80, are reliable;
- for N > 150, components with at least 10 loadings > 0.40 are reliable.

PCA: the procedure
1. Calculate the sample covariance matrix or correlation matrix. If all variables are on the same scale, use the sample covariance matrix; otherwise use the correlation matrix.
2. Run PCA to extract unrotated components ("initial extraction").
3. Decide which components to use in subsequent analysis based on the Kaiser rule, scree plots, etc.
4. Based on (3), rerun the analysis using different orthogonal and oblique rotations and compare them using factor plots ("follow-up extraction").
PCA: the procedure (cont'd)
5. For obliquely rotated components, calculate correlations among components. Small correlations suggest that orthogonal rotations are reasonable.
6. Evaluate the statistical significance of the component loadings obtained from the best rotation.
7. Check component reliability by redoing steps (1) to (6) with another (independent) data set, and compare the component loadings obtained from the two data sets. Are they close?
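Steps 6 and 7, together with the rules of thumb for significance and reliability given earlier, can be sketched as follows. Several assumptions are made and labelled in the code:
- the critical correlation is approximated with Fisher's z transformation rather than read from a table;
- the N > 150 threshold is used for the 10-loadings rule;
- loadings from the two data sets are compared with Tucker's congruence coefficient, a common choice that the lecture itself does not name.

The simulated data and all function names are hypothetical.

```python
import math
from statistics import NormalDist
import numpy as np

def critical_r(n, alpha=0.01):
    """Two-tailed critical correlation, approximated via Fisher's z (assumption)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))

def significant(loadings, n, alpha=0.01):
    """Rule of thumb: a loading is significant if |L| > 2 x critical r."""
    threshold = 2 * critical_r(n, alpha)
    return np.abs(loadings) > threshold

def reliable(loadings, n):
    """Rules of thumb for one component's loadings (N > 150 assumed for the last rule)."""
    a = np.abs(np.asarray(loadings))
    if (a > 0.60).sum() >= 4 or (a > 0.80).sum() >= 3:
        return True
    return n > 150 and (a > 0.40).sum() >= 10

def pca_loadings(X):
    """Steps 1-2: correlation matrix, then unrotated loadings (largest eigenvalue first)."""
    lam, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    lam, vecs = lam[::-1], vecs[:, ::-1]
    return lam, vecs * np.sqrt(lam)

def congruence(x, y):
    """Tucker's congruence coefficient; abs() because component signs are arbitrary."""
    return abs(x @ y) / math.sqrt((x @ x) * (y @ y))

def simulate(rng, n):
    """Hypothetical data: variables 0-3 share factor f, variables 4-5 share factor g."""
    f, g = rng.normal(size=(2, n))
    signal = np.column_stack([f, f, f, f, g, g])
    return signal + rng.normal(scale=0.5, size=(n, 6))

rng = np.random.default_rng(3)
n = 300
lam_a, L_a = pca_loadings(simulate(rng, n))   # first data set
lam_b, L_b = pca_loadings(simulate(rng, n))   # independent second data set

k = int((lam_a > 1).sum())                    # step 3: Kaiser rule
for j in range(k):
    print(f"component {j + 1}: significant = {significant(L_a[:, j], n)}, "
          f"reliable = {reliable(L_a[:, j], n)}, "
          f"congruence with second data set = {congruence(L_a[:, j], L_b[:, j]):.3f}")
```

In this simulation the second component is well replicated across the two data sets. However, it has only two high loadings, so the rules of thumb do not classify it as reliable.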