Lecture 7: Principal component analysis (PCA)

Outline: Rationale and use of PCA; the underlying model (what is a principal component, anyway?); eigenvectors and eigenvalues of the sample covariance matrix revisited; PCA scores and loadings; the use of and rationale for rotations; orthogonal and oblique rotations; component retention, significance, and reliability. L7.1

What is PCA? From a set of $p$ variables $X_1, X_2, \dots, X_p$, we try to find ("extract") a set of ordered indices $Z_1, Z_2, \dots, Z_p$ that are uncorrelated and ordered in terms of their variability: $\mathrm{Var}(Z_1) > \mathrm{Var}(Z_2) > \dots > \mathrm{Var}(Z_p)$. Because the $Z_i$'s (the principal components) are uncorrelated, they measure different dimensions in the data. The hope (sometimes faint) is that most of the variability in the original set of variables will be accounted for by $c < p$ components. L7.2

Why use PCA? PCA is generally used to reduce the number of variables considered in subsequent analyses, i.e. to reduce the dimensionality of the data. Examples include: reducing the number of dependent variables in MANOVA, multivariate regression, correlation analysis, etc.; and reducing the number of independent variables (predictors) in regression analysis. L7.3
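None of the code in these notes comes from the lecture itself, but a minimal NumPy sketch (on made-up data) makes the definition in L7.2 concrete: the extracted $Z_i$'s come out uncorrelated and ordered by variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up sample: 500 observations on p = 3 correlated variables.
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[4, 2, 1], [2, 3, 1], [1, 1, 2]], size=500)

C = np.cov(X, rowvar=False)                  # p x p sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # so Var(Z1) > Var(Z2) > ...

Z = (X - X.mean(axis=0)) @ eigvecs           # principal components Z1..Zp

print(np.var(Z, axis=0, ddof=1))             # matches eigvals, in decreasing order
print(np.round(np.cov(Z, rowvar=False), 6))  # off-diagonals ~0: Zi's uncorrelated
```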

Estimating principal components. The first principal component is obtained by fitting (i.e. estimating the coefficients of) the linear function

$$Z_1 = \sum_{j=1}^{p} a_{1j} X_j$$

which maximizes $\mathrm{Var}(Z_1)$, subject to $\sum_{j=1}^{p} a_{1j}^2 = 1$. The second principal component is obtained by fitting (i.e. estimating the coefficients of) the function

$$Z_2 = \sum_{j=1}^{p} a_{2j} X_j$$

which maximizes $\mathrm{Var}(Z_2)$, subject to $\sum_{j=1}^{p} a_{2j}^2 = 1$, $\mathrm{Cov}(Z_1, Z_2) = 0$, and $\mathrm{Var}(Z_2) \le \mathrm{Var}(Z_1)$. L7.4

Estimating principal components (cont'd). The third principal component is obtained by fitting (i.e. estimating the coefficients of) the function

$$Z_3 = \sum_{j=1}^{p} a_{3j} X_j$$

which maximizes $\mathrm{Var}(Z_3)$, subject to $\sum_{j=1}^{p} a_{3j}^2 = 1$, as well as the additional constraints $\mathrm{Cov}(Z_1, Z_3) = 0$ and $\mathrm{Cov}(Z_2, Z_3) = 0$, and $\mathrm{Var}(Z_3) \le \mathrm{Var}(Z_2) \le \mathrm{Var}(Z_1)$. L7.5

Estimating principal components (cont'd). Estimation of the coefficients for each principal component can be accomplished through several different methods (e.g. least-squares estimation, maximum likelihood estimation, iterated principal axis, etc.). The extracted principal components may differ depending on the method of estimation. L7.6
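As a quick check of this constrained-maximization framing (a sketch on assumed data, not lecture code), no randomly drawn unit-norm coefficient vector should yield a larger $\mathrm{Var}(Z)$ than the leading eigenvector of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 4))  # made-up data
Xc = X - X.mean(axis=0)
C = np.cov(X, rowvar=False)

a1 = np.linalg.eigh(C)[1][:, -1]        # coefficients of the first PC (unit norm)

# Compare Var(Z) for a1 against many random coefficient vectors.
best_random = 0.0
for _ in range(10_000):
    a = rng.standard_normal(4)
    a /= np.linalg.norm(a)              # enforce the constraint sum_j a_j^2 = 1
    best_random = max(best_random, np.var(Xc @ a, ddof=1))

print(best_random, "<=", np.var(Xc @ a1, ddof=1))   # a1 maximizes Var(Z1)
```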

The geometry of principal components. Principal components ($Z_i$) are linear functions of the original variables and, as such, define hyperplanes in the $(p+1)$-dimensional space of $Z$ and the $p$ original variables. Because the $Z_i$'s are uncorrelated, these planes meet at right angles. [Figure: the plane of $Z$ over the $(X_1, X_2)$ plane.] L7.7

Multivariate variance: a geometric interpretation. Univariate variance is a measure of the volume occupied by sample points in one dimension. Multivariate variance involving $p$ variables is the volume occupied by sample points in a $p$-dimensional space. [Figure: two point clouds in the $(X_1, X_2)$ plane; the cloud with larger variance occupies a larger volume, the cloud with smaller variance a smaller one.] L7.8

Multivariate variance: effects of correlations among variables. Correlations between pairs of variables reduce the volume occupied by sample points and hence reduce the multivariate variance. [Figure: point clouds with no, positive, and negative correlation; the correlated clouds occupy a smaller volume.] L7.9
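The claim in L7.9 can be checked in two lines, using the $2 \times 2$ identity $|\mathbf{C}| = s_1^2 s_2^2 (1 - r^2)$ (a sketch with unit variances, not lecture code):

```python
import numpy as np

# With unit variances, |C| = 1 - r^2: correlation shrinks the generalized variance.
for r in [0.0, 0.5, 0.9, -0.9]:
    C = np.array([[1.0, r], [r, 1.0]])
    print(r, round(np.linalg.det(C), 3))   # 1.0, 0.75, 0.19, 0.19
```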

|C| and the generalized multivariate variance. The determinant of the $p \times p$ sample covariance matrix $\mathbf{C}$ is a generalized multivariate variance: the area of the parallelogram whose sides are the individual standard deviations $s_1$ and $s_2$, meeting at the angle $\theta$ set by the correlation $r = \cos\theta$, satisfies $\text{Area}^2 = |\mathbf{C}|$. For example, with $r = 0.5$, $\theta = 60°$; the height is then $h = s_2 \sin 60° = s_2 \sqrt{3}/2$, so $\text{Area} = \text{base} \times \text{height} = s_1 s_2 \sin\theta$ and $\text{Area}^2 = s_1^2 s_2^2 (1 - r^2) = |\mathbf{C}|$. L7.10

Eigenvalues and eigenvectors of C. Eigenvectors of the covariance matrix $\mathbf{C}$ are orthogonal directed line segments that span the variation in the data, and the corresponding (unsigned) eigenvalues are the lengths of these segments, so the product of the eigenvalues is the volume occupied by the data, i.e. the determinant of the covariance matrix. [Figure: eigenvectors $\xi_1, \xi_2$ with lengths $\lambda_1, \lambda_2$ drawn through point clouds with no, positive, and negative correlation.] L7.11

The geometry of principal components (cont'd). The coefficients ($a_{ij}$) of the principal components ($Z_i$) define vectors in the space of coefficients. These vectors are the eigenvectors ($\mathbf{a}_i$) of the sample covariance matrix $\mathbf{C}$, and the corresponding (unsigned) eigenvalues ($\lambda_i$) are the variances of the components, i.e. $\lambda_i = \mathrm{Var}(Z_i)$; the product of the eigenvalues is again the volume occupied by the data, i.e. the determinant of the covariance matrix. [Figure: eigenvectors $\mathbf{a}_1, \mathbf{a}_2$ with lengths $\lambda_1, \lambda_2$ in coefficient space.] L7.12
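The product-of-eigenvalues claim in L7.11 is easy to verify numerically (illustrative data, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 3))  # made-up data
C = np.cov(X, rowvar=False)

print(np.prod(np.linalg.eigvalsh(C)))   # product of the eigenvalues of C...
print(np.linalg.det(C))                 # ...equals the generalized variance |C|
```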

Another important relationship! The sum of the eigenvalues of the covariance matrix $\mathbf{C}$ equals the sum of the diagonal elements of $\mathbf{C}$ (the variances $s_i^2$), i.e. the trace of $\mathbf{C}$:

$$\mathbf{C} = \begin{pmatrix} s_1^2 & c_{12} & \cdots & c_{1p} \\ c_{12} & s_2^2 & \cdots & c_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ c_{1p} & c_{2p} & \cdots & s_p^2 \end{pmatrix}, \qquad \sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} s_i^2 = \mathrm{Tr}(\mathbf{C}).$$

So the sum of the variances of the principal components equals the sum of the variances of the original variables. L7.13

Scale and the correlation matrix. Since variables may be measured on different scales, and we want to eliminate scale effects, we usually work with standardized values

$$X'_{ik} = \frac{X_{ik} - \bar{X}_k}{s_k},$$

so that each variable is scaled to have zero mean and unit variance. The sample covariance matrix of the standardized variables is the sample correlation matrix $\mathbf{R}$, with entries $r_{ij} = c_{ij} / (s_i s_j)$. L7.14

Principal component scores. Because principal components are functions, we can plug in the values of each variable for each observation and calculate a PC score for each observation on each principal component: the score of observation $i$ on component 1 is $S_{i1} = a_{11} X_{i1} + a_{12} X_{i2} + \dots$, and similarly for the other components. [The slide works a small example with two observations and two components, e.g. $S_{11} = a_{11} X_{11} + a_{12} X_{12}$ with $X_{11} = 3.7$; the remaining numbers are garbled in the transcript.] L7.15
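A short sketch tying L7.13-L7.15 together (assumed data, not lecture code): standardizing turns $\mathbf{C}$ into $\mathbf{R}$, the eigenvalues sum to $\mathrm{Tr}(\mathbf{R}) = p$, and the scores are just the standardized data times the coefficient vectors.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((250, 4)) * np.array([1.0, 5.0, 0.5, 2.0])  # mixed scales

Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized values X'
R = np.cov(Xs, rowvar=False)                        # covariance of X' = correlation of X
print(np.allclose(R, np.corrcoef(X, rowvar=False)))

lam, A = np.linalg.eigh(R)
print(lam.sum(), np.trace(R))                       # sum of eigenvalues = Tr(R) = p

scores = Xs @ A[:, ::-1]                            # a PC score for every observation
print(np.round(scores[:2], 3))                      # scores of the first two observations
```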

Principal component loadings. Component loadings ($L_{ij}$) are the covariances (correlations, for standardized values) of the original variables with the components, and they are proportional to the component coefficients ($a_{ij}$):

$$L_{ij} = \mathrm{Cov}(X_i, Z_j), \qquad L_{ij} = k\, a_{ij}.$$

For each component, the squared loadings summed over all variables equal the variance of the component:

$$\sum_{i=1}^{p} L_{ij}^2 = \lambda_j = \mathrm{Var}(Z_j).$$

L7.16

More on loadings. Sometimes components have variables with similar loadings, which form a natural group. To assist in interpretation, we may want to choose another component frame which emphasizes the differences among groups. Loadings for eight body measurements (entries lost in the transcript are shown as ...):

Variable      Z1     Z2
Height        0.85   0.37
Arm span      0.84   0.44
Lower leg     0.84   0.40
Forearm       0.8    0.46
Weight        0.75   ...
Upper thigh   0.67   ...
Chest width   0.67   -0.4
Chest girth   0.6    ...

[Factor plot of FACTOR(2) against FACTOR(1): the length variables (HEIGHT, FOREARM, LOWERLEG) cluster apart from the bulk variables (WEIGHT, BITRO, CHESTGIR).] L7.17

Orthogonal rotations: varimax. Orthogonal (angle-preserving): the new (rotated) components are still uncorrelated. Varimax: the rotation is chosen so that each component loads high on a small number of variables and low on the other variables (it simplifies the factors). [Factor plots: unrotated vs. varimax-rotated loadings; after rotation WEIGHT, CHESTGIR, and BITRO load mainly on one factor and LOWERLEG, HEIGHT, and FOREARM on the other.] L7.18
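Both identities on L7.16 can be checked directly; this is an illustrative sketch on made-up standardized data (not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((400, 3)) @ rng.standard_normal((3, 3))   # made-up data
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

lam, A = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
lam, A = lam[::-1], A[:, ::-1]

L = A * np.sqrt(lam)              # loadings L_ij = sqrt(lambda_j) * a_ij
Z = Xs @ A                        # component scores

# L_ij equals the correlation of variable i with component j...
print(np.allclose(L, np.corrcoef(np.hstack([Xs, Z]), rowvar=False)[:3, 3:]))
# ...and the squared loadings of each component sum to its variance lambda_j.
print((L**2).sum(axis=0), lam)
```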

Orthogonal rotations: quartimax. Orthogonal (angle-preserving): the new (rotated) components are still uncorrelated. Quartimax: the rotation is chosen so that each variable loads mainly on one factor (it simplifies the variables). [Factor plots: unrotated vs. rotated loadings for the body-measurement example.] L7.19

Orthogonal rotations: equamax. Orthogonal (angle-preserving): the new (rotated) components are still uncorrelated. Equamax: combines varimax and quartimax; the number of variables that load highly on a factor and the number of factors needed to explain each variable are both optimized. [Factor plots: unrotated vs. equamax-rotated loadings.] L7.20

Oblique rotations, e.g. oblimin. Oblique (non-angle-preserving): the new (rotated) components are now correlated. Oblique rotation is most reasonable when significant intercorrelations among the factors exist. [Factor plots: unrotated vs. oblimin-rotated loadings.] L7.21
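The lecture names varimax but gives no algorithm. The sketch below is one standard formulation (Kaiser's varimax criterion, solved by repeated SVDs); the function name and example loadings are illustrative, not from the lecture.

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=200):
    """Rotate a p x m loading matrix L so that each factor has a few large
    and many near-zero loadings; returns rotated loadings and the rotation."""
    p, m = L.shape
    T = np.eye(m)                        # accumulated orthogonal rotation
    crit_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        T = u @ vt                       # nearest orthogonal matrix (Procrustes step)
        crit_new = s.sum()
        if crit_new <= crit_old * (1 + tol):
            break                        # criterion stopped improving
        crit_old = crit_new
    return L @ T, T

# The rotation is orthogonal, so communalities (row sums of squared loadings)
# are preserved even though the individual loadings become simpler.
L = np.array([[0.8, 0.5], [0.7, 0.6], [0.6, -0.5], [0.5, -0.6]])
Lr, T = varimax(L)
print(np.round(Lr, 3))
print(np.allclose((L**2).sum(axis=1), (Lr**2).sum(axis=1)))
```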

The consequences of rotation. Unrotated components are (1) uncorrelated and (2) ordered in terms of decreasing variance (i.e., $\mathrm{Var}(Z_1) > \mathrm{Var}(Z_2) > \dots$). Orthogonally rotated components are (1) still uncorrelated, but (2) need not be ordered in terms of decreasing variance (e.g. after a varimax rotation). Obliquely rotated components are (1) correlated and (2) unordered (in general). L7.22

The rotated pattern matrix for obliquely rotated factors. The elements of the matrix are analogous to standardized partial regression coefficients from a multiple regression analysis, so each element quantifies the importance of the variable in question to the component once the effects of the other variables are controlled. Rotated pattern matrix (OBLIMIN, Gamma = 0; the near-zero cross-loadings are garbled in the transcript and shown as ...):

Variable      Factor 1   Factor 2
HEIGHT        0.909      ...
ARM SPAN      0.957      ...
FOREARM       0.953      ...
LOWERLEG      0.96       ...
WEIGHT        ...        0.897
BITRO         ...        0.864
CHESTGIR      ...        0.88
CHEST WIDTH   ...        0.749

L7.23

The rotated structure matrix for obliquely rotated factors. The elements of the rotated structure matrix are the simple correlations of the variable in question with the factor, i.e. the component loadings. For orthogonal factors, the factor pattern and factor structure matrices are identical. Rotated structure matrix (two-decimal entries may be missing a digit lost in the transcript):

Variable      Factor 1   Factor 2
HEIGHT        0.933      0.363
ARM SPAN      0.935      0.45
FOREARM       0.950      0.396
LOWERLEG      0.98       0.43
WEIGHT        0.44       0.9
BITRO         0.40       0.787
CHESTGIR      0.36       0.860
CHEST WIDTH   0.90       0.843

L7.24
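A standard identity connects the two matrices: structure = pattern x factor-correlation matrix ($\Phi$), so for orthogonal factors ($\Phi = \mathbf{I}$) they coincide. A hedged sketch with invented loadings:

```python
import numpy as np

P = np.array([[0.91, 0.02],     # invented pattern loadings (regression weights)
              [0.88, 0.05],
              [0.04, 0.90],
              [0.01, 0.86]])
Phi = np.array([[1.0, 0.4],     # assumed correlation between the two factors
                [0.4, 1.0]])

S = P @ Phi                     # structure matrix: correlations with the factors
print(np.round(S, 3))           # the cross-correlations are no longer near zero
```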

Which rotation is the best? Objective: find the rotation which achieves the simplest structure among the component loadings, thereby making interpretation comparatively easy. Thurstone's criteria, for $p$ variables and $m < p$ components: (1) each component should have at least $m$ near-zero loadings; (2) few components should have non-zero loadings on the same variable. L7.25

A final word on rotations. "You cannot say that any rotation is better than any other rotation from a statistical point of view: all rotations are equally good statistically. Therefore, the choice among different rotations must be based on non-statistical grounds" (SAS/STAT User's Guide, Vol. 1, p. 776). L7.26

How many components to retain in subsequent analyses? Kaiser rule: retain only components with eigenvalues > 1. Scree test: plot the eigenvalues against their ordinal numbers, and retain all components in the steep-descent part of the curve. Or retain as many factors as required to account for a specified amount of the total variance (e.g. 85%). [Scree plot: eigenvalue against number of factors, with the Kaiser threshold drawn at eigenvalue = 1.] L7.27
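A hedged sketch of these retention rules (made-up data, not lecture code): the eigenvalues of $\mathbf{R}$ are the scree-plot heights, the Kaiser cut is at 1, and the cumulative sum gives the smallest number of components reaching 85% of the total variance.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((300, 6)) @ rng.standard_normal((6, 6))   # made-up data

lam = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

print("eigenvalues:", np.round(lam, 2))          # the scree-plot heights
print("Kaiser rule retains:", int(np.sum(lam > 1)), "components")
cumvar = np.cumsum(lam) / lam.sum()
print("components for >= 85% of variance:", int(np.searchsorted(cumvar, 0.85)) + 1)
```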

More on interpretation: the significance of loadings. Since loadings are correlation coefficients ($r$), we can test the null hypothesis that each correlation equals zero. But analytic estimates of the standard errors are often too small, especially for rotated loadings, so as a rule of thumb, use double the critical value to test significance. E.g., for $N = 100$ the critical value at $\alpha = 0.05$ is $r \approx 0.196$, so only loadings greater than about $2 \times 0.196 \approx 0.39$ would be deemed significant. L7.28

Component reliability: rules of thumb. The absolute magnitude and the number of loadings are crucial for determining reliability. For $N > 50$, components with at least 10 loadings > 0.40 are reliable; components with at least 4 loadings > 0.60, or with at least 3 loadings > 0.80, are also reliable. L7.29

PCA: the procedure.
1. Calculate the sample covariance matrix or correlation matrix. If all variables are on the same scale, use the sample covariance matrix; otherwise use the correlation matrix.
2. Run PCA to extract the unrotated components ("initial extraction").
3. Decide which components to use in subsequent analyses based on the Kaiser rule, scree plots, etc.
4. Based on (3), rerun the analysis using different orthogonal and oblique rotations and compare them using factor plots ("follow-up extraction").
L7.30
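The critical value of $r$ follows from the $t$ distribution via $r = t/\sqrt{t^2 + (N-2)}$; here is a sketch of the doubling rule (assuming SciPy; the helper name is mine, not from the lecture):

```python
import numpy as np
from scipy import stats

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of a correlation coefficient, sample size n."""
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t / np.sqrt(t ** 2 + n - 2)

r_crit = critical_r(100)
print(round(r_crit, 3))        # ~0.197 for N = 100
print(round(2 * r_crit, 3))    # rule of thumb: call loadings above this significant
```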

PCA: the procedure (cont'd).
5. For obliquely rotated components, calculate the correlations among the components; small correlations suggest that orthogonal rotations are reasonable.
6. Evaluate the statistical significance of the component loadings obtained from the best rotation.
7. Check component reliability by redoing steps (1)-(6) with another (independent) data set, and compare the component loadings obtained from the two data sets. Are they close?
L7.31
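Step 7 can be sketched with a split-half check (illustrative only: the two halves of one made-up sample stand in for two independent data sets, and the helper function is mine):

```python
import numpy as np

def first_pc_loadings(X):
    """Loadings of the first principal component of the correlation matrix."""
    lam, A = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    a = A[:, -1]
    if a.sum() < 0:                      # fix the arbitrary sign of the eigenvector
        a = -a
    return a * np.sqrt(lam[-1])

rng = np.random.default_rng(6)
X = rng.standard_normal((400, 5)) @ rng.standard_normal((5, 5))   # made-up data

print(np.round(first_pc_loadings(X[:200]), 2))   # "data set" 1
print(np.round(first_pc_loadings(X[200:]), 2))   # "data set" 2: should be close
```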