Principal Component Analysis (PCA)

4 Recall: Eigenvectors of the Covariance Matrix
Covariance matrices are symmetric.
Their eigenvectors are orthogonal.
Eigenvectors are ordered by the magnitude of their eigenvalues: $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$, with corresponding eigenvectors $\{v_1, v_2, \dots, v_n\}$.
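
The three properties recalled here can be checked numerically. The following is a minimal sketch in plain Python for the two-variable case only; the helper `eig_sym_2x2` and the covariance values are hypothetical, chosen purely for illustration, and the closed form it uses applies to symmetric 2x2 matrices.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first."""
    center = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - d)    # angle of the leading eigenvector
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))  # v1 rotated by 90 degrees
    return (center + radius, v1), (center - radius, v2)

# Hypothetical covariance matrix: Var(h) = 4, Cov(h, w) = 3, Var(w) = 9
(lam1, v1), (lam2, v2) = eig_sym_2x2(4.0, 3.0, 9.0)

dot = v1[0] * v2[0] + v1[1] * v2[1]
print("eigenvalues ordered:", lam1 >= lam2)
print("eigenvectors orthogonal:", abs(dot) < 1e-12)
```

For more than two variables, a numerical routine for symmetric eigenproblems would normally replace the closed form.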

6 Recall: Direction of Maximal Variance
The first eigenvector of a covariance matrix points in the direction of maximum variance in the data. This eigenvector is the first principal component.
[Figure: scatter plot of height (h) and weight (w) with the first eigenvector v1 drawn.]

7 Recall: Secondary Directions
The second eigenvector of a covariance matrix points in the direction, orthogonal to the first, that has the maximum variance.
[Figure: scatter plot of height (h) and weight (w) with the second eigenvector v2 drawn.]

10 Eigenvalues
The corresponding eigenvalues, $\lambda_1$ and $\lambda_2$, tell us the amount of variance in each direction.
What do we mean by variance in that direction? It is the variance of the data after each point is projected orthogonally onto that direction.
[Figures: scatter plot of height (h) and weight (w) with direction v1 highlighted, then direction v2.]
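
To make "variance in a direction" concrete, here is a small sketch in plain Python. The height and weight values are made up for illustration, and the closed-form eigenpair applies only to the 2x2 case. It projects each centered observation onto $v_1$ and compares the variance of those projections with $\lambda_1$.

```python
import math

# Made-up (height, weight) observations, for illustration only
heights = [60.0, 62.0, 65.0, 68.0, 70.0, 72.0]
weights = [115.0, 120.0, 140.0, 155.0, 160.0, 180.0]

def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def cov(xs, ys):
    """Sample covariance of two already-centered lists."""
    return sum(x * y for x, y in zip(xs, ys)) / (len(xs) - 1)

h, w = center(heights), center(weights)
a, b, d = cov(h, h), cov(h, w), cov(w, w)

# Leading eigenpair of [[a, b], [b, d]] (closed form for a symmetric 2x2 matrix)
lam1 = (a + d) / 2 + math.hypot((a - d) / 2, b)
theta = 0.5 * math.atan2(2 * b, a - d)
v1 = (math.cos(theta), math.sin(theta))

# Project every centered observation onto v1, then take the variance of the projections
proj = [hi * v1[0] + wi * v1[1] for hi, wi in zip(h, w)]
var_along_v1 = cov(proj, proj)
print(lam1, var_along_v1)  # the two numbers agree up to rounding
```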

12 A New Basis
Principal components provide us with a new orthogonal basis in which the new coordinates are uncorrelated.
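
A short derivation of why the new coordinates are uncorrelated, written under assumed notation that does not appear on this slide: $\Sigma$ is the covariance matrix of the centered data $X$, $V$ has the unit eigenvectors $v_1, v_2$ as its columns, and $\Lambda$ is the diagonal matrix of eigenvalues.

```latex
\Sigma V = V \Lambda, \qquad V^\top V = I
\;\Longrightarrow\;
\operatorname{Cov}(XV) = V^\top \Sigma V = \Lambda =
\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
```

The zero off-diagonal entries are the covariances between the new coordinates, so the coordinates are uncorrelated, and the diagonal entries are their variances.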

15 Total Variance
The total amount of variance is defined as the sum of the variances of each variable:
$$\text{TotalVariation} = \mathrm{Var}(\text{height}) + \mathrm{Var}(\text{weight})$$
In our new principal component basis, the total amount of variance has not changed:
$$\text{TotalVariation} = \lambda_1 + \lambda_2$$
The proportion of the variance directed along (explained by) the first component would then be:
$$\frac{\lambda_1}{\lambda_1 + \lambda_2}$$
Likewise, for the second component:
$$\frac{\lambda_2}{\lambda_1 + \lambda_2}$$

16 Total Variance
Another way to state this fact is to use the theorem from linear algebra that says, for any $n \times n$ square matrix $A$ with eigenvalues $\lambda_1, \dots, \lambda_n$,
$$\mathrm{Trace}(A) = \sum_{i=1}^{n} \lambda_i$$
Since $A$ in our situation is the covariance matrix, $\mathrm{Trace}(A)$ is the sum of the variances of each variable.
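
A small worked instance of the trace identity, using a hypothetical 2x2 symmetric matrix chosen so that the eigenvalues come out as whole numbers:

```latex
\begin{aligned}
A &= \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(A - \lambda I) = (2 - \lambda)^2 - 1 = 0
\;\Longrightarrow\; \lambda_1 = 3,\ \lambda_2 = 1 \\
\operatorname{Trace}(A) &= 2 + 2 = 4 = 3 + 1 = \lambda_1 + \lambda_2
\end{aligned}
```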

18 Total Variance
No matter how many components we have, the proportion of variance explained by each component is the corresponding eigenvalue divided by the sum of the eigenvalues (or the total variance):
$$\text{Proportion Explained by Component } i = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}$$
The proportion of variance explained by the first $k$ components is then the sum of the first $k$ eigenvalues divided by the total variance:
$$\text{Proportion Explained by First } k \text{ Components} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{n} \lambda_j}$$
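
The two formulas translate directly into code. This is a minimal sketch in plain Python; the function names and the eigenvalues are hypothetical and serve only as an illustration.

```python
def proportion_explained(eigenvalues):
    """Share of total variance carried by each component."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]

def cumulative_proportion(eigenvalues, k):
    """Share of total variance carried by the first k components."""
    return sum(eigenvalues[:k]) / sum(eigenvalues)

# Hypothetical eigenvalues, already sorted from largest to smallest
eigenvalues = [5.0, 3.0, 1.5, 0.5]
props = proportion_explained(eigenvalues)
first_two = cumulative_proportion(eigenvalues, 2)
print(props)      # [0.5, 0.3, 0.15, 0.05]
print(first_two)  # 0.8
```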

22 Let's Practice
1. Suppose we have a dataset with 8 variables and we use standardized data (i.e., correlation PCA). What is the total amount of variance in our data?
2. Suppose I have a dataset with 3 variables and the eigenvalues of the covariance matrix are $\lambda_1 = 3$, $\lambda_2 = 2$, $\lambda_3 = 1$.
   a. What proportion of variance is explained by the first principal component?
   b. What is the variance of the second principal component?
   c. What proportion of variance is captured by using both the first and second principal components?
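
One way to check your answers, worked from the total variance formulas and the fact that each standardized variable has variance 1:

```latex
\begin{aligned}
\text{1.}\quad & \text{TotalVariation} = \underbrace{1 + 1 + \dots + 1}_{8 \text{ variables}} = 8 \\
\text{2a.}\quad & \frac{\lambda_1}{\lambda_1 + \lambda_2 + \lambda_3} = \frac{3}{3 + 2 + 1} = \frac{1}{2} \\
\text{2b.}\quad & \mathrm{Var}(\mathrm{PC}_2) = \lambda_2 = 2 \\
\text{2c.}\quad & \frac{\lambda_1 + \lambda_2}{\lambda_1 + \lambda_2 + \lambda_3} = \frac{3 + 2}{6} = \frac{5}{6} \approx 0.833
\end{aligned}
```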

27 Zero Eigenvalues
What would it mean if $\lambda_2 = 0$?
Variance along that direction is exactly zero.
Projected onto that direction, all data points fall on exactly the same spot.
Height and weight must be perfectly correlated.
Data is essentially one-dimensional: $v_1$ alone explains 100% of the variation in height and weight.
[Figure: scatter plot of height (h) and weight (w).]
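
A quick numerical illustration of the zero-eigenvalue case, as a sketch in plain Python. The data are made up so that weight is an exact linear function of height; the eigenvalue formula is the closed form for a symmetric 2x2 matrix.

```python
import math

# Made-up data: weight is an exact linear function of height (perfect correlation)
heights = [60.0, 62.0, 65.0, 68.0, 70.0, 72.0]
weights = [2.0 * h + 10.0 for h in heights]
n = len(heights)

mean_h = sum(heights) / n
mean_w = sum(weights) / n
a = sum((h - mean_h) ** 2 for h in heights) / (n - 1)   # Var(height)
d = sum((w - mean_w) ** 2 for w in weights) / (n - 1)   # Var(weight)
b = sum((h - mean_h) * (w - mean_w)
        for h, w in zip(heights, weights)) / (n - 1)    # Cov(height, weight)

# Eigenvalues of [[a, b], [b, d]] (closed form for a symmetric 2x2 matrix)
center = (a + d) / 2
radius = math.hypot((a - d) / 2, b)
lam1, lam2 = center + radius, center - radius

share_pc1 = lam1 / (lam1 + lam2)
print(lam2, share_pc1)  # lam2 is zero and share_pc1 is one, up to floating-point rounding
```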

33 Small Eigenvalues
When eigenvalues are close to zero:
Not much variance in this direction.
Won't lose much by ignoring or dropping this component.
Dropping components = orthogonal projection onto a subspace: the span of the retained principal components (eigenvectors).
#DimensionReduction
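
The link between dropping a component and orthogonal projection can be sketched in plain Python for the two-variable case. The data are made up for illustration. Keeping only the first component projects each centered point onto the line spanned by $v_1$, and the variance lost in doing so equals $\lambda_2$.

```python
import math

# Made-up (height, weight) observations, for illustration only
heights = [60.0, 62.0, 65.0, 68.0, 70.0, 72.0]
weights = [115.0, 120.0, 140.0, 155.0, 160.0, 180.0]
n = len(heights)

mean_h, mean_w = sum(heights) / n, sum(weights) / n
h = [x - mean_h for x in heights]
w = [x - mean_w for x in weights]

a = sum(x * x for x in h) / (n - 1)
d = sum(y * y for y in w) / (n - 1)
b = sum(x * y for x, y in zip(h, w)) / (n - 1)

# Smaller eigenvalue and leading eigenvector of [[a, b], [b, d]] (2x2 closed form)
lam2 = (a + d) / 2 - math.hypot((a - d) / 2, b)
theta = 0.5 * math.atan2(2 * b, a - d)
v1 = (math.cos(theta), math.sin(theta))

# Keep only PC1: orthogonal projection of each centered point onto span{v1}
scores1 = [hi * v1[0] + wi * v1[1] for hi, wi in zip(h, w)]
approx = [(s * v1[0], s * v1[1]) for s in scores1]

# Variance lost by the projection (squared residuals with an n - 1 denominator)
lost = sum((hi - ah) ** 2 + (wi - aw) ** 2
           for hi, wi, (ah, aw) in zip(h, w, approx)) / (n - 1)
print(lost, lam2)  # the two numbers agree up to rounding
```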

34 Scree Plot
A plot of the eigenvalues, in decreasing order, against the component number. Sometimes used to guess the number of latent components by finding an elbow in the curve.

35 Coordinates in the new basis
To plot the data in the new basis, we need to know the coordinates of the data in that basis. So for each observation, we need to solve:
$$\text{Observation}_i = \alpha_{1i} v_1 + \alpha_{2i} v_2$$
[Figure: plot of the data with observation i marked.]

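37 Coordinates in the new basis
To find the coordinates in the new basis, we simply use the formulas for each principal component:
$$\mathrm{PC}_1 = v_1 = \begin{pmatrix} 0.7 \\ 0.7 \end{pmatrix} = 0.7h + 0.7w$$
$$\mathrm{PC}_2 = v_2 = \begin{pmatrix} -0.7 \\ 0.7 \end{pmatrix} = -0.7h + 0.7w$$
Thus, the new coordinates (called scores) are found in the matrix $S$:
$$S = XV = \begin{pmatrix} h_1 & w_1 \\ h_2 & w_2 \\ \vdots & \vdots \\ h_n & w_n \end{pmatrix} \begin{pmatrix} v_1 & v_2 \end{pmatrix}$$
where row $i$ of $X$ holds the height and weight of observation $i$, the columns of $V$ are $\mathrm{PC}_1$ and $\mathrm{PC}_2$ (with one row for $h$ and one for $w$), and $X$ contains centered (covariance PCA) or standardized (correlation PCA) data.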
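
The product $S = XV$ can be sketched in plain Python for two variables. The observations are made up for illustration, and the eigenvectors come from the closed form for a symmetric 2x2 matrix, so this shows the idea rather than a general implementation.

```python
import math

# Made-up (height, weight) observations, for illustration only
X_raw = [(60.0, 115.0), (62.0, 120.0), (65.0, 140.0),
         (68.0, 155.0), (70.0, 160.0), (72.0, 180.0)]
n = len(X_raw)

# Center each column (covariance PCA); standardize instead for correlation PCA
mean_h = sum(r[0] for r in X_raw) / n
mean_w = sum(r[1] for r in X_raw) / n
X = [(h - mean_h, w - mean_w) for h, w in X_raw]

# 2x2 sample covariance matrix [[a, b], [b, d]]
a = sum(h * h for h, _ in X) / (n - 1)
d = sum(w * w for _, w in X) / (n - 1)
b = sum(h * w for h, w in X) / (n - 1)

# Columns of V are the unit eigenvectors v1, v2 (closed form for the 2x2 case)
theta = 0.5 * math.atan2(2 * b, a - d)
V = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# S = XV: row i holds the (PC1, PC2) scores of observation i
S = [(h * V[0][0] + w * V[1][0], h * V[0][1] + w * V[1][1]) for h, w in X]

var_pc1 = sum(s1 * s1 for s1, _ in S) / (n - 1)
var_pc2 = sum(s2 * s2 for _, s2 in S) / (n - 1)
cov_pc12 = sum(s1 * s2 for s1, s2 in S) / (n - 1)
print(var_pc1, var_pc2, cov_pc12)  # cov_pc12 is zero up to rounding
```

The score variances add up to the original total variance, and the covariance between the two score columns vanishes.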

42 Summary of Output
3 major pieces of output:
1. Eigenvectors (principal components / variable loadings), sometimes called the rotation matrix.
2. Eigenvalues (variances).
3. Coordinates of the data in the new basis: the output dataset in SAS (out=...).

43 Correlation matrix vs. Covariance matrix
PCA can be done using eigenvectors of either the covariance matrix or the correlation matrix.
Default in SAS is correlation.
Default in R is covariance (at least for most packages; always check!).

45 Correlation matrix vs. Covariance matrix
Covariance PCA: Data is centered and directions of maximal variance are drawn. Use when scales of variables are not very different.
Correlation PCA: Data is centered and normalized/standardized before directions of maximal variance are drawn. Use when scales of variables are very different.
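
The effect of scale can be seen in a small sketch in plain Python. The data are made up for illustration and the helper names are hypothetical. Converting height from inches to centimeters changes the first direction found by covariance PCA, while correlation PCA on standardized data gives the same direction either way.

```python
import math

def leading_direction(xs, ys):
    """First principal direction from the 2x2 sample covariance of (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    d = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    theta = 0.5 * math.atan2(2 * b, a - d)
    return (math.cos(theta), math.sin(theta))

def standardize(xs):
    """Center and divide by the sample standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return [(x - m) / s for x in xs]

# Made-up data, for illustration only
heights_in = [60.0, 62.0, 65.0, 68.0, 70.0, 72.0]
heights_cm = [2.54 * h for h in heights_in]
weights = [115.0, 120.0, 140.0, 155.0, 160.0, 180.0]

# Covariance PCA: the direction depends on the units of height
cov_in = leading_direction(heights_in, weights)
cov_cm = leading_direction(heights_cm, weights)

# Correlation PCA: standardizing first removes the dependence on units
cor_in = leading_direction(standardize(heights_in), standardize(weights))
cor_cm = leading_direction(standardize(heights_cm), standardize(weights))

print(cov_in, cov_cm)
print(cor_in, cor_cm)
```

With two positively correlated standardized variables, the first direction is approximately (0.707, 0.707), which matches the 0.7 loadings that appear in the coordinate formulas.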

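46 Results may differ by a sign or constant factor! (Eigenvectors are only determined up to sign and scale.)
SAS code:
```sas
/* Correlation PCA (the default); out= saves the coordinates in the new basis */
proc princomp data=iris out=irispc;
   var sepal_length sepal_width petal_length petal_width;
run;
```
Covariance option:
```sas
/* Covariance PCA: add the cov option */
proc princomp data=iris out=irispc cov;
   var sepal_length sepal_width petal_length petal_width;
run;
```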

47 SAS's strange problem
To the best of my knowledge, SAS will not perform a typical PCA for datasets with fewer observations than variables. For our examples, we will use R.