Principal Component Analysis (PCA)

Recall: Eigenvectors of the Covariance Matrix Covariance matrices are symmetric. Eigenvectors are orthogonal. Eigenvectors are ordered by the magnitude of their eigenvalues: λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_p, with corresponding eigenvectors {v_1, v_2, ..., v_p}.
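
A minimal R sketch of these facts; the simulated height/weight variables h and w (and the data matrix X) are illustrative, not from the slides:

# Eigen-decomposition of a covariance matrix in R.
set.seed(1)
h <- rnorm(100, mean = 170, sd = 10)      # simulated heights
w <- 0.5 * h + rnorm(100, sd = 5)         # simulated weights, correlated with h
X <- cbind(h, w)
S <- cov(X)                               # covariance matrix (symmetric)
e <- eigen(S)                             # eigenvalues returned in decreasing order
e$values                                  # lambda_1 >= lambda_2
round(t(e$vectors) %*% e$vectors, 10)     # ~ identity: eigenvectors are orthonormal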

Recall: Direction of Maximal Variance The first eigenvector of a covariance matrix points in the direction of maximum variance in the data. This eigenvector is the first principal component. [Figure: height (h) vs. weight (w) scatterplot with the first principal component v_1 drawn along the direction of greatest spread]

Recall: Secondary Directions The second eigenvector of a covariance matrix points in the direction, orthogonal to the first, that has the maximum variance. [Figure: the same scatterplot with the second principal component v_2, perpendicular to v_1]

Eigenvalues The corresponding eigenvalues, λ_1 and λ_2, tell us the amount of variance in each direction. What do we mean by variance in that direction? [Figure: the spread of the data along v_1 and along v_2, corresponding to λ_1 and λ_2]

A New Basis Principal components provide us with a new orthogonal basis in which the new coordinates are uncorrelated.
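
A minimal R check of this, continuing the simulated data X from the sketch above: the covariance matrix of the principal-component scores is diagonal, i.e. the new coordinates are uncorrelated.

pc <- prcomp(X, center = TRUE, scale. = FALSE)   # covariance PCA
round(cov(pc$x), 10)                             # off-diagonal entries are (numerically) zero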

Total Variance The total amount of variance is defined as the sum of the variances of each variable: TotalVariation = Var(height) + Var(weight). In our new principal component basis, the total amount of variance has not changed: TotalVariation = λ_1 + λ_2. The proportion of the variance directed along (explained by) the first component is then λ_1 / (λ_1 + λ_2), and likewise λ_2 / (λ_1 + λ_2) for the second component.
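
A quick R check of this invariance, using the simulated data X from above:

sum(diag(cov(X)))           # Var(h) + Var(w): total variance in the original basis
sum(eigen(cov(X))$values)   # lambda_1 + lambda_2: the same total in the PC basis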

Total Variance Another way to state this fact is to use the theorem from Linear Algebra that says for any square matrix A, Trace(A) = Σ_{i=1}^n λ_i. Since A in our situation is the covariance matrix, Trace(A) is the sum of the variances of each variable.

Total Variance No matter how many components we have, the proportion of variance explained by each component is the corresponding eigenvalue divided by the sum of the eigenvalues (the total variance): Proportion Explained by Component i = λ_i / Σ_{j=1}^n λ_j. The proportion of variance explained by the first k components is then the sum of the first k eigenvalues divided by the total variance: Proportion Explained by First k Components = (Σ_{i=1}^k λ_i) / (Σ_{j=1}^n λ_j).
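
In R, these proportions can be computed directly from the eigenvalues (a sketch, continuing the simulated example):

lambda <- eigen(cov(X))$values
lambda / sum(lambda)           # proportion of variance explained by each component
cumsum(lambda) / sum(lambda)   # proportion explained by the first k components
# summary(prcomp(X)) reports these same quantities.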

Let's Practice
1. Suppose we have a dataset with 8 variables and we use standardized data (i.e. correlation PCA). What is the total amount of variance in our data?
2. Suppose I have a dataset with 3 variables and the eigenvalues of the covariance matrix are λ_1 = 3, λ_2 = 2, λ_3 = 1.
a. What proportion of variance is explained by the first principal component?
b. What is the variance of the second principal component?
c. What proportion of variance is captured by using both the first and second principal components?

Zero Eigenvalues What would it mean if λ_2 = 0? Variance along that direction is exactly zero: along v_2, all data points fall in the exact same spot. Height and Weight must be perfectly correlated. [Figure: all points lying exactly on the line spanned by v_1] The data is essentially one-dimensional; v_1 alone explains 100% of the variation in height and weight.
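
A small R illustration (hypothetical data; the "weight" is constructed as an exact linear function of the "height"):

h2 <- rnorm(100, mean = 170, sd = 10)
w2 <- 2 * h2 - 100                      # perfectly correlated with h2
eigen(cov(cbind(h2, w2)))$values        # second eigenvalue is (numerically) zero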

Small Eigenvalues When eigenvalues are close to zero, there is not much variance in that direction, so we won't lose much by ignoring or dropping that component. Dropping components amounts to an orthogonal projection onto a subspace: the span of the retained principal components (eigenvectors). #DimensionReduction
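
A sketch of this projection in R, continuing the simulated X and keeping only the first component:

pc   <- prcomp(X)
k    <- 1
Xhat <- pc$x[, 1:k, drop = FALSE] %*% t(pc$rotation[, 1:k, drop = FALSE])
Xhat <- sweep(Xhat, 2, pc$center, "+")   # add the column means back
head(Xhat)                               # rank-k approximation of the original data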

Screeplot Plot of the eigenvalues. Sometimes used to guess the number of latent components by finding an elbow in the curve.
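
In R, a scree plot can be drawn from a prcomp fit (a sketch using the simulated data):

pc <- prcomp(X)
screeplot(pc, type = "lines")   # or: plot(pc$sdev^2, type = "b", ylab = "Eigenvalue")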

Coordinates in the new basis To make the plot shown below, we need to know the coordinates of the data in the new basis. So for each observation, we need to solve: Observation_i = α_{1i} v_1 + α_{2i} v_2. [Figure: observation i plotted in the (v_1, v_2) coordinate system]

Coordinates in the new basis To find the coordinates in the new basis, we simply use the formulas for each Principal Component:
PC_1 = v_1 = (0.7, 0.7)' = 0.7h + 0.7w
PC_2 = v_2 = (-0.7, 0.7)' = -0.7h + 0.7w
Thus, the new coordinates (called scores) are found in the matrix S = XV, where the rows of X are the observations obs_1, ..., obs_n on the centered (cov) or standardized (cor) height and weight variables, the columns of V are the eigenvectors v_1 and v_2, and the columns of S give each observation's coordinates on PC_1 and PC_2.
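
A quick R check that the scores are just the centered data times the eigenvector matrix (a sketch, continuing the simulated X):

pc <- prcomp(X, center = TRUE, scale. = FALSE)
Xc <- scale(X, center = TRUE, scale = FALSE)   # centered data
S  <- Xc %*% pc$rotation                       # S = XV
all.equal(unname(S), unname(pc$x))             # TRUE: matches the scores prcomp reports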

Summary of Output 3 major pieces of output:
Eigenvectors (Principal Components / Variable Loadings), sometimes called the rotation matrix
Eigenvalues (Variances)
Coordinates of the Data in the new basis (the output dataset in SAS, out=...)
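
For reference, a sketch of where these three pieces live in R's prcomp output (continuing the simulated example):

pc <- prcomp(X)
pc$rotation    # eigenvectors / variable loadings (the "rotation matrix")
pc$sdev^2      # eigenvalues (variance of each component)
head(pc$x)     # coordinates of the data in the new basis (scores)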

Correlation matrix vs. Covariance matrix PCA can be done using eigenvectors of either the covariance matrix or the correlation matrix. The default in SAS is correlation; the default in R is covariance (at least for most packages; always check!).

Correlation matrix vs. Covariance matrix Covariance PCA: the data is centered and directions of maximal variance are drawn. Use when the scales of the variables are not very different. Correlation PCA: the data is centered and normalized/standardized before directions of maximal variance are drawn. Use when the scales of the variables are very different.
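
In R's prcomp, the choice is controlled by the scale. argument (a sketch on the simulated data):

pc_cov <- prcomp(X, scale. = FALSE)   # covariance PCA: data centered only
pc_cor <- prcomp(X, scale. = TRUE)    # correlation PCA: data centered and standardized
pc_cov$sdev^2                         # eigenvalues of the covariance matrix
pc_cor$sdev^2                         # eigenvalues of the correlation matrix (they sum to the number of variables)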

Results may differ by a sign or constant factor!!
SAS code:
proc princomp data=iris out=irispc;
var sepal_length sepal_width petal_length petal_width;
run;
Covariance option:
proc princomp data=iris out=irispc cov;
var sepal_length sepal_width petal_length petal_width;
run;

SAS's strange problem To the best of my knowledge, SAS will not perform a typical PCA for datasets with fewer observations than variables. For our examples, we will use R.
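
A rough R equivalent of the SAS calls above; note that R's built-in iris data frame uses the column names Sepal.Length, Sepal.Width, Petal.Length, Petal.Width (its first four columns):

pc_cor <- prcomp(iris[, 1:4], scale. = TRUE)    # like proc princomp (correlation is the SAS default)
pc_cov <- prcomp(iris[, 1:4], scale. = FALSE)   # like the cov option
summary(pc_cor)                                 # proportion of variance explained by each component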