Lecture 7: Analysis of Factors and Canonical Correlations


1/33 Lecture 7: Analysis of Factors and Canonical Correlations
Måns Thulin, Department of Mathematics, Uppsala University
thulin@math.uu.se
Multivariate Methods, 6/5 2011

2/33 Outline
- Factor analysis
  - Basic idea: factor the covariance matrix
  - Methods for factoring
- Canonical correlation analysis
  - Basic idea: correlations between sets
  - Finding the canonical correlations

3/33 Factor analysis
Assume that we have a data set with many variables, and that it is reasonable to believe that all of these, to some extent, depend on a few underlying but unobservable factors. The purpose of factor analysis is to find such dependencies and to use them to reduce the dimensionality of the data set. In particular, the covariance matrix is described by the factors.

4/33 Factor analysis: an early example
C. Spearman (1904), "General Intelligence, Objectively Determined and Measured", The American Journal of Psychology.
Children's performance in mathematics ($X_1$), French ($X_2$) and English ($X_3$) was measured. Correlation matrix:
$$R = \begin{pmatrix} 1 & 0.67 & 0.64 \\ & 1 & 0.67 \\ & & 1 \end{pmatrix}$$
Assume the following model:
$$X_1 = \lambda_1 f + \epsilon_1, \quad X_2 = \lambda_2 f + \epsilon_2, \quad X_3 = \lambda_3 f + \epsilon_3$$
where $f$ is an underlying common factor ("general ability"), $\lambda_1, \lambda_2, \lambda_3$ are factor loadings and $\epsilon_1, \epsilon_2, \epsilon_3$ are random disturbance terms.
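As a quick numerical check (my own addition, not from the slides): under the one-factor model the off-diagonal correlations satisfy $r_{jk} = \lambda_j \lambda_k$, so the three loadings can be solved for directly. A minimal R sketch, assuming the correlation values above:

```r
# One-factor model: r_jk = lambda_j * lambda_k for j != k,
# so the loadings follow directly from the three correlations.
r12 <- 0.67; r13 <- 0.64; r23 <- 0.67

lambda1 <- sqrt(r12 * r13 / r23)  # 0.80
lambda2 <- r12 / lambda1          # 0.8375
lambda3 <- r13 / lambda1          # 0.80

lambda2 * lambda3  # check: reproduces r23 = 0.67
```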

5/33 Factor analysis: an early example
Model: $X_i = \lambda_i f + \epsilon_i$, $i = 1, 2, 3$, with the unobservable factor $f$ = "general ability".
The variation of $\epsilon_i$ consists of two parts:
- a part that represents the extent to which an individual's mathematics ability, say, differs from her general ability
- a measurement error due to the experimental setup, since an examination is only an approximate measure of her ability in the subject
The relative sizes of $\lambda_i f$ and $\epsilon_i$ tell us to what extent the variation between individuals can be described by the factor.

6/33 Factor analysis: linear model
In factor analysis, a linear model is assumed:
$$X = \mu + LF + \epsilon$$
where
- $X$ $(p \times 1)$ = observable random vector
- $L$ $(p \times m)$ = matrix of factor loadings
- $F$ $(m \times 1)$ = common factors; unobserved values of factors which describe major features of members of the population
- $\epsilon$ $(p \times 1)$ = errors/specific factors; measurement error and variation not accounted for by the common factors
Here $\mu_i$ = mean of variable $i$, $\epsilon_i$ = $i$th specific factor, $F_j$ = $j$th common factor and $\ell_{ij}$ = loading of the $i$th variable on the $j$th factor.
The above model differs from ordinary linear models in that the independent variables $LF$ are unobservable and random.

7/33 Factor analysis: linear model
We assume that the unobservable random vectors $F$ and $\epsilon$ satisfy the following conditions:
- $F$ and $\epsilon$ are independent.
- $E(\epsilon) = 0$ and $Cov(\epsilon) = \Psi$, where $\Psi$ is a diagonal matrix.
- $E(F) = 0$ and $Cov(F) = I$.
Thus, the factors are assumed to be uncorrelated. This is called the orthogonal factor model.

8/33 Factor analysis: factoring Σ
Given the model $X = \mu + LF + \epsilon$, what is $\Sigma$? See blackboard!
We find that
$$Cov(X) = LL' + \Psi$$
where
$$V(X_i) = \ell_{i1}^2 + \cdots + \ell_{im}^2 + \psi_i \quad \text{and} \quad Cov(X_i, X_k) = \ell_{i1}\ell_{k1} + \cdots + \ell_{im}\ell_{km}.$$
Furthermore, $Cov(X, F) = L$, so that $Cov(X_i, F_j) = \ell_{ij}$.
If $m$ = the number of factors is much smaller than $p$ = the number of measured attributes, the covariance of $X$ can be described by the $pm$ elements of $L$ and the $p$ nonzero elements of $\Psi$, rather than the $(p^2 + p)/2$ elements of $\Sigma$.
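To make the factorization concrete, here is a small R sketch (an illustration with made-up loadings, not an example from J&W) that builds $\Sigma = LL' + \Psi$ for $p = 6$ and $m = 2$ and verifies the variance identity and the parameter count:

```r
# Hypothetical loadings for p = 6 variables on m = 2 factors
L   <- matrix(c(0.9, 0.8, 0.7, 0.2, 0.1, 0.2,
                0.1, 0.2, 0.2, 0.8, 0.9, 0.7), ncol = 2)
Psi <- diag(c(0.18, 0.32, 0.47, 0.32, 0.18, 0.47))  # specific variances

Sigma <- L %*% t(L) + Psi                           # Cov(X) = LL' + Psi

# V(X_i) = l_i1^2 + ... + l_im^2 + psi_i:
all.equal(diag(Sigma), rowSums(L^2) + diag(Psi))    # TRUE

# pm + p parameters instead of (p^2 + p)/2:
c(pm_plus_p = 6 * 2 + 6, full = (6^2 + 6) / 2)      # 18 vs 21
```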

9/33 Factor analysis: factoring Σ
The marginal variance $\sigma_{ii}$ can also be partitioned into two parts:
- The $i$th communality: the proportion of the variance of the $i$th measurement $X_i$ contributed by the factors $F_1, F_2, \ldots, F_m$.
- The uniqueness or specific variance: the remaining proportion of the variance of the $i$th measurement, associated with $\epsilon_i$.
From $Cov(X) = LL' + \Psi$ we have that
$$\sigma_{ii} = \underbrace{\ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2}_{\text{communality } h_i^2} + \underbrace{\psi_i}_{\text{specific variance}}$$
so $\sigma_{ii} = h_i^2 + \psi_i$, $i = 1, 2, \ldots, p$.

10/33 Orthogonal transformations
Consider an orthogonal matrix $T$. See blackboard!
We find that the factor loadings $L$ are determined only up to an orthogonal matrix $T$. With $L^* = LT$ and $F^* = T'F$, the pairs $(L^*, F^*)$ and $(L, F)$ give equally valid decompositions.
The communalities, given by the diagonal elements of $LL' = (L^*)(L^*)'$, are also unaffected by the choice of $T$.
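The invariance is easy to verify numerically. A minimal R sketch (my own illustration) with a hypothetical loading matrix and a 2 × 2 rotation:

```r
# Rotating the loadings leaves LL' and the communalities unchanged
L <- matrix(c(0.9, 0.8, 0.2, 0.1,
              0.1, 0.2, 0.8, 0.9), ncol = 2)
theta <- pi / 6
Tmat  <- matrix(c(cos(theta), -sin(theta),
                  sin(theta),  cos(theta)), 2, 2)  # orthogonal: Tmat %*% t(Tmat) = I

Lstar <- L %*% Tmat                                # L* = LT
all.equal(L %*% t(L), Lstar %*% t(Lstar))          # TRUE: LL' = L*(L*)'
all.equal(rowSums(L^2), rowSums(Lstar^2))          # TRUE: communalities unchanged
```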

11/33 Factor analysis: estimation
The factor model is determined uniquely up to an orthogonal transformation of the factors. How then can $L$ and $\Psi$ be estimated? Is some $L^* = LT$ better than others?
Idea: find some $L$ and $\Psi$, and then consider various $L^*$.
Methods for estimation:
- Principal component method
- Principal factor method
- Maximum likelihood method

12/33 Factor analysis: principal component method
Let $(\lambda_i, e_i)$ be the eigenvalue-eigenvector pairs of $\Sigma$, with $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_p > 0$. From the spectral theorem (and the principal components lecture) we know that
$$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'.$$
Let $L = (\sqrt{\lambda_1}\, e_1, \sqrt{\lambda_2}\, e_2, \ldots, \sqrt{\lambda_p}\, e_p)$. Then
$$\Sigma = \lambda_1 e_1 e_1' + \cdots + \lambda_p e_p e_p' = LL' = LL' + 0.$$
Thus $L$ is given by $\sqrt{\lambda_i}$ times the coefficients of the principal components, and $\Psi = 0$.

13/33 Factor analysis: principal component method
Now, if $\lambda_{m+1}, \lambda_{m+2}, \ldots, \lambda_p$ are small, then the first $m$ principal components explain most of $\Sigma$. Thus, with $L_m = (\sqrt{\lambda_1}\, e_1, \ldots, \sqrt{\lambda_m}\, e_m)$,
$$\Sigma \approx L_m L_m'.$$
With specific factors, this becomes
$$\Sigma \approx L_m L_m' + \Psi$$
where $\psi_i = \sigma_{ii} - \sum_{j=1}^m \ell_{ij}^2$.
As estimators for the factor loadings and specific variances, we take
$$\tilde L = \tilde L_m = (\sqrt{\hat\lambda_1}\, \hat e_1, \ldots, \sqrt{\hat\lambda_m}\, \hat e_m)$$
where $(\hat\lambda_i, \hat e_i)$ are the eigenvalue-eigenvector pairs of the sample covariance matrix $S$, and
$$\tilde\psi_i = s_{ii} - \sum_{j=1}^m \tilde\ell_{ij}^2.$$
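In R, the principal component method amounts to a few lines of eigen-decomposition. A sketch on simulated data (the J&W data sets are not reproduced here):

```r
# Principal component method for m = 2 factors, on simulated data
set.seed(1)
X <- matrix(rnorm(200 * 6), 200, 6) %*% matrix(runif(36), 6, 6)
S <- cov(X)

m   <- 2
eig <- eigen(S)                      # eigenvalues in decreasing order
Lm  <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]))  # sqrt(lambda_i) * e_i
psi <- diag(S) - rowSums(Lm^2)       # psi_i = s_ii - sum_j l_ij^2

# How closely does L_m L_m' + Psi approximate S?
max(abs(S - (Lm %*% t(Lm) + diag(psi))))
```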

14/33 Factor analysis: principal component method
In many cases, the correlation matrix $R$ (which is also the covariance matrix of the standardized data) is used instead of $S$, to avoid problems related to measurements being on different scales.
Example: consumer-preference data, p. 491 in J&W. See R code!
Is this a good example? Can factor analysis be applied to ordinal data?
Example: stock price data, p. 493 in J&W. See R code!

15/33 Factor analysis: other estimation methods Principal factor method: Modification of principal component approach. Uses initial estimates of the specific variances. Maximum likelihood method: Assuming normal data, the maximum likelihood estimators of L and Ψ are derived. In general the estimators must be calculated by numerical maximization of a function of matrices.
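In R, the maximum likelihood method is implemented in stats::factanal. A minimal sketch using R's built-in ability.cov covariance matrix (not one of the J&W examples):

```r
# ML factor analysis of R's built-in ability.cov covariance matrix
fit <- factanal(factors = 2, covmat = ability.cov)
fit$loadings      # estimated loadings L (varimax-rotated by default)
fit$uniquenesses  # estimated specific variances psi_i
```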

16/33 Factor analysis: factor rotation
Given an orthogonal matrix $T$, a different but equally valid estimator of the factor loadings is $L^* = \hat L T$. We would like to find an $L^*$ that gives a nice and simple interpretation of the corresponding factors.
Ideally: a pattern of loadings such that each variable loads highly on a single factor and has small to moderate loadings on the remaining factors.
The varimax criterion: define $\tilde\ell_{ij} = \hat\ell_{ij}/\hat h_i$. This scaling gives variables with smaller communalities more influence. Select the orthogonal transformation $T$ that makes
$$V = \frac{1}{p} \sum_{j=1}^m \left[ \sum_{i=1}^p \tilde\ell_{ij}^4 - \frac{1}{p} \left( \sum_{i=1}^p \tilde\ell_{ij}^2 \right)^2 \right]$$
as large as possible.
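The criterion is implemented in stats::varimax. A minimal sketch, rotating a hypothetical loading matrix (by default, varimax applies the Kaiser scaling $\tilde\ell_{ij} = \hat\ell_{ij}/\hat h_i$ described above):

```r
# Varimax rotation of a hypothetical 4x2 loading matrix
L   <- matrix(c(0.7, 0.8, 0.6, 0.5,
                0.5, 0.4, -0.5, -0.6), ncol = 2)
rot <- varimax(L)  # normalize = TRUE by default: Kaiser scaling of the rows
rot$loadings       # rotated loadings L* = LT
rot$rotmat         # the orthogonal matrix T
```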

17/33 Factor analysis: factor rotation
Example: consumer-preference data, p. 508 in J&W. See R code!
Example: stock price data, p. 510 in J&W. See R code!

18/33 Factor analysis: limitations of the orthogonal factor model
- The factor model is determined only up to an orthogonal transformation of the factors.
- Linearity: the linear covariance approximation $LL' + \Psi$ may not be appropriate.
- The factor model is most useful when $m$ is small, but in many cases $mp + p$ parameters are not adequate and $\Sigma$ is not close to $LL' + \Psi$.

19/33 Factor analysis: further topics
Some further topics of interest are:
- Tests for the number of common factors.
- Factor scores.
- The oblique factor model, in which $Cov(F)$ is not diagonal; the factors are correlated.

20/33 Canonical correlations
Canonical correlation analysis (CCA) is a means of assessing the relationship between two sets of variables. The idea is to study the correlation between a linear combination of the variables in one set and a linear combination of the variables in another set.
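In R, stats::cancor computes these quantities directly from two data matrices. A minimal sketch with simulated data (the data sets used later in the lecture are not reproduced here):

```r
# Canonical correlation analysis with stats::cancor on simulated data
set.seed(1)
n  <- 100
X1 <- matrix(rnorm(n * 2), n, 2)                 # first set of variables
X2 <- X1 %*% matrix(c(1, 0.5, 0.5, 1), 2, 2) +
      matrix(rnorm(n * 2), n, 2)                 # second set, related to the first
cc <- cancor(X1, X2)
cc$cor    # canonical correlations
cc$xcoef  # coefficient vectors a_k
cc$ycoef  # coefficient vectors b_k
```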

21/33 Canonical correlations: the exams problem
In a classical CCA problem, exams in five topics are considered. Exams are closed-book (C) or open-book (O): Mechanics (C), Vectors (C), Algebra (O), Analysis (O), Statistics (O).
Question: how highly correlated is a student's performance in closed-book exams with her performance in open-book exams?

22/33 Canonical correlations: basic idea
Given a data set $X$, partition the collection of variables into two sets: $X^{(1)}$ $(p \times 1)$ and $X^{(2)}$ $(q \times 1)$. (Assume $p \leq q$.)
For these random vectors:
$$E(X^{(1)}) = \mu^{(1)}, \quad Cov(X^{(1)}) = \Sigma_{11}$$
$$E(X^{(2)}) = \mu^{(2)}, \quad Cov(X^{(2)}) = \Sigma_{22}$$
and
$$Cov(X^{(1)}, X^{(2)}) = \Sigma_{12} = \Sigma_{21}'.$$

23/33 Canonical correlations: basic idea
Introduce the linear combinations $U = a'X^{(1)}$ and $V = b'X^{(2)}$. For these,
$$Var(U) = a'\Sigma_{11}a, \quad Var(V) = b'\Sigma_{22}b \quad \text{and} \quad Cov(U, V) = a'\Sigma_{12}b.$$
Goal: seek $a$ and $b$ such that
$$Corr(U, V) = \frac{a'\Sigma_{12}b}{\sqrt{a'\Sigma_{11}a}\,\sqrt{b'\Sigma_{22}b}}$$
is as large as possible.

24/33 Canonical correlations: some definitions
The first pair of canonical variables is the pair of linear combinations $U_1$ and $V_1$ having unit variances which maximize the correlation
$$Corr(U, V) = \frac{a'\Sigma_{12}b}{\sqrt{a'\Sigma_{11}a}\,\sqrt{b'\Sigma_{22}b}}.$$
The second pair of canonical variables is the pair of linear combinations $U_2$ and $V_2$ having unit variances which maximize the correlation among all choices that are uncorrelated with the first pair of canonical variables.
The $k$th pair of canonical variables is the pair of linear combinations $U_k$ and $V_k$ having unit variances which maximize the correlation among all choices that are uncorrelated with the previous $k-1$ canonical variable pairs.
The correlation between the $k$th pair of canonical variables is called the $k$th canonical correlation.

25/33 Canonical correlations: solution
Result 10.1. The $k$th pair of canonical variables is given by
$$U_k = e_k'\Sigma_{11}^{-1/2}X^{(1)} \quad \text{and} \quad V_k = f_k'\Sigma_{22}^{-1/2}X^{(2)}.$$
We have that
$$Var(U_k) = Var(V_k) = 1 \quad \text{and} \quad Cov(U_k, V_k) = \rho_k,$$
where $\rho_1^2 \geq \rho_2^2 \geq \ldots \geq \rho_p^2$, and $(\rho_k^2, e_k)$ and $(\rho_k^2, f_k)$ are the eigenvalue-eigenvector pairs of
$$\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2} \quad \text{and} \quad \Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1/2},$$
respectively.

26/33 Canonical correlations: solution
Alternatively, the canonical variable coefficient vectors $a$ and $b$ and the corresponding correlations can be found by solving the eigenvalue equations
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\, a = \rho^2 a$$
$$\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\, b = \rho^2 b$$
This is often more practical to use when computing the coefficients and the correlations.
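These equations translate directly into a few lines of R. A sketch of a helper function (my own, for illustration) that takes the covariance or correlation blocks and returns the canonical correlations:

```r
# Canonical correlations via the eigenvalue equation
# inv(S11) S12 inv(S22) S21 a = rho^2 a
canonical_correlations <- function(S11, S12, S22) {
  M   <- solve(S11) %*% S12 %*% solve(S22) %*% t(S12)
  eig <- eigen(M)
  list(rho = sqrt(pmax(eig$values, 0)),  # canonical correlations rho_k
       a   = eig$vectors)                # coefficient vectors a_k (up to scaling)
}
```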

27/33 Canonical correlations: head-lengths of sons
Example: consider the two sons born first in $n$ families. For these, consider the following measurements: $X_1$: head length of first son; $X_2$: head breadth of first son; $X_3$: head length of second son; $X_4$: head breadth of second son.
Observations:
$$\bar x = \begin{pmatrix} 185.72 \\ 151.12 \\ 183.84 \\ 149.24 \end{pmatrix}, \quad S = \begin{pmatrix} 91.481 & 50.753 & 66.875 & 44.267 \\ 50.753 & 52.186 & 49.259 & 33.651 \\ 66.875 & 49.259 & 96.775 & 54.278 \\ 44.267 & 33.651 & 54.278 & 43.222 \end{pmatrix}$$
Correlation matrices:
$$R_{11} = \begin{pmatrix} 1 & 0.7346 \\ 0.7346 & 1 \end{pmatrix}, \quad R_{22} = \begin{pmatrix} 1 & 0.8392 \\ 0.8392 & 1 \end{pmatrix}$$
$$R_{12} = R_{21}' = \begin{pmatrix} 0.7107 & 0.7040 \\ 0.6931 & 0.7085 \end{pmatrix}$$
See R code!
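The numbers on the next slide can be reproduced from these correlation blocks; a short R sketch, assuming the values above:

```r
# Head-length data: canonical correlations from the correlation blocks
R11 <- matrix(c(1, 0.7346, 0.7346, 1), 2, 2)
R22 <- matrix(c(1, 0.8392, 0.8392, 1), 2, 2)
R12 <- matrix(c(0.7107, 0.6931, 0.7040, 0.7085), 2, 2)  # filled column by column

M <- solve(R11) %*% R12 %*% solve(R22) %*% t(R12)
eigen(M)$values        # 0.6218 and 0.0029
sqrt(eigen(M)$values)  # canonical correlations 0.7886 and 0.0539
```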

28/33 Canonical correlations: head-lengths of sons
The eigenvalues of $R_{11}^{-1}R_{12}R_{22}^{-1}R_{21}$ are 0.6218 and 0.0029. Thus the canonical correlations are $\sqrt{0.6218} = 0.7886$ and $\sqrt{0.0029} = 0.0539$ (or are they? Remember to check signs!).
From the eigenvectors we obtain the canonical correlation vectors:
$$a_1 = \begin{pmatrix} 0.727 \\ 0.687 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 0.704 \\ -0.710 \end{pmatrix}$$
and
$$b_1 = \begin{pmatrix} 0.684 \\ 0.730 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 0.709 \\ -0.705 \end{pmatrix}$$
Note: the canonical correlation $\hat\rho_1 = 0.7886$ exceeds any of the individual correlations between a variable of the first set and a variable of the second set.

29/33 Canonical correlations: head-lengths of sons
The first canonical correlation variables are
$$U = 0.727 x_1^{(1)} + 0.687 x_2^{(1)}, \quad V = 0.684 x_1^{(2)} + 0.730 x_2^{(2)}$$
with $Corr(U, V) = 0.7886$.
Interpretation: roughly the sum of the length and breadth of each brother's head, i.e. a measure of head size. These variables are highly positively correlated between brothers.

30/33 Canonical correlations: head-lengths of sons
The second canonical correlation variables are
$$U = 0.704 x_1^{(1)} - 0.710 x_2^{(1)}, \quad V = 0.709 x_1^{(2)} - 0.705 x_2^{(2)}$$
with $Corr(U, V) = 0.0539$.
Interpretation: these seem to measure the difference between length and breadth, i.e. head shape. The head shapes of first and second brothers appear to have little correlation.

31/33 Canonical correlations: poor summary of variability
Example: consider the following covariance matrix ($p = q = 2$):
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} = \begin{pmatrix} 100 & 0 & 0 & 0 \\ 0 & 1 & 0.95 & 0 \\ 0 & 0.95 & 1 & 0 \\ 0 & 0 & 0 & 100 \end{pmatrix}$$
It can be shown that the first pair of canonical variates is
$$U_1 = X_2^{(1)}, \quad V_1 = X_1^{(2)}$$
with correlation
$$\rho_1 = Corr(U_1, V_1) = 0.95.$$
However, $U_1 = X_2^{(1)}$ provides a very poor summary of the variability in the first set. Most of the variability in the first set is in $X_1^{(1)}$, which is uncorrelated with $U_1$. (The same situation holds for $V_1 = X_1^{(2)}$ in the second set.)
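This, too, is easy to verify numerically; a sketch using the blocks of the Σ above:

```r
# First canonical pair for the Sigma above: U1 = X2^(1), V1 = X1^(2)
S11 <- matrix(c(100, 0, 0, 1), 2, 2)
S22 <- matrix(c(1, 0, 0, 100), 2, 2)
S12 <- matrix(c(0, 0.95, 0, 0), 2, 2)  # only Cov(X2^(1), X1^(2)) = 0.95 is nonzero

M <- solve(S11) %*% S12 %*% solve(S22) %*% t(S12)
eigen(M)$values        # 0.9025 = 0.95^2, and 0
eigen(M)$vectors[, 1]  # (0, 1): U1 puts no weight on X1^(1)
```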

32/33 Canonical correlations: further topics
Some further topics of interest are:
- Tests for $\Sigma_{12} = 0$.
- Sample descriptive measures.
- Connection to MANOVA.

33/33 Summary
- Factor analysis
  - Basic idea: factor the covariance matrix
  - Methods for factoring: principal components, maximum likelihood
- Canonical correlation analysis
  - Basic idea: correlations between sets
  - Finding the canonical correlations: eigenvalues and eigenvectors