Factor Analysis

Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we assume that such latent variables, or factors, exist.

NC STATE UNIVERSITY
The Orthogonal Factor Model

$$X_1 - \mu_1 = l_{1,1} F_1 + l_{1,2} F_2 + \cdots + l_{1,m} F_m + \epsilon_1,$$
$$X_2 - \mu_2 = l_{2,1} F_1 + l_{2,2} F_2 + \cdots + l_{2,m} F_m + \epsilon_2,$$
$$\vdots$$
$$X_p - \mu_p = l_{p,1} F_1 + l_{p,2} F_2 + \cdots + l_{p,m} F_m + \epsilon_p,$$

where:
- $F_1, F_2, \dots, F_m$ are the common factors (latent variables);
- $l_{i,j}$ is the loading of variable $i$, $X_i$, on factor $j$, $F_j$;
- $\epsilon_i$ is a specific factor, affecting only $X_i$.
In matrix form:
$$\underset{p \times 1}{X - \mu} = \underset{p \times m}{L}\;\underset{m \times 1}{F} + \underset{p \times 1}{\epsilon}.$$

To make this identifiable, we further assume, with no loss of generality:
- $E(F) = 0_{m \times 1}$
- $\mathrm{Cov}(F) = I_{m \times m}$
- $E(\epsilon) = 0_{p \times 1}$
- $\mathrm{Cov}(\epsilon, F) = 0_{p \times m}$
and with serious loss of generality:
$$\mathrm{Cov}(\epsilon) = \Psi = \mathrm{diag}(\psi_1, \psi_2, \dots, \psi_p).$$

In terms of the observable variables $X$, these assumptions mean that
$$E(X) = \mu, \qquad \mathrm{Cov}(X) = \Sigma = \underset{p \times m}{L}\,\underset{m \times p}{L'} + \underset{p \times p}{\Psi}.$$

Usually $X$ is standardized, so $\Sigma = R$. The observable $X$ and the unobservable $F$ are related by $\mathrm{Cov}(X, F) = L$.
Some terminology: the $(i,i)$ entry of the matrix equation $\Sigma = LL' + \Psi$ is
$$\underbrace{\sigma_{i,i}}_{\mathrm{Var}(X_i)} = \underbrace{l_{i,1}^2 + l_{i,2}^2 + \cdots + l_{i,m}^2}_{\text{Communality}} + \underbrace{\psi_i}_{\text{Specific variance}},$$
or
$$\sigma_{i,i} = h_i^2 + \psi_i, \qquad \text{where } h_i^2 = l_{i,1}^2 + l_{i,2}^2 + \cdots + l_{i,m}^2$$
is the $i$th communality.

Note that if $T$ is $(m \times m)$ orthogonal, then $(LT)(LT)' = LL'$, so the loadings $LT$ generate the same $\Sigma$ as $L$: loadings are not unique.
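The communality decomposition and the rotation non-uniqueness can be checked numerically. A minimal sketch (the loading matrix and specific variances here are made-up illustrative values, not from the stock data):

```python
import numpy as np

# Hypothetical 4-variable, 2-factor loadings and specific variances.
L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])
psi = np.array([0.18, 0.27, 0.47, 0.35])

Sigma = L @ L.T + np.diag(psi)                # Sigma = L L' + Psi
h2 = (L ** 2).sum(axis=1)                     # communalities h_i^2

# diag(Sigma) decomposes as communality + specific variance.
print(np.allclose(np.diag(Sigma), h2 + psi))  # True

# Any orthogonal T generates the same Sigma: loadings are not unique.
theta = 0.6
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose((L @ T) @ (L @ T).T, L @ L.T))  # True
```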
Existence of Factor Representation

For any $p$, every $(p \times p)$ $\Sigma$ can be factorized as $\Sigma = LL'$ for some $(p \times p)$ $L$, which is a factor representation with $m = p$ and $\Psi = 0$; however, $m = p$ is not much use: we usually want $m \ll p$.

For $p = 3$, every $(3 \times 3)$ $\Sigma$ can be represented as $\Sigma = LL' + \Psi$ for some $(3 \times 1)$ $L$, which is a factor representation with $m = 1$, but $\Psi$ may have negative elements.
In general, we can only approximate $\Sigma$ by $LL' + \Psi$.

Principal components method: the spectral decomposition of $\Sigma$ is
$$\Sigma = E \Lambda E' = \left(E \Lambda^{1/2}\right)\left(E \Lambda^{1/2}\right)' = LL',$$
with $m = p$. If
$$\lambda_1 + \lambda_2 + \cdots + \lambda_m \gg \lambda_{m+1} + \cdots + \lambda_p,$$
and $L^{(m)}$ is the first $m$ columns of $L$, then $\Sigma \approx L^{(m)} L^{(m)\prime}$ gives such an approximation with $\Psi = 0$.
The remainder term $\Sigma - L^{(m)} L^{(m)\prime}$ is non-negative definite, so its diagonal entries are non-negative; we can get a closer approximation as
$$\Sigma \approx L^{(m)} L^{(m)\prime} + \Psi^{(m)}, \qquad \text{where } \Psi^{(m)} = \mathrm{diag}\left(\Sigma - L^{(m)} L^{(m)\prime}\right).$$

SAS proc factor program and output:

    proc factor data = all method = prin;
       var cvx -- xom;
       title 'Method = Principal Components';

    proc factor data = all method = prin nfact = 2 plot;
       var cvx -- xom;
       title 'Method = Principal Components, 2 factors';
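The principal-components construction above can be sketched in a few lines of NumPy. The 3x3 correlation matrix is a hypothetical example, not the CVX--XOM stock data:

```python
import numpy as np

def pc_factor(S, m):
    """Principal-components m-factor solution: S ~ L L' + Psi (a sketch)."""
    lam, E = np.linalg.eigh(S)              # eigenvalues in ascending order
    lam, E = lam[::-1], E[:, ::-1]          # sort descending
    L = E[:, :m] * np.sqrt(lam[:m])         # column j of L is sqrt(lambda_j) e_j
    Psi = np.diag(np.diag(S - L @ L.T))     # Psi = diag(S - L L')
    return L, Psi

# Toy correlation matrix (illustrative values only).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L, Psi = pc_factor(R, m=1)
print(np.round(np.diag(L @ L.T), 3))        # communalities
```

By construction the diagonal of $L^{(m)} L^{(m)\prime} + \Psi^{(m)}$ matches the diagonal of $R$ exactly; only the off-diagonal entries are approximated.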
Principal Factor Solution

Recall the Orthogonal Factor Model
$$X = LF + \epsilon,$$
which implies
$$\Sigma = LL' + \Psi.$$

The $m$-factor Principal Component solution is to approximate $\Sigma$ (or, if we standardize the variables, $R$) by a rank-$m$ matrix using the spectral decomposition
$$\Sigma = \lambda_1 e_1 e_1' + \cdots + \lambda_m e_m e_m' + \lambda_{m+1} e_{m+1} e_{m+1}' + \cdots + \lambda_p e_p e_p'.$$
The first $m$ terms give the best rank-$m$ approximation to $\Sigma$.
We can sometimes achieve higher communalities ($= \mathrm{diag}(LL')$) by either:
- specifying an initial estimate of the communalities,
- iterating the solution,
or both.

Suppose we are working with $R$. Given initial communalities $h_i^2$, form the reduced correlation matrix
$$R_r = \begin{pmatrix} h_1^2 & r_{1,2} & \dots & r_{1,p} \\ r_{2,1} & h_2^2 & \dots & r_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p,1} & r_{p,2} & \dots & h_p^2 \end{pmatrix}.$$
Now use the spectral decomposition of $R_r$ to find its best rank-$m$ approximation $R_r \approx L_r L_r'$. The new communalities are
$$h_i^2 = \sum_{j=1}^m l_{i,j}^2.$$
Find $\Psi$ by equating the diagonal terms:
$$\psi_i = 1 - h_i^2, \qquad \text{or} \qquad \Psi = I - \mathrm{diag}(L_r L_r').$$
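A sketch of one principal-factor step, starting from SMC-style initial communalities (computed as $1 - 1/(R^{-1})_{ii}$); the correlation matrix is illustrative:

```python
import numpy as np

def principal_factor(R, m, h2):
    """One principal-factor step: put the communalities h2 on the diagonal,
    take the best rank-m approximation, and read off new communalities."""
    Rr = R.copy()
    np.fill_diagonal(Rr, h2)                     # reduced correlation matrix
    lam, E = np.linalg.eigh(Rr)                  # ascending eigenvalues
    lam, E = lam[::-1], E[:, ::-1]
    Lr = E[:, :m] * np.sqrt(np.maximum(lam[:m], 0))
    h2_new = (Lr ** 2).sum(axis=1)               # new communalities
    Psi = np.diag(1 - h2_new)                    # psi_i = 1 - h_i^2
    return Lr, h2_new, Psi

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
h2_init = 1 - 1 / np.diag(np.linalg.inv(R))      # squared multiple correlations
Lr, h2_new, Psi = principal_factor(R, m=1, h2=h2_init)
print(np.round(h2_new, 3))
```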
This is the Principal Factor solution. The Principal Component solution is the special case where the initial communalities are all 1.

In proc factor, use method = prin as for the Principal Component solution, but also specify the initial communalities:
- the priors = ... option on the proc factor statement specifies a method, such as squared multiple correlations (priors = SMC);
- the priors statement provides explicit numerical values.
SAS program and output:

    proc factor data = all method = prin priors = smc;
       title 'Method = Principal Factors';
       var cvx -- xom;

In this case, the communalities are smaller than for the Principal Component solution.
Other choices for the priors option include:
- MAX: maximum absolute correlation with any other variable;
- ASMC: Adjusted SMC (adjusted to make their sum equal to the sum of the maximum absolute correlations);
- ONE: 1;
- RANDOM: uniform on (0, 1).
Iterated Principal Factors

One issue with both Principal Components and Principal Factors: even if $S$ or $R$ is exactly of the form $LL' + \Psi$ (or, more likely, approximately of that form), neither method recovers $L$ and $\Psi$ (unless you specify the true communalities).

Solution: iterate! Use the new communalities as initial communalities to get another set of Principal Factors. Repeat until nothing much changes.
In proc factor, use method = prinit; you may also specify the initial communalities (default = ONE).

SAS program and output:

    proc factor data = all method = prinit;
       title 'Method = Iterated Principal Factors';
       var cvx -- xom;

The communalities are still smaller than for the Principal Component solution, but larger than for Principal Factors.
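The iteration is easy to sketch: feed each step's communalities back in until they stabilize. This mirrors the idea behind method = prinit, not SAS's exact implementation, and the correlation matrix is again illustrative:

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
m = 1
h2 = np.ones(3)                                  # initial communalities (default = ONE)

for _ in range(500):
    Rr = R.copy()
    np.fill_diagonal(Rr, h2)                     # reduced correlation matrix
    lam, E = np.linalg.eigh(Rr)
    lam, E = lam[::-1], E[:, ::-1]
    L = E[:, :m] * np.sqrt(np.maximum(lam[:m], 0))
    h2_new = (L ** 2).sum(axis=1)                # updated communalities
    if np.max(np.abs(h2_new - h2)) < 1e-10:      # stop when nothing much changes
        break
    h2 = h2_new

print(np.round(h2_new, 4))
```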
Likelihood Methods

If we assume that $X \sim N_p(\mu, \Sigma)$ with $\Sigma = LL' + \Psi$, we can fit by maximum likelihood:
- $\hat{\mu} = \bar{x}$;
- $L$ is not identified without a constraint (uniqueness condition) such as $L' \Psi^{-1} L$ = diagonal;
- even so, there is no closed-form equation for $\hat{L}$; numerical optimization is required.
We can also test hypotheses about $m$ with the likelihood ratio test (Bartlett's correction improves the $\chi^2$ approximation):
$$H_0: m = m_0; \qquad H_A: m > m_0;$$
$$-2 \log(\text{likelihood ratio}) \sim \chi^2 \text{ with } \tfrac{1}{2}\left[(p - m_0)^2 - p - m_0\right] \text{ degrees of freedom}.$$

Degrees of freedom $> 0 \iff m_0 < \tfrac{1}{2}\left(2p + 1 - \sqrt{8p + 1}\right)$.

E.g. for $p = 5$: $m_0 < 2.298$, so $m_0 \le 2$:

    p    m_0    degrees of freedom
    5     0            10
    5     1             5
    5     2             1
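The degrees-of-freedom formula and the bound on $m_0$ are easy to compute; this sketch reproduces the $p = 5$ table:

```python
from math import sqrt

def lrt_df(p, m0):
    """Degrees of freedom: (1/2)[(p - m0)^2 - p - m0]."""
    return ((p - m0) ** 2 - p - m0) / 2

def m0_bound(p):
    """Degrees of freedom are positive iff m0 < (1/2)(2p + 1 - sqrt(8p + 1))."""
    return (2 * p + 1 - sqrt(8 * p + 1)) / 2

print([lrt_df(5, m0) for m0 in range(3)])   # [10.0, 5.0, 1.0]
print(round(m0_bound(5), 3))                # 2.298
```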
In proc factor, use method = ml; you may also specify the initial communalities (default = SMC).

SAS program and output:

    proc factor data = all method = ml;
       var cvx -- xom;
       title 'Method = Maximum Likelihood';

    proc factor data = all method = ml heywood plot;
       var cvx -- xom;
       title 'Method = Maximum Likelihood with Heywood fixup';

    proc factor data = all method = ml ultraheywood plot;
       var cvx -- xom;
       title 'Method = Maximum Likelihood with Ultra-Heywood fixup';
Note that the iteration can produce communalities > 1! Two fixes:
- the Heywood option on the proc factor statement caps the communalities at 1;
- the UltraHeywood option on the proc factor statement allows the iteration to continue with communalities > 1.
Scaling and the Likelihood

If the maximum likelihood estimates for a data matrix $\underset{n \times p}{X}$ are $\hat{L}$ and $\hat{\Psi}$, and
$$\underset{n \times p}{Y} = \underset{n \times p}{X}\,\underset{p \times p}{D}$$
is a scaled data matrix, with the columns of $X$ scaled by the entries of the diagonal matrix $D$, then the maximum likelihood estimates for $Y$ are $D\hat{L}$ and $D^2\hat{\Psi}$. That is, the MLEs transform consistently (equivariantly) under scaling:
$$\hat{\Sigma}_Y = D \hat{\Sigma}_X D.$$
Proof:
$$L_Y(\mu, \Sigma) = L_X\left(D^{-1}\mu,\; D^{-1}\Sigma D^{-1}\right).$$
Consequently there is no distinction between analyzing covariance and correlation matrices.
Weighting and the Likelihood

Recall the uniqueness condition
$$L' \Psi^{-1} L = \Delta, \qquad \Delta \text{ diagonal}.$$
Write
$$\Sigma^* = \Psi^{-\frac{1}{2}} \Sigma \Psi^{-\frac{1}{2}} = \Psi^{-\frac{1}{2}} (LL' + \Psi) \Psi^{-\frac{1}{2}} = \left(\Psi^{-\frac{1}{2}} L\right)\left(\Psi^{-\frac{1}{2}} L\right)' + I_p = L^* L^{*\prime} + I_p.$$
$\Sigma^*$ is the weighted covariance matrix.
Here $L^* = \Psi^{-\frac{1}{2}} L$ and $L^{*\prime} L^* = L' \Psi^{-1} L = \Delta$. Note:
$$\Sigma^* L^* = L^* L^{*\prime} L^* + L^* = L^* \Delta + L^* = L^* (\Delta + I_m),$$
so the columns of $L^*$ are the (unnormalized) eigenvectors of $\Sigma^*$, the weighted covariance matrix.
Also
$$(\Sigma^* - I_p) L^* = L^* \Delta,$$
so the columns of $L^*$ are also the eigenvectors of
$$\Sigma^* - I_p = \Psi^{-\frac{1}{2}} (\Sigma - \Psi) \Psi^{-\frac{1}{2}},$$
the weighted reduced covariance matrix.

Since the likelihood analysis is transparent to scaling, the weighted reduced correlation matrix gives essentially the same results as the weighted reduced covariance matrix.
Factor Rotation

In the orthogonal factor model $X - \mu = LF + \epsilon$, factor loadings are not always easily interpreted. J&W (p. 504):

    Ideally, we should like to see a pattern of loadings such that each variable loads highly on a single factor and has small to moderate loadings on the remaining factors.

That is, each row of $L$ should have a single large entry.
Recall from the corresponding equation $\Sigma = LL' + \Psi$ that $L$ and $LT$ give the same $\Sigma$ for any orthogonal $T$. We can choose $T$ to make the rotated loadings $LT$ more readily interpreted.

Note that rotation changes neither $\Sigma$ nor $\Psi$, and hence the communalities are also unchanged.
The Varimax Criterion

Kaiser proposed a criterion that measures interpretability:
- $\hat{L}$ is some set of loadings with communalities $\hat{h}_i^2$, $i = 1, 2, \dots, p$;
- $\hat{L}^*$ is a set of rotated loadings, $\hat{L}^* = \hat{L} T$;
- $\tilde{l}^*_{i,j} = \hat{l}^*_{i,j} / \hat{h}_i$ are scaled loadings;
- the criterion is
$$V = \frac{1}{p} \sum_{j=1}^m \left[ \sum_{i=1}^p \tilde{l}^{*4}_{i,j} - \frac{1}{p} \left( \sum_{i=1}^p \tilde{l}^{*2}_{i,j} \right)^2 \right].$$
Note that the term in brackets is proportional to the variance of the squared scaled loadings $\tilde{l}^{*2}_{i,j}$ in column $j$. Making this variance large tends to produce two clusters of scaled loadings, one of small values and one of large values. So each column of the rotated loading matrix tends to contain:
- a group of large loadings, which identify the variables associated with the factor;
- remaining loadings that are small.
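For concreteness, here is a sketch of the standard SVD-based varimax algorithm (the same scheme used by R's varimax(..., normalize = FALSE)); the 4x2 loading matrix is made up for illustration:

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Varimax rotation of a loading matrix via the standard SVD iteration."""
    p, m = L.shape
    T = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient-like matrix for the varimax criterion.
        B = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt                       # nearest orthogonal matrix
        d = s.sum()
        if d - d_old < tol * d:          # criterion has stopped improving
            break
        d_old = d
    return L @ T, T

L0 = np.array([[0.7, 0.5],
               [0.6, 0.6],
               [0.4, -0.6],
               [0.5, -0.5]])             # hypothetical loadings
Lrot, T = varimax(L0)
print(np.allclose(T @ T.T, np.eye(2)))                     # True: T is orthogonal
print(np.allclose((Lrot ** 2).sum(1), (L0 ** 2).sum(1)))   # True: communalities unchanged
```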
Example: Weekly returns for the 30 Dow Industrials stocks from January 2005 to March 2007 (115 returns).

R code to rotate Principal Components 2-10:

    dowprcomp = prcomp(dow, scale. = TRUE)
    dowvmax = varimax(dowprcomp$rotation[, 2:10], normalize = FALSE)
    loadings(dowvmax)

Note: when R prints the loadings, entries with absolute value below a cutoff (default: 0.1) are printed as blanks, to draw attention to the larger values.
[Output: table of varimax-rotated loadings on PC2-PC10 for the 30 Dow stocks (AA, AIG, AXP, ..., WMT, XOM); entries with absolute value below 0.1 are printed as blanks.]
In proc factor, use rotate = varimax; you may also request plots both before (preplot) and after (plot) rotation.

SAS program and output:

    proc factor data = all method = prinit nfact = 2
                rotate = varimax preplot plot out = stout;
       title 'Method = Iterated Principal Factors with Varimax Rotation';
       var cvx -- xom;
Factor Scores

Interpretation of a factor analysis is usually based on the factor loadings. Sometimes we need the (estimated) values of the unobserved factors for further analysis: the factor scores.

In Principal Components Analysis, the principal components themselves are typically used, scaled to have variance 1. In other types of factor analysis, two methods are used.
Bartlett's Weighted Least Squares

Suppose that in the equation
$$X - \mu = LF + \epsilon,$$
$L$ is known. We can view the equation as a regression of $X$ on $L$, with coefficients $F$ and heteroscedastic errors $\epsilon$ with variance matrix $\Psi$. This suggests using
$$\hat{f} = \left(L' \Psi^{-1} L\right)^{-1} L' \Psi^{-1} (x - \mu)$$
to estimate $F$.
With $L$, $\Psi$, and $\mu$ replaced by estimates, and for the $j$th observation $x_j$, this gives
$$\hat{f}_j = \left(\hat{L}' \hat{\Psi}^{-1} \hat{L}\right)^{-1} \hat{L}' \hat{\Psi}^{-1} (x_j - \bar{x})$$
as the estimated values of the factors.

The sample mean of the scores is 0. If the factor loadings are ML estimates, $\hat{L}' \hat{\Psi}^{-1} \hat{L}$ is a diagonal matrix $\hat{\Delta}$, and the sample covariance matrix of the scores is
$$\frac{n}{n-1} \left( I + \hat{\Delta}^{-1} \right).$$
In particular, the sample correlations of the factor scores are zero.
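A sketch of Bartlett scoring with hypothetical estimates; the loadings and specific variances are illustrative and the data are random placeholders, so the point is only the algebra:

```python
import numpy as np

L_hat = np.array([[0.9, 0.1],
                  [0.8, 0.3],
                  [0.2, 0.7],
                  [0.1, 0.8]])                # hypothetical estimated loadings
psi_hat = np.array([0.18, 0.27, 0.47, 0.35])  # hypothetical specific variances
Psi_inv = np.diag(1 / psi_hat)

def bartlett_scores(X, L, Psi_inv):
    """WLS scores: f_j = (L' Psi^-1 L)^-1 L' Psi^-1 (x_j - xbar)."""
    Xc = X - X.mean(axis=0)                   # center at the sample mean
    A = np.linalg.solve(L.T @ Psi_inv @ L, L.T @ Psi_inv)
    return Xc @ A.T

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))              # placeholder data
F = bartlett_scores(X, L_hat, Psi_inv)
print(np.allclose(F.mean(axis=0), 0))         # True: scores have sample mean 0
```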
Regression Method

The second method depends on the normal distribution assumption: $X$ and $F$ have a joint multivariate normal distribution, so the conditional distribution of $F$ given $X$ is also multivariate normal. The Best Linear Unbiased Predictor is the conditional mean.
This leads to
$$\hat{f}_j = \hat{L}' \left(\hat{L}\hat{L}' + \hat{\Psi}\right)^{-1} (x_j - \bar{x}) = \left(I + \hat{L}' \hat{\Psi}^{-1} \hat{L}\right)^{-1} \hat{L}' \hat{\Psi}^{-1} (x_j - \bar{x}).$$
The two methods are related by
$$\hat{f}^{LS}_j = \left[I + \left(\hat{L}' \hat{\Psi}^{-1} \hat{L}\right)^{-1}\right] \hat{f}^{R}_j.$$

In proc factor, use out = <data set name> on the proc factor statement; proc factor uses the regression method.
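The equality of the two forms of the regression scores, and the relation between the regression and WLS scores, can be verified numerically (hypothetical estimates and random placeholder data again):

```python
import numpy as np

L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])                    # hypothetical estimated loadings
Psi = np.diag([0.18, 0.27, 0.47, 0.35])       # hypothetical specific variances
Psi_inv = np.linalg.inv(Psi)
Sigma = L @ L.T + Psi

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))              # placeholder data
Xc = X - X.mean(axis=0)

# Regression scores, first form: f_j = L'(LL' + Psi)^-1 (x_j - xbar)
F_reg = Xc @ np.linalg.solve(Sigma, L)
# Second form: f_j = (I + L' Psi^-1 L)^-1 L' Psi^-1 (x_j - xbar)
F_reg2 = Xc @ Psi_inv @ L @ np.linalg.inv(np.eye(2) + L.T @ Psi_inv @ L)
print(np.allclose(F_reg, F_reg2))             # True: the two forms agree

# Bartlett (WLS) scores and the relation f^LS = [I + (L' Psi^-1 L)^-1] f^R
F_ls = Xc @ Psi_inv @ L @ np.linalg.inv(L.T @ Psi_inv @ L)
M = np.eye(2) + np.linalg.inv(L.T @ Psi_inv @ L)
print(np.allclose(F_ls, F_reg @ M.T))         # True
```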