# Factor analysis. Angela Montanari


## 1 Introduction

Factor analysis is a statistical model that explains the correlations between a large number of observed correlated variables through a small number of uncorrelated unobservable ones: the factors.

The origin of factor analysis dates back to a work by Spearman in 1904. At that time psychometricians were deeply involved in the attempt to suitably quantify human intelligence, and Spearman's work provided a very clever and useful tool that is still at the basis of the most advanced instruments for measuring intelligence. Intelligence is the prototype of a vast class of variables that are not directly observed (they are called latent variables) but can be measured indirectly through the analysis of observable variables closely linked to the latent ones. Latent variables are common to many research fields besides psychology, from medicine to genetics, from finance to economics, and this explains the still vivid interest in factor analysis.

Spearman considered the correlation matrix $R$ between children's examination performance in Classics ($x_1$), French ($x_2$) and English ($x_3$). He noticed high positive correlations between the scores and hypothesized that they were due to the correlation of the three observed variables with a further unobserved variable that he called intelligence, or general ability. If his assumption was true, then he expected that the partial correlation coefficients, computed between the observed variables after controlling for the common latent one, would vanish.
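Spearman's intuition is easy to check numerically. The sketch below simulates scores from a one-factor model (the loadings 0.9, 0.8, 0.7 are illustrative assumptions, not Spearman's data) and verifies that the high marginal correlation between two scores essentially vanishes once the common factor is controlled for:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# One common factor ("general ability") and illustrative loadings.
f = rng.standard_normal(n)
lam = np.array([0.9, 0.8, 0.7])

# x_i = lambda_i * f + u_i, with unique variances 1 - lambda_i^2
# so that each score has unit variance.
u = rng.standard_normal((n, 3)) * np.sqrt(1.0 - lam**2)
x = f[:, None] * lam + u

# Marginal correlation between the first two scores is high ...
r12 = np.corrcoef(x[:, 0], x[:, 1])[0, 1]

# ... but the partial correlation controlling for f is essentially 0:
# correlate the residuals of x1 and x2 after removing the factor.
res = x - np.outer(f, lam)
r12_given_f = np.corrcoef(res[:, 0], res[:, 1])[0, 1]

print(round(r12, 3), round(r12_given_f, 3))
```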

Starting from this intuition he formulated the following model which, as we will see in the following, can perfectly fulfil the goal:

$$x_1 = \lambda_1 f + u_1$$
$$x_2 = \lambda_2 f + u_2$$
$$x_3 = \lambda_3 f + u_3$$

where $f$ is the underlying common factor, $\lambda_1, \lambda_2, \lambda_3$ are the factor loadings and $u_1, u_2, u_3$ are the unique or specific factors. The factor loadings indicate how much the common factor contributes to the different observed values of the $x$ variables; the unique factors represent residuals, random noise terms that, besides telling us that an examination only offers an approximate measure of the subject's ability, also describe, for each individual, how much his result on a given subject, say French, differs from his general ability.

Spearman's model can be generalized to include more than one common factor:

$$x_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + \dots + \lambda_{ik} f_k + \dots + \lambda_{im} f_m + u_i$$

## 2 The factor model

Let $x$ be a $p$-dimensional random vector with expected value $\mu$ and covariance matrix $\Sigma$. An $m$-factor model for $x$ holds if it can be decomposed as

$$x = \Lambda f + u + \mu$$

If we assume to deal with mean-centered $x$ variables then, with no loss of generality, the model will be

$$x = \Lambda f + u \qquad (1)$$

where $\Lambda$ is the $p \times m$ factor loading matrix, $f$ is the $m \times 1$ random vector of common factors and $u$ is the $p \times 1$ random vector of unique factors. The model looks like a linear regression model, but in this case all the elements on the right-hand side of the equal sign are unknown. In order to reduce indeterminacy we can impose the following constraints:

$$E(f) = 0 \qquad E(u) = 0$$

This condition is perfectly coherent with the fact that we work with mean-centered data.

$$E(ff^T) = I$$

i.e. the common factors are standardized uncorrelated random variables: their variances are equal to 1 and their covariances are 0. This assumption could also be relaxed, while the following ones are strictly required:

$$E(uu^T) = \Psi = \operatorname{diag}(\psi_{11}, \dots, \psi_{kk}, \dots, \psi_{pp})$$

is a diagonal matrix. This means that the unique factors are uncorrelated and may be heteroscedastic.

$$E(fu^T) = 0 \qquad E(uf^T) = 0$$

which means that the unique factors are uncorrelated with the common ones.

If a factor model satisfying the above mentioned conditions holds, the covariance matrix of the observed variables $x$ can be decomposed as follows:

$$\Sigma = E(xx^T) = E[(\Lambda f + u)(\Lambda f + u)^T]$$
$$= E[\Lambda f f^T \Lambda^T + \Lambda f u^T + u f^T \Lambda^T + u u^T]$$
$$= \Lambda E(ff^T)\Lambda^T + \Lambda E(fu^T) + E(uf^T)\Lambda^T + E(uu^T)$$
$$= \Lambda I \Lambda^T + \Lambda 0 + 0 \Lambda^T + \Psi = \Lambda\Lambda^T + \Psi \qquad (2)$$

The opposite is also true: if the covariance matrix of the observed variables $x$ can be decomposed as in equation (2), then the linear factor model (1) holds. As $\Psi$ is diagonal, decomposition (2) clearly shows that the common factors account for all the observed covariances, and this provides theoretical motivation for Spearman's intuition.
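Decomposition (2) can be illustrated by simulation. The sketch below uses a made-up two-factor loading matrix (an assumption for the example, with unique variances chosen so that each variable has unit variance) and checks that the sample covariance matrix of data generated from model (1) approaches $\Lambda\Lambda^T + \Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-factor model for p = 4 observed variables.
p, m = 4, 2
Lambda = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.1, 0.7],
                   [0.2, 0.6]])           # p x m loading matrix
Psi = np.diag([0.18, 0.32, 0.50, 0.60])   # diagonal unique variances

# Simulate x = Lambda f + u with standardized, uncorrelated factors.
n = 200_000
f = rng.standard_normal((n, m))
u = rng.standard_normal((n, p)) * np.sqrt(np.diag(Psi))
x = f @ Lambda.T + u

Sigma_model = Lambda @ Lambda.T + Psi        # decomposition (2)
Sigma_sample = np.cov(x, rowvar=False)

# The sample covariance should be close to Lambda Lambda^T + Psi,
# and all the off-diagonal covariance comes from Lambda Lambda^T alone.
print(np.round(Sigma_model, 2))
print(np.round(Sigma_sample, 2))
```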

It is worth having a closer look at the diagonal elements of the matrices on both sides of the equality sign in decomposition (2):

$$Var(x_i) = \sigma_{ii} = \sum_{k=1}^{m} \lambda_{ik}^2 + \psi_{ii} = h_i^2 + \psi_{ii}$$

The quantity $h_i^2 = \sum_{k=1}^{m} \lambda_{ik}^2$ is called the communality; it represents the part of the variance of $x_i$ that is explained by the common factors or, in other words, it is the variance of $x_i$ that is shared with the other variables via the common factors. $\psi_{ii}$ is called the unique variance: it is the variance of the $i$-th unique factor and represents the part of the variance of $x_i$ not accounted for by the common factors.

From the assumptions on the common and the unique factors that have been previously described, a further interesting characterization of $\Lambda$ derives. Let's consider the covariance between the observed variables $x$ and the common factors $f$:

$$Cov(x, f) = E(xf^T) = E[(\Lambda f + u)f^T] = \Lambda E(ff^T) + E(uf^T) = \Lambda$$

This means that the factor loading matrix $\Lambda$ is also the covariance matrix between $x$ and $f$.

### Scale equivariance

The factor model (1) is scale equivariant. Let's consider the re-scaled $x$ variables obtained as $y = Cx$, where $C$ is a diagonal matrix. Let's also assume that the linear factor model holds for $x$ and put $\Lambda = \Lambda_x$ and $\Psi = \Psi_x$. The factor model for the new $y$ variables becomes:

$$y = Cx = C(\Lambda_x f + u) = C\Lambda_x f + Cu = \Lambda_y f + Cu$$

Thus the factor model also holds for $y$. The new factor loading matrix is but a re-scaled version of the old one, $\Lambda_y = C\Lambda_x$, and the unique variance matrix is $\Psi_y = E(Cuu^T C) = C\Psi_x C$. A common re-scaling transformation is standardization, which amounts to choosing $C = \Delta^{-1/2}$ where, as usual, $\Delta$ is the diagonal matrix with the variances of the $x$ variables. The scale equivariance property implies that we can define the model either on the raw or on the standardized data, and it is always possible to go from one solution to the other by applying the same

scale change to the model parameters. It is worth remembering that, on the contrary, PCs do not share this property.

### Identifiability

A statistical model is said to be identifiable if different values of the parameters generate different predicted values; in other words, a statistical model is identifiable if a solution to the estimation problem exists and is unique. Our linear factor model is not identifiable, as there is an infinite number of different matrices $\Lambda$ that can generate the same $x$ values. In fact, if the $m$-factor model holds, then it also holds if the factors are rotated. Denote by $G$ an $m \times m$ orthogonal matrix, such that $GG^T = G^T G = I$. The factor model can equivalently be re-written as

$$x = \Lambda f + u = \Lambda I f + u = \Lambda G G^T f + u$$

If we set $\Lambda^* = \Lambda G$ and $f^* = G^T f$, the factor model becomes $x = \Lambda^* f^* + u$. The two factor models are completely equivalent and indistinguishable. They have the same properties:

$$E(f^*) = E(G^T f) = G^T E(f) = 0$$
$$E(f^* f^{*T}) = E(G^T f f^T G) = G^T E(ff^T) G = G^T I G = G^T G = I$$
$$E(f^* u^T) = E(G^T f u^T) = G^T E(f u^T) = 0$$

In order to obtain a unique solution, constraints on $\Lambda$ and $\Psi$ are usually imposed. The most common ones are that either $\Lambda^T \Psi^{-1} \Lambda$ or $\Lambda^T \Lambda$ is diagonal. The property of rotation invariance will turn out to be really useful for interpretation purposes.

The issue of the existence of the decomposition in equation (2) now needs to be addressed. The number of available pieces of information is given by the distinct elements of $\Sigma$, i.e. the variances and the covariances, whose number is equal to the sum of the first $p$ natural numbers, $p(p+1)/2$. The number of unknowns is given by the $pm$ elements of $\Lambda$ plus the $p$ diagonal elements of $\Psi$. The above mentioned constraint reduces the number of unknowns by $m(m-1)/2$, i.e. the number of off-diagonal elements that are constrained to be equal to zero. The number of equations is therefore $p(p+1)/2$ and the number of unknowns is $pm + p - m(m-1)/2$. If the first number is lower than the second we

have an infinite number of solutions. If equality holds then we have a unique exact solution that, however, is useless from a statistical point of view, since it does not allow us to explain the observed correlations through a small number of latent variables. If

$$p(p+1)/2 > pm + p - m(m-1)/2$$

then a solution, however approximate, exists. The previous inequality, known as the Ledermann condition, imposes an upper bound on the maximum number of common factors.

## 3 Estimation issues

In practice we do not know $\Sigma$, but we can estimate it by the sample covariance matrix $S$ (or by the sample correlation matrix $R$ if we want to deal with standardized data). We therefore seek estimates $\hat\Lambda$ and $\hat\Psi$ such that $\hat\Lambda \hat\Lambda^T + \hat\Psi$ is as close to $S$ as possible. Various estimation methods have been proposed over the years. Most of them lacked theoretical bases, and that is one reason why factor analysis did not have much success in the statistical community and remained confined to the psychological literature. Only around 1960 did Jöreskog succeed in deriving a sound and reliable algorithm for maximum likelihood estimation. This greatly helped the diffusion of factor analysis which, today, is one of the most widely used dimension reduction models. We will briefly sketch one of the older heuristic methods, the principal factor method, which is still often used, and then describe the best known maximum likelihood method.

### Principal factor analysis

This method is completely distribution free: it is derived from the knowledge of $S$ or $R$ only. As a first step, preliminary estimates $\hat h_i^2$ of the communalities $h_i^2$, $i = 1, \dots, p$, are obtained. When $S$ is the starting point, most software packages use $s_{ii} R_{i0}^2$ or $s_{ii} \max_{j \ne i} |r_{ij}|$; when working on $R$ the corresponding expressions are simply $R_{i0}^2$ or $\max_{j \ne i} |r_{ij}|$. The squared multiple correlation coefficient $R_{i0}^2$ refers to a multiple regression model where the variable $x_i$ is considered as the dependent variable and the other $x$ variables are the covariates. It measures the portion of the variance of $x_i$ that is explained by its linear relationship with the other variables. As the communality is the part of the variance of $x_i$ that is shared with the other variables via the common factors, $R_{i0}^2$ can provide a reasonable approximation. It can easily be proved that $R_{i0}^2$ can be computed from the correlation matrix $R$ as $1 - 1/r^{ii}$, where $r^{ii}$ is the $i$-th diagonal element of $R^{-1}$. This requires $R$ to be non-singular; in case this condition is not satisfied, the method based on $\max_{j \ne i} |r_{ij}|$ can be used.

Let's then define the reduced covariance matrix as $S - \hat\Psi$ (the reduced correlation matrix would be $R - \hat\Psi$); it is simply $S$ (or $R$) where the diagonal elements have been replaced by the $\hat h_i^2$. By the spectral decomposition theorem, $S - \hat\Psi$ can be decomposed as

$$S - \hat\Psi = \Gamma L \Gamma^T$$

where $\Gamma$ is the orthonormal matrix whose columns are the eigenvectors of $S - \hat\Psi$ and $L$ is the diagonal matrix of its eigenvalues. $S - \hat\Psi$ is no longer necessarily positive semi-definite. Suppose that the first $m$ eigenvalues are positive. A rank-$m$ approximation of $S - \hat\Psi$ can thus be obtained as

$$S - \hat\Psi \approx \Gamma_m L_m \Gamma_m^T \qquad (3)$$

where the columns of $\Gamma_m$ are the $m$ eigenvectors of the reduced covariance matrix corresponding to the $m$ positive eigenvalues, which appear on the diagonal of $L_m$. Equation (3) can be rewritten as

$$S - \hat\Psi \approx \Gamma_m L_m \Gamma_m^T = \Gamma_m L_m^{1/2} L_m^{1/2} \Gamma_m^T = \hat\Lambda \hat\Lambda^T$$

where $\hat\Lambda = \Gamma_m L_m^{1/2}$. The diagonal elements of $\hat\Lambda \hat\Lambda^T$ provide new estimates of the communalities. The estimation procedure can be stopped here or iterated, by replacing the new communality estimates on the diagonal of the reduced covariance matrix, until convergence.

The sum of squares of the $i$-th row of $\hat\Lambda$ is the communality of the $i$-th observed variable; the sum of squares of the $k$-th column of $\hat\Lambda$ is the variance explained by the $k$-th common factor. If we divide it by the total variance we obtain an indication of the proportion of the total variance explained by the $k$-th factor.
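The iterative scheme just described fits in a few lines of NumPy. In the sketch below the correlation matrix is generated from a made-up one-factor structure (an assumption for the example), so the iterated estimates should recover the generating loadings up to sign:

```python
import numpy as np

def principal_factor(R, m, n_iter=100):
    """Iterated principal factor estimation of m factors from a
    correlation matrix R."""
    # Initial communalities: squared multiple correlations,
    # h_i^2 = 1 - 1/r^{ii}, r^{ii} the i-th diagonal entry of R^{-1}.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_red = R.copy()
        np.fill_diagonal(R_red, h2)              # reduced correlation matrix
        vals, vecs = np.linalg.eigh(R_red)
        vals, vecs = vals[::-1], vecs[:, ::-1]   # decreasing eigenvalue order
        # Rank-m approximation: Lambda_hat = Gamma_m L_m^{1/2}
        Lam = vecs[:, :m] * np.sqrt(np.maximum(vals[:m], 0.0))
        h2 = np.sum(Lam**2, axis=1)              # updated communalities
    return Lam, h2

# Correlation matrix built from a made-up one-factor structure.
lam_true = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(lam_true, lam_true)
np.fill_diagonal(R, 1.0)

Lam_hat, h2_hat = principal_factor(R, m=1)
print(np.round(Lam_hat.ravel(), 3))   # close to lam_true, up to sign
```

The helper name `principal_factor` and the fixed iteration count are choices made for this sketch; real implementations stop when the communalities change by less than a tolerance.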
It is worth noticing that, being based on the spectral decomposition, the principal factor solution is not scale equivariant (we have already seen that the same holds for PCA). This means that principal factoring

produces different results if we work on $S$ or on $R$. Most statistical software packages perform factor analysis on $R$ as a default option. Their output also provides the residual correlation matrix, a heuristic tool aimed at assessing the goodness of fit of the $m$-factor solution. It is computed as $R - \hat\Lambda \hat\Lambda^T$. Its diagonal elements are the estimates of the unique variances, while the off-diagonal entries are the residual correlations, i.e. the correlations that the common factors have not been able to account for. Values close to 0 indicate a good fit; values far from 0 indicate lack of fit. This might mean that more factors are needed in order to better explain the observed correlations, but it might also mean that the relationship between the observed variables and the latent ones is not linear, as the model assumes.

When working on $R$, many software packages allow a third option for the preliminary estimation of the communalities: setting all the communalities equal to 1. This means that the unique variances are equal to 0. In this case the reduced correlation matrix becomes $R - \hat\Psi = R$ and the principal factors coincide with the principal components. This algebraic equivalence has led to a dangerous confusion between factor analysis and PCA. It must be remembered that PCA is a data transformation, while factor analysis assumes a model. Furthermore, PCA aims at explaining variances, while the main interest of factor analysis lies in covariances. We will deal with this aspect in more detail at the end of the chapter.

### Maximum likelihood factor analysis

The maximum likelihood approach to the estimation of the model parameters assumes that the vector $x$ is distributed according to a multivariate normal distribution, that is:

$$f(x) = |2\pi\Sigma|^{-1/2} \exp\{-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\}$$

The likelihood for a sample of $n$ units is then

$$L(x, \mu, \Sigma) = |2\pi\Sigma|^{-n/2} \exp\{-\tfrac{1}{2} \textstyle\sum_{j=1}^{n} (x_j - \mu)^T \Sigma^{-1} (x_j - \mu)\}$$

and the log-likelihood is

$$l(x, \mu, \Sigma) = \ln L(x, \mu, \Sigma) = -\frac{n}{2} \ln|2\pi\Sigma| - \frac{1}{2} \sum_{j=1}^{n} (x_j - \mu)^T \Sigma^{-1} (x_j - \mu)$$

After estimating $\mu$ by $\bar x$, and after some algebra, it becomes

$$l(x, \Sigma) = -\frac{n}{2} \ln|2\pi\Sigma| - \frac{n}{2} \operatorname{tr}(\Sigma^{-1} S)$$

If the factor model holds then $\Sigma = \Lambda\Lambda^T + \Psi$ and the log-likelihood becomes

$$l(x, \Lambda, \Psi) = -\frac{n}{2} \ln|2\pi(\Lambda\Lambda^T + \Psi)| - \frac{n}{2} \operatorname{tr}[(\Lambda\Lambda^T + \Psi)^{-1} S]$$

This function must be maximized with respect to $\Lambda$ and $\Psi$. The problem is not trivial and requires recourse to numerical methods. In 1967 Jöreskog developed an efficient two-stage algorithm. Convergence is quite fast, but it might happen that one or more elements of the estimate of $\Psi$ become negative. This situation, known as a Heywood case, can be solved by constraining the elements of $\Psi$ to be non-negative.

As the maximum likelihood estimates are scale equivariant, the estimates for standardized data can be obtained from the ones computed on the raw data by a simple change of scale. The same holds for the opposite transformation.

Fitting the common factor model by maximum likelihood also allows us to select, through a suitable likelihood ratio test, the number of common factors to be included in the model. The null hypothesis is $H_0: \Sigma = \Lambda\Lambda^T + \Psi$, with $\Lambda$ a $p \times m$ matrix, against the alternative $H_1$: $\Sigma$ is unstructured. The test statistic, after applying Bartlett's correction in order to improve its convergence to a $\chi^2$ distribution, is

$$W = \left\{ n - 1 - \frac{2p+5}{6} - \frac{2m}{3} \right\} \left\{ \ln|\hat\Lambda\hat\Lambda^T + \hat\Psi| - \ln|S| \right\}$$

where $\hat\Lambda$ and $\hat\Psi$ are the maximum likelihood estimates of the corresponding model parameters. Under $H_0$, $W$ is distributed according to a $\chi^2$ with $\frac{1}{2}[(p-m)^2 - (p+m)]$ degrees of freedom. Examples of the use of factor analysis and of the interpretation of the results will be provided in the notes for the lab session.

## 4 Rotation of factors

In order to improve the interpretability of the factor loadings we can rely on the invariance to orthogonal rotation property of the factor model.
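The rotation indeterminacy itself can be verified in a couple of lines: rotating an illustrative, made-up loading matrix by any orthogonal $G$ leaves the fitted covariance $\Lambda\Lambda^T$ unchanged, so the data cannot distinguish $\Lambda$ from $\Lambda G$:

```python
import numpy as np

# Hypothetical 2-factor loading matrix (values chosen for illustration).
Lam = np.array([[0.9, 0.1],
                [0.8, 0.2],
                [0.1, 0.7],
                [0.2, 0.6]])

# Any orthogonal G leaves the fitted covariance unchanged:
# (Lam G)(Lam G)^T = Lam G G^T Lam^T = Lam Lam^T.
theta = 0.7                                   # arbitrary rotation angle
G = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

Lam_rot = Lam @ G

print(np.allclose(G @ G.T, np.eye(2)))                # G is orthogonal
print(np.allclose(Lam_rot @ Lam_rot.T, Lam @ Lam.T))  # same fit
```

Rotation methods such as varimax exploit exactly this freedom to pick, among all equivalent $\Lambda G$, the one that is easiest to interpret.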

## 5 Factor scores

We look for the vector $a_k$ such that $E(a_k^T x - f_k)^2$ is minimized. After differentiating with respect to $a_k$ and setting the derivatives equal to 0 we obtain

$$E[2x(x^T a_k - f_k)] = 2[E(xx^T)a_k - E(x f_k)] = 2(\Sigma a_k - \Lambda_k) = 0$$

where $\Lambda_k$ is the $k$-th column of $\Lambda$. Hence $a_k = \Sigma^{-1}\Lambda_k$ and $f_k^* = \Lambda_k^T \Sigma^{-1} x$. In vector form, $f^* = \Lambda^T \Sigma^{-1} x$. After some algebra a different expression for $f^*$ can be obtained:

$$f^* = (I + \Lambda^T \Psi^{-1} \Lambda)^{-1} \Lambda^T \Psi^{-1} x$$

The two formulations give the same result, but some textbooks present only the second one. Thompson's estimator is biased; in fact

$$E(f^* \mid f) = E(\Lambda^T \Sigma^{-1} x \mid f) = \Lambda^T \Sigma^{-1} E(x \mid f) = \Lambda^T \Sigma^{-1} \Lambda f \ne f$$

It has, however, been obtained by minimizing $E(f_k^* - f_k)^2$, and so it is the estimator with the least mean squared prediction error. It is worth mentioning that Thompson's estimator is also known in the literature as the regression estimator.

If we assume that the observed variables $x$, the common factors $f$ and the unique factors $u$ are normally distributed and consider the vector $y^T = (f^T, x^T)$, formed by stacking the common factors and the observed variables, then, under the factor model assumptions, $y$ is also distributed according to a zero-mean multivariate normal distribution with covariance matrix

$$\begin{pmatrix} I & \Lambda^T \\ \Lambda & \Lambda\Lambda^T + \Psi \end{pmatrix}$$

From the properties of the multivariate normal distribution, the expected value of $f$ given $x$ is

$$E(f \mid x) = \Lambda^T (\Lambda\Lambda^T + \Psi)^{-1} x = \Lambda^T \Sigma^{-1} x$$

which coincides with the previously obtained $f^*$.

An alternative estimator is due to Bartlett. After the factor loadings and the unique variances have been estimated, the factor model can be seen as a linear multivariate regression model where $f$ is the unknown parameter vector and the residuals (i.e. the unique factors) are uncorrelated but heteroscedastic. Estimation can then be addressed by weighted least squares.
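The two kinds of factor scores can be compared on simulated data. The sketch below uses a made-up one-factor model and plugs in the true $\Lambda$ and $\Psi$ (in practice one would use the estimates); it checks that Thompson's regression scores have the smaller mean squared prediction error, while Bartlett's weighted least squares scores do not shrink:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 50_000, 4, 1

# Made-up one-factor model; true parameters used in place of estimates.
Lam = np.array([[0.9], [0.8], [0.7], [0.6]])
Psi = np.diag(1.0 - (Lam**2).ravel())
Sigma = Lam @ Lam.T + Psi

f = rng.standard_normal((n, m))
x = f @ Lam.T + rng.standard_normal((n, p)) @ np.sqrt(Psi)

# Thompson (regression) scores: f* = Lambda^T Sigma^{-1} x
f_thompson = x @ np.linalg.solve(Sigma, Lam)

# Bartlett (weighted least squares) scores:
# f_hat = (Lambda^T Psi^{-1} Lambda)^{-1} Lambda^T Psi^{-1} x
Psi_inv = np.linalg.inv(Psi)
B = np.linalg.inv(Lam.T @ Psi_inv @ Lam) @ (Lam.T @ Psi_inv)
f_bartlett = x @ B.T

# Thompson shrinks toward 0 (biased) but minimizes the mean squared
# prediction error; Bartlett is conditionally unbiased but noisier.
mse_thompson = np.mean((f_thompson - f) ** 2)
mse_bartlett = np.mean((f_bartlett - f) ** 2)
print(mse_thompson < mse_bartlett)
```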

We want to find an estimate of $f$, say $\hat f$, such that

$$u^T \Psi^{-1} u = (x - \Lambda f)^T \Psi^{-1} (x - \Lambda f)$$

is minimum. After differentiating with respect to $f$ and setting the derivatives equal to 0 we obtain

$$-2\Lambda^T \Psi^{-1}(x - \Lambda f) = 2(\Lambda^T \Psi^{-1} \Lambda f - \Lambda^T \Psi^{-1} x) = 0$$

and hence

$$\hat f = (\Lambda^T \Psi^{-1} \Lambda)^{-1} \Lambda^T \Psi^{-1} x$$

Bartlett's estimator $\hat f$ is unbiased; in fact

$$E(\hat f \mid f) = E\{(\Lambda^T \Psi^{-1} \Lambda)^{-1} \Lambda^T \Psi^{-1} x \mid f\} = (\Lambda^T \Psi^{-1} \Lambda)^{-1} \Lambda^T \Psi^{-1} E(x \mid f) = (\Lambda^T \Psi^{-1} \Lambda)^{-1} \Lambda^T \Psi^{-1} \Lambda f = f$$

Of course it has a larger mean squared prediction error than Thompson's estimator. The choice between the two depends on the research goals.

## 6 Factor analysis and PCA

It is worth concluding this chapter by stressing the connections and the differences between factor analysis and PCA. Both methods have the aim of reducing the dimensionality of a vector of random variables. But while factor analysis assumes a model (which may or may not fit the data), PCA is just a data transformation, and for this reason it always exists. Furthermore, while factor analysis aims at explaining covariances (or correlations), PCA concentrates on variances. Despite these conceptual differences, there have been attempts, mainly in the past, to use PCA in order to estimate the factor model. In the following we will show that PCA may indeed be inadequate when the goal of the research is fitting a factor model.

Let $x$ be the usual $p$-dimensional random vector and $y$ the $p$-dimensional vector of the corresponding principal components, $y = A^T x$, with $A$ the

orthonormal matrix whose columns are the eigenvectors of the covariance matrix of the $x$ variables. Because of the properties of $A$ it will also be $x = Ay$. $A$ can be decomposed into two sub-matrices, $A_m$ containing the eigenvectors corresponding to the first $m$ eigenvalues and $A_{p-m}$ containing the remaining ones: $A = (A_m \; A_{p-m})$. A similar partition is considered for the vector $y$:

$$y = \begin{pmatrix} y_m \\ y_{p-m} \end{pmatrix}$$

hence

$$x = (A_m \; A_{p-m}) \begin{pmatrix} y_m \\ y_{p-m} \end{pmatrix} = A_m y_m + A_{p-m} y_{p-m} = A_m L_m^{1/2} L_m^{-1/2} y_m + A_{p-m} y_{p-m} \qquad (4)$$

where $L_m$ is the diagonal matrix of the first $m$ eigenvalues. Setting $A_m L_m^{1/2} = \Lambda$, $L_m^{-1/2} y_m = f$ and $A_{p-m} y_{p-m} = \eta$, equation (4) can be rewritten as $x = \Lambda f + \eta$. It can easily be checked that the new common factors $f$ have the properties required by the linear factor model:

$$E(ff^T) = E(L_m^{-1/2} y_m y_m^T L_m^{-1/2}) = L_m^{-1/2} E(y_m y_m^T) L_m^{-1/2} = L_m^{-1/2} L_m L_m^{-1/2} = I$$

because the covariance matrix of the first $m$ PCs is $L_m$, and

$$E(f\eta^T) = E(L_m^{-1/2} y_m y_{p-m}^T A_{p-m}^T) = L_m^{-1/2} E(y_m y_{p-m}^T) A_{p-m}^T = 0$$

because the first $m$ and the last $p-m$ PCs are uncorrelated. The new unique factors $\eta$, however, are not uncorrelated, and this contradicts the linear factor model assumption according to which the common factors completely explain the observed covariances:

$$E(\eta\eta^T) = E(A_{p-m} y_{p-m} y_{p-m}^T A_{p-m}^T) = A_{p-m} E(y_{p-m} y_{p-m}^T) A_{p-m}^T = A_{p-m} L_{p-m} A_{p-m}^T$$

$L_{p-m}$ is the covariance matrix of the last $p-m$ PCs and is therefore diagonal; but, in general, its diagonal elements are different, and therefore $A_{p-m} L_{p-m} A_{p-m}^T$ is not diagonal.
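A small numerical check makes the point concrete. With an illustrative covariance matrix (chosen for the example, not taken from the text), the covariance of the PCA "unique factors" $\eta$ has nonzero off-diagonal entries:

```python
import numpy as np

# Illustrative covariance matrix: first two variables are correlated.
Sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 1.5]])

vals, A = np.linalg.eigh(Sigma)
vals, A = vals[::-1], A[:, ::-1]     # eigenvalues 3, 1.5, 1 (decreasing)

m = 1
A_rest = A[:, m:]                    # eigenvectors of the last p - m PCs
L_rest = np.diag(vals[m:])

# Covariance of the PCA "unique factors" eta = A_{p-m} y_{p-m}:
Cov_eta = A_rest @ L_rest @ A_rest.T
print(np.round(Cov_eta, 3))
# Off-diagonal entries of -0.5 appear: eta is correlated, so PCA
# does not satisfy the factor model assumption of a diagonal Psi.
```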


### EC327: Advanced Econometrics, Spring 2007

EC327: Advanced Econometrics, Spring 2007 Wooldridge, Introductory Econometrics (3rd ed, 2006) Appendix D: Summary of matrix algebra Basic definitions A matrix is a rectangular array of numbers, with m

### Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Sections 2.11 and 5.8

Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

### DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

### Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA

PROC FACTOR: How to Interpret the Output of a Real-World Example Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA ABSTRACT THE METHOD This paper summarizes a real-world example of a factor

### Exploratory Factor Analysis Brian Habing - University of South Carolina - October 15, 2003

Exploratory Factor Analysis Brian Habing - University of South Carolina - October 15, 2003 FA is not worth the time necessary to understand it and carry it out. -Hills, 1977 Factor analysis should not

### Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0

Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0 This primer provides an overview of basic concepts and definitions in probability and statistics. We shall

### ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems

Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed

### Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales. Olli-Pekka Kauppila Rilana Riikkinen

Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales Olli-Pekka Kauppila Rilana Riikkinen Learning Objectives 1. Develop the ability to assess a quality of measurement instruments

### Matrix Norms. Tom Lyche. September 28, Centre of Mathematics for Applications, Department of Informatics, University of Oslo

Matrix Norms Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 28, 2009 Matrix Norms We consider matrix norms on (C m,n, C). All results holds for

### Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

### Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances?

1 Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances? André Beauducel 1 & Norbert Hilger University of Bonn,

### D-optimal plans in observational studies

D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

### Exploratory Factor Analysis

Introduction Principal components: explain many variables using few new variables. Not many assumptions attached. Exploratory Factor Analysis Exploratory factor analysis: similar idea, but based on model.

### MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

### Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

### CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye

CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize

### Similar matrices and Jordan form

Similar matrices and Jordan form We ve nearly covered the entire heart of linear algebra once we ve finished singular value decompositions we ll have seen all the most central topics. A T A is positive

### Estimation and Inference in Cointegration Models Economics 582

Estimation and Inference in Cointegration Models Economics 582 Eric Zivot May 17, 2012 Tests for Cointegration Let the ( 1) vector Y be (1). Recall, Y is cointegrated with 0 cointegrating vectors if there

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Lecture 5 - Triangular Factorizations & Operation Counts

LU Factorization Lecture 5 - Triangular Factorizations & Operation Counts We have seen that the process of GE essentially factors a matrix A into LU Now we want to see how this factorization allows us

### The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions Every

### 1 Short Introduction to Time Series

ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The

### CS229 Lecture notes. Andrew Ng

CS229 Lecture notes Andrew Ng Part X Factor analysis Whenwehavedatax (i) R n thatcomesfromamixtureofseveral Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting, we usually

### 1 Orthogonal projections and the approximation

Math 1512 Fall 2010 Notes on least squares approximation Given n data points (x 1, y 1 ),..., (x n, y n ), we would like to find the line L, with an equation of the form y = mx + b, which is the best fit

### Chapter 6. Orthogonality

6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

### Ordinary Least Squares and Poisson Regression Models by Luc Anselin University of Illinois Champaign-Urbana, IL

Appendix C Ordinary Least Squares and Poisson Regression Models by Luc Anselin University of Illinois Champaign-Urbana, IL This note provides a brief description of the statistical background, estimators

### 1 Eigenvalues and Eigenvectors

Math 20 Chapter 5 Eigenvalues and Eigenvectors Eigenvalues and Eigenvectors. Definition: A scalar λ is called an eigenvalue of the n n matrix A is there is a nontrivial solution x of Ax = λx. Such an x

### Lecture 9: Continuous

CSC2515 Fall 2007 Introduction to Machine Learning Lecture 9: Continuous Latent Variable Models 1 Example: continuous underlying variables What are the intrinsic latent dimensions in these two datasets?

### Statistics in Geophysics: Linear Regression II

Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/28 Model definition Suppose we have the following

### Bindel, Fall 2012 Matrix Computations (CS 6210) Week 8: Friday, Oct 12

Why eigenvalues? Week 8: Friday, Oct 12 I spend a lot of time thinking about eigenvalue problems. In part, this is because I look for problems that can be solved via eigenvalues. But I might have fewer

### Understanding and Applying Kalman Filtering

Understanding and Applying Kalman Filtering Lindsay Kleeman Department of Electrical and Computer Systems Engineering Monash University, Clayton 1 Introduction Objectives: 1. Provide a basic understanding

### Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

### Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

### Basics Inversion and related concepts Random vectors Matrix calculus. Matrix algebra. Patrick Breheny. January 20

Matrix algebra January 20 Introduction Basics The mathematics of multiple regression revolves around ordering and keeping track of large arrays of numbers and solving systems of equations The mathematical

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Principal Component Analysis

Principal Component Analysis Principle Component Analysis: A statistical technique used to examine the interrelations among a set of variables in order to identify the underlying structure of those variables.

### Principal Component Analysis Application to images

Principal Component Analysis Application to images Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception http://cmp.felk.cvut.cz/

### Linear Algebra Review. Vectors

Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka kosecka@cs.gmu.edu http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length

Factor Analysis Advanced Financial Accounting II Åbo Akademi School of Business Factor analysis A statistical method used to describe variability among observed variables in terms of fewer unobserved variables

### Solution of Linear Systems

Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

### Multivariate Statistical Inference and Applications

Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim

### Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. b 0, b 1 Q = n (Y i (b 0 + b 1 X i )) 2 i=1 Minimize this by maximizing

The Hadamard Product Elizabeth Million April 12, 2007 1 Introduction and Basic Results As inexperienced mathematicians we may have once thought that the natural definition for matrix multiplication would

### 1 Teaching notes on GMM 1.

Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in