Factor analysis. Angela Montanari
1 Introduction

Factor analysis is a statistical model that explains the correlations among a large number of observed, correlated variables through a small number of uncorrelated unobservable ones: the factors.

The origin of factor analysis dates back to a paper by Spearman in 1904. At that time psychometricians were deeply involved in the attempt to suitably quantify human intelligence, and Spearman's work provided a very clever and useful tool that is still at the basis of the most advanced instruments for measuring intelligence. Intelligence is the prototype of a vast class of variables that are not directly observed (they are called latent variables) but can be measured indirectly through the analysis of observable variables closely linked to the latent ones. Latent variables are common to many research fields besides psychology, from medicine to genetics, from finance to economics, and this explains the still vivid interest in factor analysis.

Spearman considered the correlation matrix $R$ between children's examination performance in Classics ($x_1$), French ($x_2$) and English ($x_3$). He noticed a high positive correlation between the scores and hypothesized that it was due to the correlation of the three observed variables with a further unobserved variable that he called intelligence, or general ability. If his assumption was true, then he expected that the partial correlation coefficients, computed between the observed variables after controlling for the common latent one, would vanish.
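To make Spearman's argument concrete, the short simulation below (a sketch in Python/NumPy; the loadings and sample size are arbitrary illustrative choices, not Spearman's data) generates three scores from a single common factor and checks that the pairwise correlations are high while the partial correlation between two scores, once the factor is controlled for, is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
lam = np.array([0.9, 0.8, 0.7])                  # hypothetical factor loadings
psi = 1 - lam**2                                 # unique variances (standardized variables)

f = rng.standard_normal(n)                       # common factor ("general ability")
u = rng.standard_normal((n, 3)) * np.sqrt(psi)   # unique factors
x = f[:, None] * lam + u                         # x_i = lambda_i * f + u_i

R = np.corrcoef(x, rowvar=False)
print("correlation matrix:\n", R.round(2))

# partial correlation of x1 and x2 given f: correlate the residuals
# of the regressions of x1 and x2 on f
def partial_corr(a, b, c):
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

print("partial corr(x1, x2 | f):", round(partial_corr(x[:, 0], x[:, 1], f), 3))
```

The observed correlations are large while the partial correlation is close to 0, exactly as the one-factor model predicts.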
Starting from this intuition he formulated the following model which, as we will see, can perfectly fulfil the goal:

$$x_1 = \lambda_1 f + u_1$$
$$x_2 = \lambda_2 f + u_2$$
$$x_3 = \lambda_3 f + u_3$$

where $f$ is the underlying common factor, $\lambda_1, \lambda_2, \lambda_3$ are the factor loadings and $u_1, u_2, u_3$ are the unique or specific factors. The factor loadings indicate how much the common factor contributes to the different observed values of the $x$ variables; the unique factors represent residuals, random noise terms that, besides telling us that an examination only offers an approximate measure of the subject's ability, also describe, for each individual, how much his result on a given subject, say French, differs from his general ability.

Spearman's model can be generalized to include more than one common factor:

$$x_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + \dots + \lambda_{ik} f_k + \dots + \lambda_{im} f_m + u_i$$

2 The factor model

Let $x$ be a $p$-dimensional random vector with expected value $\mu$ and covariance matrix $\Sigma$. An $m$-factor model for $x$ holds if it can be decomposed as

$$x = \mu + \Lambda f + u.$$

If we assume that the $x$ variables are mean centered then, with no loss of generality, the model becomes

$$x = \Lambda f + u \qquad (1)$$

where $\Lambda$ is the $p \times m$ factor loading matrix, $f$ is the $m \times 1$ random vector of common factors and $u$ is the $p \times 1$ random vector of unique factors. The model looks like a linear regression model, but here all the elements on the right-hand side of the equality sign are unknown. In order to reduce indeterminacy we can impose the following constraints:

$$E(f) = 0 \qquad E(u) = 0$$

This condition is perfectly coherent with the fact that we work with mean centered data.
$$E(ff^T) = I$$

i.e. the common factors are standardized uncorrelated random variables: their variances are equal to 1 and their covariances are 0. This assumption could also be relaxed, while the following ones are strictly required.

$$E(uu^T) = \Psi = \mathrm{diag}(\psi_{11}, \dots, \psi_{kk}, \dots, \psi_{pp})$$

where $\Psi$ is a diagonal matrix. This means that the unique factors are uncorrelated and may be heteroscedastic.

$$E(fu^T) = 0 \qquad E(uf^T) = 0$$

which means that the unique factors are uncorrelated with the common ones.

If a factor model satisfying the above conditions holds, the covariance matrix of the observed variables $x$ can be decomposed as follows:

$$\Sigma = E(xx^T) = E[(\Lambda f + u)(\Lambda f + u)^T]
= E(\Lambda f f^T \Lambda^T + \Lambda f u^T + u f^T \Lambda^T + u u^T)$$
$$= \Lambda E(ff^T)\Lambda^T + \Lambda E(fu^T) + E(uf^T)\Lambda^T + E(uu^T)
= \Lambda I \Lambda^T + \Lambda 0 + 0 \Lambda^T + \Psi
= \Lambda\Lambda^T + \Psi \qquad (2)$$

The converse is also true: if the covariance matrix of the observed variables $x$ can be decomposed as in equation (2), then the linear factor model (1) holds. As $\Psi$ is diagonal, decomposition (2) clearly shows that the common factors account for all the observed covariances, and this provides theoretical motivation for Spearman's intuition.
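As a quick numerical illustration (a sketch with made-up loadings and unique variances, not a specific dataset), the snippet below builds $\Sigma$ from a chosen $\Lambda$ and $\Psi$ and confirms that all off-diagonal entries of $\Sigma$ come from $\Lambda\Lambda^T$ alone, since $\Psi$ only touches the diagonal.

```python
import numpy as np

# hypothetical loadings for p = 4 variables and m = 2 factors
Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.9],
                   [0.2, 0.6]])
Psi = np.diag([0.35, 0.47, 0.18, 0.60])   # unique variances

Sigma = Lambda @ Lambda.T + Psi           # decomposition (2)

# off-diagonal covariances are reproduced entirely by the common factors
off_diag = ~np.eye(4, dtype=bool)
print(np.allclose(Sigma[off_diag], (Lambda @ Lambda.T)[off_diag]))  # True
```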
It is worth having a closer look at the diagonal elements of the matrices on both sides of the equality sign in decomposition (2):

$$\mathrm{Var}(x_i) = \sigma_{ii} = \sum_{k=1}^{m} \lambda_{ik}^2 + \psi_{ii} = h_i^2 + \psi_{ii}$$

The quantity $\sum_{k=1}^{m} \lambda_{ik}^2 = h_i^2$ is called the communality; it represents the part of the variance of $x_i$ that is explained by the common factors or, in other words, the variance of $x_i$ that is shared with the other variables via the common factors. $\psi_{ii}$ is called the unique variance: it is the variance of the $i$-th unique factor and represents the part of the variance of $x_i$ not accounted for by the common factors.

From the assumptions on the common and the unique factors described above, a further interesting characterization of $\Lambda$ follows. Consider the covariance between the observed variables $x$ and the common factors $f$:

$$\mathrm{Cov}(x, f) = E(xf^T) = E[(\Lambda f + u)f^T] = \Lambda E(ff^T) + E(uf^T) = \Lambda$$

This means that the factor loading matrix $\Lambda$ is also the covariance matrix between $x$ and $f$.

Scale equivariance

The factor model (1) is scale equivariant. Consider the re-scaled variables obtained as $y = Cx$, where $C$ is a diagonal matrix. Assume also that the linear factor model holds for $x$ and write $\Lambda = \Lambda_x$ and $\Psi = \Psi_x$. The factor model for the new $y$ variables becomes

$$y = Cx = C(\Lambda_x f + u) = C\Lambda_x f + Cu = \Lambda_y f + Cu$$

Thus the factor model also holds for $y$. The new factor loading matrix is just a re-scaled version of the old one, $\Lambda_y = C\Lambda_x$, and the new unique variance matrix is $\Psi_y = E(Cuu^TC) = C\Psi_x C$. A common re-scaling transformation is standardization, which amounts to choosing $C = \Delta^{-1/2}$ where, as usual, $\Delta$ is the diagonal matrix containing the variances of the $x$ variables. The scale equivariance property implies that we can define the model either on the raw or on the standardized data, and it is always possible to go from one solution to the other by applying the same scale change to the model parameters.
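The scale-equivariance property is easy to check numerically. The sketch below (illustrative values only) rescales a covariance matrix built from $\Lambda_x$ and $\Psi_x$ and verifies that $C\Lambda_x$ and $C\Psi_xC$ reproduce the covariance matrix of $y = Cx$.

```python
import numpy as np

Lambda_x = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.1, 0.9]])
Psi_x = np.diag([0.35, 0.47, 0.18])
Sigma_x = Lambda_x @ Lambda_x.T + Psi_x

C = np.diag([2.0, 0.5, 10.0])             # arbitrary diagonal rescaling
Sigma_y = C @ Sigma_x @ C                 # covariance of y = Cx

Lambda_y = C @ Lambda_x
Psi_y = C @ Psi_x @ C
print(np.allclose(Sigma_y, Lambda_y @ Lambda_y.T + Psi_y))  # True
```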
It is worth remembering that, on the contrary, principal components do not share this property.

Identifiability

A statistical model is said to be identifiable if different values of the parameters generate different predicted values. In other words, a statistical model is identifiable if a solution to the estimation problem exists and is unique. Our linear factor model is not identifiable, as there is an infinite number of different matrices $\Lambda$ that can generate the same $x$ values. In fact, if the $m$-factor model holds, then it also holds if the factors are rotated. Denote by $G$ an $m \times m$ orthogonal matrix, so that $GG^T = G^TG = I$. The factor model can equivalently be rewritten as

$$x = \Lambda f + u = \Lambda I f + u = \Lambda G G^T f + u$$

If we set $\Lambda^* = \Lambda G$ and $f^* = G^T f$, the factor model becomes $x = \Lambda^* f^* + u$. The two factor models are completely equivalent and indistinguishable. They have the same properties:

$$E(f^*) = E(G^Tf) = G^TE(f) = 0$$
$$E(f^*f^{*T}) = E(G^Tff^TG) = G^TE(ff^T)G = G^TIG = G^TG = I$$
$$E(f^*u^T) = E(G^Tfu^T) = G^TE(fu^T) = 0$$

In order to obtain a unique solution, constraints on $\Lambda$ and $\Psi$ are usually imposed. The most common ones require that either $\Lambda^T\Psi^{-1}\Lambda$ or $\Lambda^T\Delta^{-1}\Lambda$ be diagonal. The property of rotation invariance will turn out to be really useful for interpretation purposes.

The issue of the existence of the decomposition in equation (2) now needs to be addressed. The number of available pieces of information is given by the distinct elements of $\Sigma$, i.e. the variances and the covariances, whose number equals the sum of the first $p$ natural numbers, $p(p+1)/2$. The number of unknowns is given by the $pm$ elements of $\Lambda$ plus the $p$ diagonal elements of $\Psi$. The above-mentioned constraint reduces the number of unknowns by $m(m-1)/2$, i.e. the number of off-diagonal elements that are constrained to be equal to zero. The number of equations is therefore $p(p+1)/2$ and the number of unknowns is $pm + p - m(m-1)/2$. If the first number is lower than the second, we have an infinite number of solutions.
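Both points in this passage can be verified in a few lines. The sketch below (with arbitrary illustrative values) first shows that a rotated loading matrix $\Lambda G$ reproduces exactly the same $\Lambda\Lambda^T$, and then counts equations minus unknowns for a given $p$ and several values of $m$, which is the comparison behind the Ledermann bound.

```python
import numpy as np
from scipy.stats import ortho_group

Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.9],
                   [0.2, 0.6]])

G = ortho_group.rvs(2, random_state=0)        # random 2x2 orthogonal matrix
Lambda_star = Lambda @ G
print(np.allclose(Lambda @ Lambda.T, Lambda_star @ Lambda_star.T))  # True

def dof(p, m):
    """Equations minus free parameters in the constrained m-factor model."""
    return p * (p + 1) // 2 - (p * m + p - m * (m - 1) // 2)

for m in range(1, 5):
    print(f"p=6, m={m}: degrees of freedom = {dof(6, m)}")
```

For p = 6 the count is positive up to m = 2, zero at m = 3 (exact but statistically useless solution) and negative beyond, illustrating the upper bound on the number of common factors discussed next.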
If equality holds, then we have a unique exact solution, which is however useless from a statistical point of view, since it does not explain the observed correlations through a small number of latent variables. If $p(p+1)/2 > pm + p - m(m-1)/2$ then a solution, however approximate, exists. The previous inequality, known as the Ledermann condition, imposes an upper bound on the maximum number of common factors.

3 Estimation issues

In practice we do not know $\Sigma$, but we can estimate it by the sample covariance matrix $S$ (or by the sample correlation matrix $R$ if we want to deal with standardized data). We therefore seek estimates $\hat\Lambda$ and $\hat\Psi$ such that $\hat\Lambda\hat\Lambda^T + \hat\Psi$ is as close to $S$ as possible. Various estimation methods have been proposed over the years. Most of them lacked theoretical foundations, and this is one reason why factor analysis did not have much success in the statistical community and remained confined to the psychological literature. Only around 1960 did Jöreskog succeed in deriving a sound and reliable algorithm for maximum likelihood estimation. This greatly helped the diffusion of factor analysis, which today is one of the most widely used dimension reduction models. We will briefly sketch one of the older heuristic methods, the principal factor method, which is still often used, and then describe the better known maximum likelihood method.

Principal factor analysis

This method is completely distribution free: it is derived from the knowledge of $S$ or $R$ only. As a first step, preliminary estimates $\hat h_i^2$ of the communalities $h_i^2$, $i = 1, \dots, p$, are obtained. When $S$ is the starting point, most software packages use $s_{ii}R_{i0}^2$ or $s_{ii}\max_{j \neq i}|r_{ij}|$; when working on $R$ the corresponding expressions are simply $R_{i0}^2$ or $\max_{j \neq i}|r_{ij}|$. The squared multiple correlation coefficient $R_{i0}^2$ refers to a multiple regression model where the variable $x_i$ is the dependent variable and the other $x$ variables are the covariates. It measures the portion of the variance of $x_i$ that is explained by its linear relationship with the other variables.
As the communality is the part of the variance of $x_i$ that is shared with the other variables via the common factors, $R_{i0}^2$ can provide a reasonable approximation. It can easily be proved that $R_{i0}^2$ can be computed from the correlation matrix $R$ as $1 - 1/r^{ii}$, where $r^{ii}$ is the $i$-th diagonal element of $R^{-1}$. This requires $R$ to be non-singular; when this condition is not satisfied, the method based on $\max_{j \neq i}|r_{ij}|$ can be used.

Let us then define the reduced covariance matrix as $S - \hat\Psi$ (the reduced correlation matrix would be $R - \hat\Psi$); it is simply $S$ (or $R$) with the diagonal elements replaced by the $\hat h_i^2$. By the spectral decomposition theorem, $S - \hat\Psi$ can be decomposed as

$$S - \hat\Psi = \Gamma L \Gamma^T$$

where $\Gamma$ is the orthonormal matrix whose columns are the eigenvectors of $S - \hat\Psi$ and $L$ is the diagonal matrix of the corresponding eigenvalues. $S - \hat\Psi$ need no longer be positive semi-definite. Suppose that the first $m$ eigenvalues are positive. A rank-$m$ approximation of $S - \hat\Psi$ can then be obtained as

$$S - \hat\Psi \approx \Gamma_m L_m \Gamma_m^T \qquad (3)$$

where the columns of $\Gamma_m$ are the $m$ eigenvectors of the reduced covariance matrix corresponding to the $m$ positive eigenvalues, which appear on the diagonal of $L_m$. Equation (3) can be rewritten as

$$S - \hat\Psi \approx \Gamma_m L_m \Gamma_m^T = \Gamma_m L_m^{1/2} L_m^{1/2} \Gamma_m^T = \hat\Lambda \hat\Lambda^T$$

where $\hat\Lambda = \Gamma_m L_m^{1/2}$. The diagonal elements of $\hat\Lambda\hat\Lambda^T$ provide new estimates of the communalities. The estimation procedure can be stopped here or iterated, by replacing the new communality estimates on the diagonal of the reduced covariance matrix, until convergence.

The sum of squares of the $i$-th row of $\hat\Lambda$ is the communality of the $i$-th observed variable; the sum of squares of the $k$-th column of $\hat\Lambda$ is the variance explained by the $k$-th common factor. Dividing it by the total variance gives the proportion of the total variance explained by the $k$-th factor. It is worth noticing that, being based on the spectral decomposition, the principal factor solution is not scale equivariant (we have already seen that the same holds for PCA). This means that principal factoring produces different results if we work on $S$ or on $R$.
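The whole procedure fits in a few lines of NumPy. The sketch below (a minimal illustration, not a replica of any particular package's implementation) runs iterated principal factoring on a correlation matrix $R$, using squared multiple correlations as initial communalities, and also computes the residual correlation matrix discussed next; the toy correlation values are arbitrary.

```python
import numpy as np

def principal_factor(R, m, n_iter=50):
    """Iterated principal-factor solution from a correlation matrix R."""
    # initial communalities: squared multiple correlations 1 - 1/r^ii
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_red = R.copy()
        np.fill_diagonal(R_red, h2)                 # reduced correlation matrix
        eigval, eigvec = np.linalg.eigh(R_red)
        idx = np.argsort(eigval)[::-1][:m]          # m largest eigenvalues
        Lambda = eigvec[:, idx] * np.sqrt(np.maximum(eigval[idx], 0.0))
        h2 = np.sum(Lambda**2, axis=1)              # updated communalities
    return Lambda, h2

# toy correlation matrix with a clear one-factor structure
R = np.array([[1.00, 0.62, 0.54],
              [0.62, 1.00, 0.51],
              [0.54, 0.51, 1.00]])
Lambda, h2 = principal_factor(R, m=1)
print("loadings:", Lambda.ravel().round(2))
print("residual correlations:\n", (R - Lambda @ Lambda.T).round(3))
```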
Most statistical software packages perform factor analysis on $R$ as the default option. Their output also provides the residual correlation matrix, a heuristic tool aimed at assessing the goodness of fit of the $m$-factor solution. It is computed as $R - \hat\Lambda\hat\Lambda^T$. Its diagonal elements are the estimates of the unique variances, while the off-diagonal entries are the residual correlations, i.e. the correlations that the common factors have not been able to account for. Values close to 0 indicate a good fit, values far from 0 indicate lack of fit. Lack of fit might mean that more factors are needed in order to better explain the observed correlations, but it might also mean that the relationship between the observed variables and the latent ones is not linear, as the model assumes.

When working on $R$, many packages allow a third option for the preliminary estimation of the communalities: setting all the communalities equal to 1. This means that the unique variances are equal to 0. In this case the reduced correlation matrix becomes $R - \hat\Psi = R$ and the principal factors coincide with the principal components. This algebraic equivalence has led to a dangerous confusion between factor analysis and PCA. It must be remembered that PCA is a data transformation while factor analysis assumes a model. Furthermore, PCA aims at explaining variances while the main interest of factor analysis lies in covariances. We will deal with this aspect in more detail at the end of the chapter.

Maximum likelihood factor analysis

The maximum likelihood approach to the estimation of the model parameters assumes that the vector $x$ is distributed according to a multivariate normal distribution, that is

$$f(x) = |2\pi\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right\}$$

The likelihood for a sample of $n$ units is then

$$L(x, \mu, \Sigma) = |2\pi\Sigma|^{-n/2} \exp\left\{-\tfrac{1}{2}\sum_{j=1}^{n}(x_j - \mu)^T \Sigma^{-1} (x_j - \mu)\right\}$$

and the log-likelihood is

$$\ell(x, \mu, \Sigma) = \ln L(x, \mu, \Sigma) = -\frac{n}{2}\ln|2\pi\Sigma| - \frac{1}{2}\sum_{j=1}^{n}(x_j - \mu)^T \Sigma^{-1} (x_j - \mu)$$
After estimating $\mu$ by $\bar x$ and after some algebra it becomes

$$\ell(x, \Sigma) = -\frac{n}{2}\ln|2\pi\Sigma| - \frac{n}{2}\mathrm{tr}(\Sigma^{-1}S)$$

If the factor model holds, then $\Sigma = \Lambda\Lambda^T + \Psi$ and the log-likelihood becomes

$$\ell(x, \Sigma) = \ell(x, \Lambda, \Psi) = -\frac{n}{2}\ln|2\pi(\Lambda\Lambda^T + \Psi)| - \frac{n}{2}\mathrm{tr}\left[(\Lambda\Lambda^T + \Psi)^{-1}S\right]$$

This function must be maximized with respect to $\Lambda$ and $\Psi$. The problem is not trivial and requires numerical methods. In 1967 Jöreskog developed an efficient two-stage algorithm. Convergence is quite fast, but it may happen that one or more elements of the estimate of $\Psi$ become negative. This situation, known as a Heywood case, can be handled by constraining the elements of $\Psi$ to be non-negative.

As the maximum likelihood estimates are scale equivariant, the estimates for standardized data can be obtained from the ones computed on the raw data by a simple change of scale, and vice versa.

Fitting the common factor model by maximum likelihood also makes it possible to select, through a suitable likelihood ratio test, the number of common factors to be included in the model. The null hypothesis is $H_0: \Sigma = \Lambda\Lambda^T + \Psi$ with $\Lambda$ a $p \times m$ matrix, against the alternative $H_1$: $\Sigma$ is unstructured. The test statistic, after applying Bartlett's correction in order to improve its convergence to a $\chi^2$ distribution, is

$$W = \left\{n - 1 - \frac{2p + 4m + 5}{6}\right\}\left\{\ln|\hat\Lambda\hat\Lambda^T + \hat\Psi| - \ln|S|\right\}$$

where $\hat\Lambda$ and $\hat\Psi$ are the maximum likelihood estimates of the corresponding model parameters. Under $H_0$, $W$ is distributed according to a $\chi^2$ with $\frac{1}{2}[(p-m)^2 - (p+m)]$ degrees of freedom. Examples on the use of factor analysis and on the interpretation of the results will be provided in the notes for the lab session.
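The objective and the test above translate directly into code. The sketch below is an illustrative implementation that maximizes the Gaussian log-likelihood with SciPy's general-purpose optimizer rather than Jöreskog's two-stage algorithm; the log-parametrization of $\Psi$ and the starting values are arbitrary choices made only to keep the unique variances positive.

```python
import numpy as np
from scipy.optimize import minimize

def ml_factor(S, m, n):
    """Gaussian ML factor analysis by direct numerical optimization (sketch)."""
    p = S.shape[0]

    def neg_loglik(theta):
        Lam = theta[:p * m].reshape(p, m)
        Psi = np.diag(np.exp(theta[p * m:]))        # log-parametrized, keeps psi_ii > 0
        Sigma = Lam @ Lam.T + Psi
        logdet = np.linalg.slogdet(Sigma)[1]
        return logdet + np.trace(np.linalg.solve(Sigma, S))   # -2/n * loglik + constant

    theta0 = np.concatenate([0.1 * np.ones(p * m), np.log(np.diag(S)) - 0.7])
    res = minimize(neg_loglik, theta0, method="L-BFGS-B")
    Lam = res.x[:p * m].reshape(p, m)
    Psi = np.diag(np.exp(res.x[p * m:]))

    # Bartlett-corrected likelihood ratio statistic and its degrees of freedom
    W = (n - 1 - (2 * p + 4 * m + 5) / 6) * (
        np.linalg.slogdet(Lam @ Lam.T + Psi)[1] - np.linalg.slogdet(S)[1])
    df = ((p - m) ** 2 - (p + m)) // 2
    return Lam, np.diag(Psi), W, df
```

Because the likelihood is invariant to orthogonal rotations of $\Lambda$, the loadings returned here are just one member of the equivalence class discussed in the identifiability section; the likelihood value, $W$ and the degrees of freedom (which match the identifiability count above) do not depend on the rotation.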
4 Rotation of factors

In order to improve the interpretability of the factor loadings we can rely on the invariance of the factor model to orthogonal rotations. Thurstone gave a definition of what an interpretable (simple) factor structure should look like: the variables should be divisible into groups such that the loadings within each group are high on a single factor, perhaps moderate to low on a few factors and negligible on the remaining factors.

One way to obtain a factor loading matrix satisfying such a condition is given by the so-called varimax rotation. It looks for an orthogonal rotation of the factor loading matrix such that the following criterion is maximized:

$$V = \sum_{k=1}^{m}\left\{\frac{\sum_{i=1}^{p}\beta_{ik}^4}{p} - \left(\frac{\sum_{i=1}^{p}\beta_{ik}^2}{p}\right)^2\right\}
\qquad \text{where} \qquad
\beta_{ik} = \frac{\lambda_{ik}}{\left(\sum_{k=1}^{m}\lambda_{ik}^2\right)^{1/2}} = \frac{\lambda_{ik}}{h_i}$$

It should be noted that $V$ is the sum over the factors of the variances of the squared normalized (within each row) factor loadings. Maximizing it causes the large coefficients to become larger and the small coefficients to approach 0. Other rotation methods are implemented in most statistical packages. Quartimax rotation maximizes the overall variance of the squared loadings in the factor loading matrix, thus usually producing, as the first factor, a general factor with positive and almost equal loadings on all the variables. Rotations leading to correlated common factors (oblique rotations) are also possible.
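The varimax criterion can be optimized directly. The sketch below is a minimal gradient-free illustration restricted to m = 2 factors, where an orthogonal rotation reduces to a single angle; production implementations use iterative pairwise rotations or SVD-based updates instead, and the example loadings are invented.

```python
import numpy as np

def varimax_criterion(Lambda):
    """Value of V for a given loading matrix (rows normalized by communalities)."""
    h = np.sqrt(np.sum(Lambda**2, axis=1, keepdims=True))
    B2 = (Lambda / h) ** 2                           # squared normalized loadings beta_ik^2
    return np.sum(np.mean(B2**2, axis=0) - np.mean(B2, axis=0) ** 2)

def varimax_2factors(Lambda, n_grid=3600):
    """Best orthogonal rotation of a p x 2 loading matrix by grid search on the angle."""
    best_V, best_L = -np.inf, Lambda
    for theta in np.linspace(0.0, np.pi / 2, n_grid):
        G = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        L_rot = Lambda @ G
        V = varimax_criterion(L_rot)
        if V > best_V:
            best_V, best_L = V, L_rot
    return best_L

# an unrotated solution where every variable loads on both factors
Lambda = np.array([[0.6,  0.6],
                   [0.7,  0.5],
                   [0.6, -0.5],
                   [0.5, -0.6]])
print(varimax_2factors(Lambda).round(2))   # loadings concentrate on one factor per row
```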
5 Estimation of factor scores

After the factor loadings and the unique variances have been estimated, we might be interested in estimating, for each statistical unit with observed vector $x_j$, the corresponding vector of factor scores $f_j$. If, for instance, the first factor is intelligence, this would also allow us to rank the individuals according to their scores on this factor, from the most to the least intelligent ones. Two methods are in popular use for factor score estimation.

The method proposed by Thompson defines the factor scores as linear combinations of the observed variables, chosen so as to minimize the expected squared prediction error. For the $k$-th factor $f_k$ the estimate is given by $\hat f_k = a_k^T x = x^T a_k$, where $a_k$ is a $p \times 1$ vector. According to Thompson's approach, $a_k$ should be chosen so that $E(\hat f_k - f_k)^2 = E(x^T a_k - f_k)^2$ is minimized. Differentiating with respect to $a_k$ and setting the derivatives equal to 0 we obtain

$$E[2x(x^T a_k - f_k)] = 2[E(xx^T)a_k - E(xf_k)] = 2(\Sigma a_k - \Lambda_k) = 0$$

where $\Lambda_k$ is the $k$-th column of $\Lambda$. Hence $a_k = \Sigma^{-1}\Lambda_k$ and $\hat f_k = \Lambda_k^T \Sigma^{-1} x$, so that $\hat f = \Lambda^T \Sigma^{-1} x$. After some algebra a different expression for $\hat f$ can be obtained: $\hat f = (I + \Lambda^T\Psi^{-1}\Lambda)^{-1}\Lambda^T\Psi^{-1}x$. The two formulations give the same result, but some textbooks present only the second one.

Thompson's estimator is biased. In fact

$$E(\hat f \mid f) = E(\Lambda^T\Sigma^{-1}x \mid f) = \Lambda^T\Sigma^{-1}E(x \mid f) = \Lambda^T\Sigma^{-1}\Lambda f \neq f$$

It has however been obtained by minimizing $E(\hat f_k - f_k)^2$, and so it is the estimator with the least mean squared prediction error. It is worth mentioning that Thompson's estimator is also known in the literature as the regression estimator. If we assume that the observed variables $x$, the common factors $f$ and the unique factors $u$ are normally distributed and consider the vector $y^T = (f^T, x^T)$ formed by stacking the common factors and the observed variables, then, under the factor model assumptions, $y$ is also distributed according to a zero-mean multivariate normal distribution with covariance matrix

$$\begin{bmatrix} I & \Lambda^T \\ \Lambda & \Lambda\Lambda^T + \Psi \end{bmatrix}$$

From the properties of the multivariate normal distribution, the expected value of $f$ given $x$ is

$$E(f \mid x) = \Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}x = \Lambda^T\Sigma^{-1}x$$

which coincides with the previously obtained $\hat f$.

An alternative estimator is due to Bartlett. After the factor loadings and the unique variances have been estimated, the factor model can be seen as a linear multivariate regression model where $f$ is the unknown parameter vector and the residuals (i.e. the unique factors) are uncorrelated but heteroscedastic. Estimation can then be addressed by weighted least squares.
We want to find an estimate of $f$, say $\hat f$, such that $u^T\Psi^{-1}u = (x - \Lambda f)^T\Psi^{-1}(x - \Lambda f)$ is minimum. Differentiating with respect to $f$ and setting the derivatives equal to 0 we obtain

$$-2\Lambda^T\Psi^{-1}(x - \Lambda f) = 2(\Lambda^T\Psi^{-1}\Lambda f - \Lambda^T\Psi^{-1}x) = 0$$

and hence

$$\hat f = (\Lambda^T\Psi^{-1}\Lambda)^{-1}\Lambda^T\Psi^{-1}x$$

Bartlett's estimator $\hat f$ is unbiased. In fact

$$E(\hat f \mid f) = E\{(\Lambda^T\Psi^{-1}\Lambda)^{-1}\Lambda^T\Psi^{-1}x \mid f\}
= (\Lambda^T\Psi^{-1}\Lambda)^{-1}\Lambda^T\Psi^{-1}E(x \mid f)
= (\Lambda^T\Psi^{-1}\Lambda)^{-1}\Lambda^T\Psi^{-1}\Lambda f = f$$

Of course it has a larger mean squared prediction error than Thompson's estimator. The choice between the two depends on the research goals.
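Both score estimators are one-line matrix computations once $\Lambda$ and $\Psi$ are available. The sketch below (with the same illustrative values used earlier) computes the regression (Thompson) and Bartlett scores for a single observation and checks the alternative expression for the regression estimator.

```python
import numpy as np

Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.9],
                   [0.2, 0.6]])
Psi = np.diag([0.35, 0.47, 0.18, 0.60])
Sigma = Lambda @ Lambda.T + Psi
Psi_inv = np.linalg.inv(Psi)

x = np.array([1.2, 0.9, -0.4, -0.1])           # a (mean-centered) observation

# Thompson / regression estimator: f_hat = Lambda' Sigma^{-1} x
f_reg = Lambda.T @ np.linalg.solve(Sigma, x)

# equivalent expression: (I + Lambda' Psi^{-1} Lambda)^{-1} Lambda' Psi^{-1} x
M = np.eye(2) + Lambda.T @ Psi_inv @ Lambda
f_reg2 = np.linalg.solve(M, Lambda.T @ Psi_inv @ x)
print(np.allclose(f_reg, f_reg2))              # True

# Bartlett / weighted least squares estimator
f_bart = np.linalg.solve(Lambda.T @ Psi_inv @ Lambda, Lambda.T @ Psi_inv @ x)
print(f_reg.round(3), f_bart.round(3))
```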
6 Factor analysis and PCA

It is worth concluding this chapter by stressing the connections and the differences between factor analysis and PCA. Both methods aim at reducing the dimensionality of a vector of random variables. But while factor analysis assumes a model (which may or may not fit the data), PCA is just a data transformation and for this reason it always exists. Furthermore, while factor analysis aims at explaining covariances (or correlations), PCA concentrates on variances. Despite these conceptual differences, there have been attempts, mainly in the past, to use PCA in order to estimate the factor model. In what follows we show that PCA may indeed be inadequate when the goal of the research is fitting a factor model.

Let $x$ be the usual $p$-dimensional random vector and $y$ the $p$-dimensional vector of the corresponding principal components, $y = A^Tx$, with $A$ the orthonormal matrix whose columns are the eigenvectors of the covariance matrix of the $x$ variables. Because of the properties of $A$ it also holds that $x = Ay$. $A$ can be partitioned into two sub-matrices, $A_m$ containing the eigenvectors corresponding to the first $m$ eigenvalues and $A_{p-m}$ containing the remaining ones: $A = (A_m \; A_{p-m})$. A similar partition is considered for the vector $y$:

$$y = \begin{pmatrix} y_m \\ y_{p-m} \end{pmatrix}$$

hence

$$x = (A_m \; A_{p-m})\begin{pmatrix} y_m \\ y_{p-m} \end{pmatrix} = A_m y_m + A_{p-m}y_{p-m} = A_m L_m^{1/2} L_m^{-1/2} y_m + A_{p-m}y_{p-m} \qquad (4)$$

where $L_m$ is the diagonal matrix of the first $m$ eigenvalues. Setting $\Lambda = A_m L_m^{1/2}$, $f = L_m^{-1/2}y_m$ and $\eta = A_{p-m}y_{p-m}$, equation (4) can be rewritten as $x = \Lambda f + \eta$. It can easily be checked that the new common factors $f$ have the properties required by the linear factor model:

$$E(ff^T) = E(L_m^{-1/2}y_m y_m^T L_m^{-1/2}) = L_m^{-1/2}E(y_m y_m^T)L_m^{-1/2} = L_m^{-1/2}L_m L_m^{-1/2} = I$$

because the covariance matrix of the first $m$ PCs is $L_m$, and

$$E(f\eta^T) = E(L_m^{-1/2}y_m y_{p-m}^T A_{p-m}^T) = L_m^{-1/2}E(y_m y_{p-m}^T)A_{p-m}^T = 0$$

because the first $m$ and the last $p-m$ PCs are uncorrelated. The new unique factors $\eta$, however, are not uncorrelated, and this contradicts the linear factor model assumption according to which the common factors completely explain the observed covariances:

$$E(\eta\eta^T) = E(A_{p-m}y_{p-m}y_{p-m}^T A_{p-m}^T) = A_{p-m}E(y_{p-m}y_{p-m}^T)A_{p-m}^T = A_{p-m}L_{p-m}A_{p-m}^T$$

$L_{p-m}$ is the covariance matrix of the last $p-m$ PCs and is therefore diagonal; but, in general, its diagonal elements are different, and therefore $A_{p-m}L_{p-m}A_{p-m}^T$ is not diagonal.
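This last point can be seen numerically. The sketch below (with an arbitrary, randomly generated covariance matrix) builds $\Lambda$ and $\eta$ from the leading principal components and shows that the "unique factor" covariance $A_{p-m}L_{p-m}A_{p-m}^T$ is far from diagonal, unlike the $\Psi$ of a genuine factor model.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
Sigma = B @ B.T                                      # an arbitrary covariance matrix

eigval, A = np.linalg.eigh(Sigma)
order = np.argsort(eigval)[::-1]                     # sort PCs by decreasing eigenvalue
eigval, A = eigval[order], A[:, order]

m = 2
A_m, A_rest = A[:, :m], A[:, m:]
L_rest = np.diag(eigval[m:])

Lambda_pca = A_m @ np.diag(np.sqrt(eigval[:m]))      # "loadings" from the first m PCs
Eta_cov = A_rest @ L_rest @ A_rest.T                 # covariance of the discarded part

print("off-diagonal of the 'unique' covariance:\n",
      np.round(Eta_cov - np.diag(np.diag(Eta_cov)), 3))
print("Sigma reproduced exactly:", np.allclose(Sigma, Lambda_pca @ Lambda_pca.T + Eta_cov))
```

The decomposition reproduces $\Sigma$ exactly, but only because the discarded components keep their correlated covariance structure, which is precisely what the factor model forbids for the unique factors.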
