# Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016


## Transcription

1 Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev, 30E00500 Quantitative Empirical Research, Spring 2016

2 Agenda

- Brief History and Introductory Example
- Factor Model
- Factor Equation
- Estimation of Loadings and Communalities
- Properties of the Model
- Rotation
- Factor Scores
- Tutorial, guidelines and rules of thumb

3 Factor Analysis

The aim of factor analysis is to find hidden (latent) variables that explain the correlations among the observed variables. Examples:

- A firm's image
- The sales aptitude of salespersons
- Performance of the firm
- Resistance to high-tech innovations

4 When to use FA?

- Exploratory / descriptive analysis: learning the internal structure of the dataset. What are the main dimensions of the data? Helps to visualize multivariate data in lower-dimensional pictures.
- Data reduction technique: if you have a high-dimensional dataset, is there a way to capture the essential information using a smaller number of variables? Can also be considered a preprocessing step for techniques that are sensitive to multicollinearity (e.g. regression).

5 Two Approaches

Exploratory Factor Analysis:

- Initially, a factor structure that explains the correlations among the variables is searched for without any a priori theory
- What are the underlying processes that could have produced the correlations among the variables?
- Note: there are no readily available criteria against which to test the solution

Confirmatory Factor Analysis:

- The factor structure is assumed to be known or hypothesized a priori
- Are the correlations among the variables consistent with the hypothesized factor structure?
- Often performed through structural equation modeling

6 Part I: Basic concepts and examples

7 History and Example

Originally developed by Spearman (1904) to explain student performance in various courses. Suppose a student's test scores (M: Mathematics, P: Physics, C: Chemistry, E: English, H: History, F: French) depend on:

- The student's general intelligence I, and
- The student's aptitude for a given course

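8 Illustration of Factor Analysis

[Figure: path diagram relating the latent variable Intelligence (common factor) to six observed variables (Math., Physics, Chem., English, History, French), each of which also has its own unique factor ($A_M$, $A_P$, $A_C$, $A_E$, $A_H$, $A_F$).]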

9 Example: 1-factor model

For example, as follows:

$$
\begin{aligned}
M &= 0.80I + A_m \\
P &= 0.70I + A_p \\
C &= 0.90I + A_c \\
E &= 0.60I + A_e \\
H &= 0.50I + A_h \\
F &= 0.65I + A_f
\end{aligned}
$$

- $A_m$, $A_p$, $A_c$, $A_e$, $A_h$ and $A_f$ stand for the special aptitudes (specific factors).
- The coefficients 0.80, 0.70, 0.90, 0.60, 0.50 and 0.65 are called factor loadings.
- The variables M, P, C, E, H and F are indicators or measures of I (the common factor).

10 Assumptions

- Means of the variables (indicators), the common factor I, and the unique factors are zero
- Variances of the variables (indicators) and of the common factor I are one
- Correlations between the common factor I and the unique factors are zero
- Correlations among the unique factors are zero

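11 From Assumptions

Variances of variables (e.g. M):

$$\mathrm{Var}(M) = \mathrm{Var}(0.8I + A_m) = 0.8^2\,\mathrm{Var}(I) + \mathrm{Var}(A_m) = 0.64 + \mathrm{Var}(A_m)$$

Covariance of variables (e.g. M and H):

$$
\begin{aligned}
\mathrm{Cov}(M, H) &= \mathrm{Cov}(0.8I + A_m,\; 0.5I + A_h) \\
&= 0.8 \cdot 0.5\,\mathrm{Cov}(I, I) + 0.8\,\mathrm{Cov}(I, A_h) + 0.5\,\mathrm{Cov}(A_m, I) + \mathrm{Cov}(A_m, A_h) \\
&= 0.8 \cdot 0.5 = 0.40
\end{aligned}
$$

Generally, the variables are standardized in factor analysis, so $\mathrm{Var}(M) = 1$ and Cov = Corr.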

12 Variance Decomposition

The total variance of any indicator is decomposed into two components:

- Variance in common with I (the communality of the indicator)
- Variance in common with the specific factor

Example: $\mathrm{Var}(M) = 0.64 + \mathrm{Var}(A_m)$, where $0.64 = 0.8^2$ is the communality of M.
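The decomposition can be checked numerically for the one-factor example. This is a minimal sketch, assuming standardized indicators (unit total variance), so that each unique variance equals one minus the squared loading; the variable names are illustrative and not from the slides.

```python
# Loadings of the one-factor example (values taken from the slides).
loadings = {"M": 0.80, "P": 0.70, "C": 0.90, "E": 0.60, "H": 0.50, "F": 0.65}

# Communality = squared loading on the single common factor I.
communality = {name: lam ** 2 for name, lam in loadings.items()}

# Assumption: total variance is 1, so unique variance = 1 - communality.
uniqueness = {name: 1.0 - c for name, c in communality.items()}

for name in loadings:
    print(f"{name}: communality={communality[name]:.4f}, "
          f"unique variance={uniqueness[name]:.4f}")
```

For M this gives a communality of 0.64 and a unique variance of 0.36, matching the hand calculation above.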

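13 Variance decomposition (cont.)

[Figure (Hair et al., 2010): types of variance carried into the factor matrix. With unity (1) as the diagonal value, the total variance is extracted. With the communality as the diagonal value, only the common variance is extracted; the specific and error variance is not used.]

Source: Hair et al. (2010): Multivariate Data Analysis, Pearson Education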

15 Example: 2-Factor Model

$$
\begin{aligned}
M &= 0.80Q + 0.20V + A_m \\
P &= 0.70Q + 0.30V + A_p \\
C &= 0.60Q + 0.30V + A_c \\
E &= 0.20Q + 0.80V + A_e \\
H &= 0.15Q + 0.82V + A_h \\
F &= 0.25Q + 0.85V + A_f
\end{aligned}
$$

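16 Covariances for 2-Factor Model

Variances of variables (e.g. M):

$$\mathrm{Var}(M) = \mathrm{Var}(0.8Q + 0.2V + A_m) = 0.8^2\,\mathrm{Var}(Q) + 0.2^2\,\mathrm{Var}(V) + \mathrm{Var}(A_m) = 0.68 + \mathrm{Var}(A_m)$$

Covariance of variables (e.g. M and H), with uncorrelated common factors Q and V:

$$
\begin{aligned}
\mathrm{Cov}(M, H) &= \mathrm{Cov}(0.8Q + 0.2V + A_m,\; 0.15Q + 0.82V + A_h) \\
&= 0.8 \cdot 0.15\,\mathrm{Cov}(Q, Q) + 0.2 \cdot 0.82\,\mathrm{Cov}(V, V) \\
&= 0.8 \cdot 0.15 + 0.2 \cdot 0.82 = 0.284
\end{aligned}
$$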
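The same calculation extends to any pair of indicators. The sketch below assumes uncorrelated, unit-variance common factors Q and V, so the implied covariance of two different indicators is the sum of the products of their loadings; the helper names are illustrative only.

```python
# Loadings on (Q, V) for the two-factor example (values taken from the slides).
loadings = {
    "M": (0.80, 0.20), "P": (0.70, 0.30), "C": (0.60, 0.30),
    "E": (0.20, 0.80), "H": (0.15, 0.82), "F": (0.25, 0.85),
}

def implied_cov(a, b):
    """Model-implied covariance of two *different* indicators a and b.

    Assumes uncorrelated, unit-variance common factors and
    mutually uncorrelated unique factors.
    """
    return sum(la * lb for la, lb in zip(loadings[a], loadings[b]))

def communality(a):
    """Variance of indicator a that is shared with the common factors."""
    return sum(l ** 2 for l in loadings[a])

print(round(implied_cov("M", "H"), 3))  # 0.284
print(round(communality("M"), 3))       # 0.68
```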

17 Part II: Factor Model

18 Objectives of Factor Analysis

- To identify the smallest number of common factors that best explain the correlations among the variables
- To estimate loadings and communalities
- To identify, via rotation, the most plausible factor structure
- To estimate factor scores, when desired

19 Some Theory: Factor Model

Consider a p-dimensional random vector $x \sim (\mu, \Sigma)$. An m-factor model is

$$x = \Lambda f + \varepsilon + \mu,$$

where $\Lambda$ is a $p \times m$ matrix of factor loadings, and $f$ ($m \times 1$) and $\varepsilon$ ($p \times 1$) are random vectors. The elements of $f$ are the common factors and the elements of $\varepsilon$ are the unique factors.

20 Factor Equation

$$\Sigma = \Lambda\Lambda^T + \Psi,$$

where

- $\Sigma$ is the covariance matrix of the variables $x$,
- $\Lambda$ is the loading matrix,
- $\Psi$ is a diagonal matrix containing the unique variances.

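21 Factor Equation (cont.)

The communalities are the diagonal elements of $\Sigma - \Psi = \Lambda\Lambda^T$.

The covariances (correlations) between the variables and the factors are given by

$$E\big((x - \mu) f^T\big) = E\big((\Lambda f + \varepsilon) f^T\big) = \Lambda E(f f^T) + E(\varepsilon f^T) = \Lambda$$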

22 Interpretation of the Common Factors

- Loadings: the covariances between the variables and the factors
- Eigenvalues: the variance explained by each factor

23 Factor Indeterminacy

- Factor indeterminacy due to rotation
- Factor indeterminacy due to the communality estimation problem

24 Factor Indeterminacy (cont.)

Factor indeterminacy due to rotation. Consider

$$
\begin{aligned}
M &= 0.667Q - 0.484V + A_m \\
P &= 0.680Q - 0.343V + A_p \\
C &= 0.615Q - 0.267V + A_c \\
E &= 0.741Q + 0.361V + A_e \\
H &= 0.725Q + 0.412V + A_h \\
F &= 0.812Q + 0.355V + A_f
\end{aligned}
$$

These alternative loadings provide the same total communalities and uniquenesses as the previously presented solution. Even the correlation matrices are identical. (Note that the two common factors are assumed to be uncorrelated.)
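The claim that both sets of loadings reproduce the same communalities and correlations can be checked directly. The sketch below compares $\Lambda\Lambda^T$ for the two solutions; since the slide values are rounded to three decimals, agreement is only expected up to rounding. The function name `common_part` is an illustrative choice.

```python
# Two-factor loadings from the earlier example and the alternative solution.
original = [(0.80, 0.20), (0.70, 0.30), (0.60, 0.30),
            (0.20, 0.80), (0.15, 0.82), (0.25, 0.85)]
alternative = [(0.667, -0.484), (0.680, -0.343), (0.615, -0.267),
               (0.741, 0.361), (0.725, 0.412), (0.812, 0.355)]

def common_part(L):
    """Return L L^T: communalities on the diagonal, implied correlations off it."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in L] for ri in L]

A = common_part(original)
B = common_part(alternative)

# Largest elementwise discrepancy between the two reproduced matrices.
max_diff = max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
print(f"largest discrepancy: {max_diff:.4f}")
```

The discrepancy stays at the level of the three-decimal rounding of the loadings, which is what one would expect if the alternative solution is an orthogonal rotation of the original one.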

25 Factor Indeterminacy (cont.)

Factor indeterminacy due to the communality estimation problem:

- To estimate the loadings, the communalities are needed, and
- To estimate the communalities, the loadings are needed (!)

26 Factor Analysis Techniques (Tabachnick and Fidell: Using Multivariate Statistics)

Principal Component Factoring (PCF):

- The initial estimates of the communalities for all variables are equal to one (= Principal Component Analysis)

Principal Axis Factoring (PAF):

- Principal components analyze total variance, whereas FA analyzes covariance (communality)
- An attempt is made to estimate the communalities: regress each variable on the other variables and use the squared multiple correlation (coefficient of multiple determination) as the initial estimate of its communality
- The communalities are then found through an iterative process

27 Image Factor Extraction

- Uses the correlation matrix of predicted variables, where each variable is predicted from the others via multiple regression
- A compromise between PCA and principal axis factoring
- Like PCA, provides a mathematically unique solution because there are fixed values in the positive diagonal
- Like PAF, the values in the diagonal are communalities with unique error variability excluded
- Loadings represent covariances between variables and factors rather than correlations

Maximum Likelihood Factor Extraction

- Population estimates for factor loadings are calculated which have the greatest probability of yielding a sample with the observed correlation matrix

28 Unweighted Least Squares Factoring

- Minimizes squared differences between the observed and reproduced correlation matrices
- Only off-diagonal differences are considered; communalities are derived from the solution rather than estimated as a part of it
- Special case of principal factors, where communalities are estimated after the solution

Generalized (Weighted) Least Squares Factoring

- Variables that have substantial shared variance with other variables get higher weights than variables with large unique variance

Alpha Factoring

- Interest is in discovering which common factors are found consistently when repeated samples of variables are taken from a population of variables

29 Summary of extraction procedures

| Technique | Goal of analysis | Special features |
|---|---|---|
| Principal components | Maximize variance extracted by orthogonal components | Mathematically determined, solution mixes common, unique, and error variance into components |
| Principal factors | Maximize variance extracted by orthogonal factors | Estimates communalities to attempt to eliminate unique and error variance from variables |
| Image factoring | Provides an empirical factor analysis | Uses variances based on multiple regression of a variable with other variables as communalities |

Source: Tabachnick and Fidell: Using Multivariate Statistics

30 Summary of extraction procedures (cont.)

| Technique | Goal of analysis | Special features |
|---|---|---|
| Maximum likelihood factoring | Estimate factor loadings for population that maximize the likelihood of sampling the observed correlation matrix | Has significance test for factors; useful for confirmatory factor analysis |
| Alpha factoring | Maximize the generalizability of orthogonal factors | |
| Unweighted least squares | Minimize squared residual correlations | |
| Generalized least squares | Weights variables by shared variance before minimizing squared residual correlations | |

Source: Tabachnick and Fidell: Using Multivariate Statistics

31 Part III: Rotations

32 Two Classes of Rotational Approaches

- Orthogonal = axes are maintained at 90 degrees
- Oblique = axes are not maintained at 90 degrees

33 Orthogonal Factor Rotation

[Figure (Hair et al., 2010): variables V1 to V5 plotted against Unrotated Factor I and Unrotated Factor II (axes from -1.0 to +1.0), with the axes Rotated Factor I and Rotated Factor II overlaid.]

34 Oblique Factor Rotation

[Figure (Hair et al., 2010): variables V1 to V5 plotted against Unrotated Factor I and Unrotated Factor II, with both the orthogonally rotated axes (Orthogonal Rotation: Factor I, Factor II) and the obliquely rotated axes (Oblique Rotation: Factor I, Factor II) overlaid.]

35 Orthogonal Rotation

Identify an orthogonal transformation matrix $C$ such that

$$\Lambda^* = \Lambda C \quad \text{and} \quad \Sigma = \Lambda^* \Lambda^{*T} + \Psi, \quad \text{where } C^T C = I.$$

Remember the connection to the factor indeterminacy problem (!)

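36 Varimax Rotation

Find a factor structure in which each variable loads highly on one and only one factor (i.e. simplify the columns of the loading matrix).

That is, for any given factor j,

$$V_j = \frac{1}{p}\sum_{i=1}^{p} \lambda_{ij}^{*4} - \left(\frac{1}{p}\sum_{i=1}^{p} \lambda_{ij}^{*2}\right)^2$$

is the variance of the communalities (squared loadings) of the variables within factor j.

Total variance:

$$V = \sum_{j=1}^{m} V_j$$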

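37 Varimax Rotation (cont.)

Find the orthogonal matrix $C$ such that it maximizes $V$, which is equivalent to maximizing

$$p \sum_{j=1}^{m}\sum_{i=1}^{p} \lambda_{ij}^{*4} - \sum_{j=1}^{m}\left(\sum_{i=1}^{p} \lambda_{ij}^{*2}\right)^2$$

subject to the constraint that the communality of each variable remains the same.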
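For two factors, an orthogonal rotation is described by a single angle, so the varimax criterion can be maximized by a simple scan. The sketch below is a brute-force illustration under that assumption, not the algorithm used by statistical packages; it uses the raw criterion (sum over factors of the variance of the squared loadings, without Kaiser normalization), and the function names and the 0.1-degree grid are arbitrary choices.

```python
import math

# Unrotated two-factor loadings (the alternative solution from the slides).
L = [(0.667, -0.484), (0.680, -0.343), (0.615, -0.267),
     (0.741, 0.361), (0.725, 0.412), (0.812, 0.355)]

def rotate(L, angle):
    """Apply the orthogonal matrix C = [[cos, -sin], [sin, cos]] to each row."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in L]

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    p = len(L)
    total = 0.0
    for j in range(len(L[0])):
        sq = [row[j] ** 2 for row in L]
        total += sum(v ** 2 for v in sq) / p - (sum(sq) / p) ** 2
    return total

# The criterion repeats every 90 degrees (columns swap or change sign),
# so scanning [0, 90) degrees in 0.1-degree steps is enough.
angles = [math.radians(k / 10) for k in range(900)]
best = max(angles, key=lambda t: varimax_criterion(rotate(L, t)))
rotated = rotate(L, best)

print(f"best angle: {math.degrees(best):.1f} degrees")
for row in rotated:
    print(f"{row[0]: .3f} {row[1]: .3f}")
```

Because the rotation is orthogonal, each row keeps its communality, while the loadings within each column become more polarized (closer to either zero or a large absolute value).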

38 Quartimax Rotation

Purpose: to simplify the rows of the loading matrix, i.e. to obtain a pattern of loadings such that:

- All the variables have a fairly high loading on one factor
- Each variable should have a high loading on one other factor and near-zero loadings on the remaining factors

The quartimax rotation will be most appropriate in the presence of a general factor.

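39 Quartimax Rotation (cont.)

For any variable i, the variance of the communalities (i.e. squares of the loadings) is given by

$$Q_i = \frac{1}{m}\sum_{j=1}^{m} \lambda_{ij}^{*4} - \left(\frac{1}{m}\sum_{j=1}^{m} \lambda_{ij}^{*2}\right)^2$$

Then the total variance of all the variables is

$$Q = \sum_{i=1}^{p} Q_i$$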

40 Quartimax Rotation (cont.)

Quartimax rotation is obtained by finding the orthogonal matrix $C$ such that $Q$ is maximized. This problem can be reduced to the following form:

$$\max \sum_{i=1}^{p}\sum_{j=1}^{m} \lambda_{ij}^{*4}$$

subject to the condition that the communality of each variable remains the same.

Varimax is often preferred over quartimax, since it leads to a cleaner separation of factors and tends to be more invariant when a different subset of variables is analyzed.

41 Oblique Rotations

- The factors are allowed to be correlated: oblique rotations offer a continuous range of correlations between factors
- The degree of correlation between factors is determined by the delta parameter δ:
  - δ = 0: solutions are fairly highly correlated
  - δ < 0: solutions are increasingly orthogonal; at about δ = -4 the solution is orthogonal
  - δ ≈ 1: leads to very highly correlated solutions
- Note: although delta affects the size of the correlation, the maximum correlation at a given value depends on the dataset

42 Commonly Used Oblique Rotations

Promax:

- Orthogonal factors are rotated to an oblique position: the orthogonal loadings are raised to powers (usually 2, 4 or 6) to drive small and moderate loadings to zero, while larger loadings are reduced

Direct Oblimin:

- Simplifies factors by minimizing the sum of cross-products of squared loadings in the pattern matrix
- Values of δ > 0 produce highly correlated factors → careful consideration is needed when deciding the number of factors (!)

Helps to cope with situations encountered in practice?

Note: factor loadings obtained after oblique rotations no longer represent correlations between factors and observed variables.

43 Terminology in Oblique Rotations

- Factor correlation matrix = correlations between factors (standardized factor scores) after rotation
- Pattern matrix = regression-like weights representing the unique contribution of each factor to the variance in the variable (comparable to the loading matrix when the factors are orthogonal)
- Structure matrix = correlations between variables and correlated factors (given by the product of the pattern matrix and the factor correlation matrix)

44 Methods for Obtaining Factor Scores

Thomson's (1951) regression estimates:

- Assume the factor scores to be random
- The assumption is appropriate when we are interested in the general structure (different samples consisting of different individuals)

Bartlett's estimates:

- Assume the factor scores to be deterministic
- Assume normality and that the loadings and uniquenesses are known

Anderson-Rubin estimates

There is no clear favorite method; each has its advantages and disadvantages.

45 Factor Scores via Multiple Regression

Estimate:

$$E\{f_{ij}\} = \beta_{1j} x_{i1} + \cdots + \beta_{pj} x_{ip}$$

In matrix form $E\{F\} = XB$, and for standardized variables $F = ZB$. Hence

$$(n-1)^{-1} Z^T F = (n-1)^{-1} Z^T Z B \;\Rightarrow\; \Lambda = RB \;\Rightarrow\; B = R^{-1}\Lambda,$$

since $(n-1)^{-1} Z^T F = \Lambda$ and $(n-1)^{-1} Z^T Z = R$.
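As a small illustration of $B = R^{-1}\Lambda$, the sketch below computes the score weights for the one-factor example. It assumes that the correlation matrix $R$ is exactly the one implied by the model (unit diagonal, $\lambda_i \lambda_j$ off the diagonal); the `solve` helper is a hypothetical plain Gauss-Jordan routine written only to keep the example free of external libraries.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.

    Intended for small, well-conditioned dense systems only.
    """
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# One-factor loadings for M, P, C, E, H, F (values taken from the slides).
lam = [0.80, 0.70, 0.90, 0.60, 0.50, 0.65]
p = len(lam)

# Assumption: R is the model-implied correlation matrix.
R = [[1.0 if i == j else lam[i] * lam[j] for j in range(p)] for i in range(p)]

# Regression weights for the factor scores: B = R^{-1} Lambda.
B = solve(R, lam)
for name, weight in zip("MPCEHF", B):
    print(f"{name}: {weight:.4f}")
```

A standardized observation $z$ then gets the estimated score $\hat f = z^T B$. In this example the indicator with the highest loading (C) also receives the largest weight.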

46 FA in Practice

Tutorial:

- Analysis of HBAT's consumer survey results
- Form groups of 1 to 3 people

47 Conceptual Issues

- The basic assumption is that an underlying structure exists in the set of variables
- The presence of correlated variables and detected factors does not guarantee relevance, even if the statistical requirements are met
- Ensuring conceptual validity remains the responsibility of the researcher
- Remember: do not mix dependent and independent variables in a single factor analysis if the objective is to study dependence relationships using derived factors

48 Conceptual Issues (cont.)

- Ensure that the sample is homogeneous with respect to the underlying factor structure
- If the sample has multiple internal groups with unique characteristics, it may be inappropriate to apply factor analysis to the pooled data
- If different groups are expected, a separate factor analysis should be performed for each group
- Compare the group-specific results to the combined sample

49 Sample Size and Missing Data

- Correlations estimated from small samples tend to be less reliable
- Minimum sample size should be 50 observations
- The sample must have more observations than variables
- Strive to maximize the number of observations per variable (desired ratio is 5:1)
- Recommendations by Comrey and Lee (1992): sample size 100 = poor, 200 = fair, 300 = good, 500 = very good

Missing values:

- If cases are missing values in a nonrandom pattern or if the sample size is too small, estimation is needed
- Beware of using estimation procedures (e.g. regression) that are likely to overfit the data and cause correlations to be too high

50 Factorability of Correlation Matrix

- A factorable correlation matrix should include several sizeable correlations (this can be checked with e.g. Bartlett's test of sphericity)
- If no correlation exceeds .30, use of FA is questionable
- Warning: high bivariate correlations do not necessarily ensure the existence of factors → examine partial correlations or anti-image correlations (negatives of partial correlations)
- If factors are present, high bivariate correlations become very low partial correlations

51 Partial Correlation

The partial correlation between variables X and Y given a set of n controlling variables $Z = \{Z_1, \ldots, Z_n\}$ is the correlation coefficient $\rho_{XY \cdot Z}$, in which the relatedness due to the controlling variables is removed.

In practice, the partial correlation is computed as the bivariate correlation between the residuals from the linear regressions X ~ Z and Y ~ Z.

[Figure: Venn diagram of the variances of X, Y and Z with regions a, b, c and d, where $\rho^2_{XY \cdot Z} = a / (a + d)$.]
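With a single controlling variable, the residual-based definition reduces to the standard first-order formula $\rho_{XY \cdot Z} = (\rho_{XY} - \rho_{XZ}\rho_{YZ}) / \sqrt{(1 - \rho_{XZ}^2)(1 - \rho_{YZ}^2)}$. The sketch below applies it to correlations implied by the one-factor example; controlling for the latent factor I itself is an idealization used only for illustration, since I is not observed in practice.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given a single control Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Model-implied correlations in the one-factor example:
# corr(M, I) = 0.8, corr(P, I) = 0.7, corr(C, I) = 0.9.
r_mp = 0.8 * 0.7
r_mc = 0.8 * 0.9
r_pc = 0.7 * 0.9

# Controlling for the common factor removes the correlation between M and P.
print(round(partial_corr(r_mp, 0.8, 0.7), 6))  # 0.0

# Controlling for another indicator of the same factor (C) shrinks it.
print(round(partial_corr(r_mp, r_mc, r_pc), 3))
```

The second value is roughly 0.20, well below the bivariate correlation of 0.56, which is the pattern the factorability check looks for.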

52 Geometrical Interpretation of Partial Correlation

[Figure (source: Wikipedia): geometrical interpretation of partial correlation in terms of the residuals from regressions.]

53 Other Practical Issues

Normality:

- When FA is used in an exploratory manner to summarize relationships, assumptions on distributions are not in force
- Normality enhances the solution (but is not necessary)

Linearity:

- Multivariate normality implies linear relationships between pairs of variables
- The analysis is degraded when linearity fails (note: correlation measures linear relationship)

Absence of multicollinearity and singularity:

- Some degree of multicollinearity is desirable, but extreme multicollinearity or singularity is an issue (check for eigenvalues close to zero or a zero determinant of the correlation matrix)

54 Outliers Among Cases and Variables

Screening for outliers among cases:

- The factor solution may be sensitive to outlying cases

Screening for outliers among variables:

- A variable with a low squared multiple correlation with all other variables and a low correlation with all important factors is an outlier among the variables
- Outlying variables are often ignored in the current FA, or the researcher may consider adding more related variables in a further study

Note: factors defined by just one or two variables are not stable (or "real"). If the variance accounted for by such a factor is high enough, it may be interpreted with caution or ignored.

55 Choosing and Evaluating a Solution

- Number and nature of factors: How many reliable and interpretable factors are there in the data set? What is the meaning of the factors? How are they interpreted?
- Importance of solutions / factors: How much variance in a dataset is accounted for by the factors? Which factors account for the most variance?
- Testing theory in FA: How well does the obtained solution fit an expected factor solution?
- Estimating scores on factors: How do the subjects score on the factors?

56 Appendix

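57 Preliminary Considerations

Assume that $x = (x_1, \ldots, x_p)^T$ is a vector of p random variables.

- Population mean: $\mu = E(x)$
- Population covariance: $\Sigma = E\big((x - \mu)(x - \mu)^T\big)$
- Correlation between the i-th and j-th variable: $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\sigma_{jj}}$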

58 Preliminary Considerations

- Variance of a linear combination of p variables: $\mathrm{Var}(a^T x) = a^T \Sigma a$
- Generalized variance: $|\Sigma|$
- Total variation: $\mathrm{tr}\,\Sigma$

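59 Preliminary Considerations

Illustration: Combining Uncorrelated Variables

$$
\begin{pmatrix} a_1 & a_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
= a_1^2 + a_2^2 = 1, \quad \text{because } r_{12} = 0
$$

[Figure: two variables with $s_1^2 = 1$, $s_2^2 = 1$ and $r_{12} = 0$, combined with weights $a_1$, $a_2$ such that $a_1^2 + a_2^2 = 1$.]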

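60 Preliminary Considerations

Illustration: Combining Correlated Variables

$$
\begin{pmatrix} a_1 & a_2 \end{pmatrix}
\begin{pmatrix} 1 & r_{12} \\ r_{21} & 1 \end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
= a_1^2 + a_2^2 + 2 a_1 a_2 r_{12} > 1, \quad \text{when } r_{12} > 0 \text{ (and } a_1 a_2 > 0\text{)}
$$

[Figure: two variables with $s_1^2 = 1$, $s_2^2 = 1$ and $r_{12} > 0$, combined with weights $a_1$, $a_2$ such that $a_1^2 + a_2^2 = 1$.]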

61 Some Theory: Factor Model

Consider a p-dimensional random vector $x \sim (\mu, \Sigma)$. An m-factor model is

$$x = \Lambda f + \varepsilon + \mu,$$

where $\Lambda$ is a $p \times m$ matrix of factor loadings, and $f$ ($m \times 1$) and $\varepsilon$ ($p \times 1$) are random vectors. The elements of $f$ are the common factors and the elements of $\varepsilon$ are the unique factors.

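62 Assumptions

- $E(f) = 0$ and $\mathrm{Cov}(f) = I$
- $E(\varepsilon) = 0$ and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
- $\mathrm{Cov}(f, \varepsilon) = 0$
- $\mathrm{Cov}(\varepsilon) = \Psi = \mathrm{diag}(\psi_{11}, \psi_{22}, \ldots, \psi_{pp})$

Thus

$$
\begin{aligned}
\Sigma &= E\big((x - \mu)(x - \mu)^T\big) = E\big((\Lambda f + \varepsilon)(\Lambda f + \varepsilon)^T\big) \\
&= E\big(\Lambda f (\Lambda f)^T\big) + E(\varepsilon \varepsilon^T) + E(\Lambda f \varepsilon^T) + E\big(\varepsilon (\Lambda f)^T\big) \\
&= \Lambda E(f f^T) \Lambda^T + E(\varepsilon \varepsilon^T) + \Lambda E(f \varepsilon^T) + E(\varepsilon f^T) \Lambda^T \\
&= \Lambda \Lambda^T + \Psi
\end{aligned}
$$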

63 Notation

| Label | Name | Size | Description |
|---|---|---|---|
| Λ | Factor loading matrix (or pattern matrix in oblique methods) | p × m | Matrix of regression-like weights used to estimate the unique contribution of each factor to the variance in a variable |
| x | Vector of variables | p × 1 | Observed random variables |
| Σ | Covariance or correlation matrix | p × p | Covariances or correlations between variables |
| µ | Expected values of variables | p × 1 | Expected values of observed random variables |
| f | Common factors | m × 1 | Vector of common factors |
| ε | Unique factors | p × 1 | Vector of variable-specific unique factors |
| Ψ | Covariance of unique factors | p × p | Covariance matrix for unique factors |
| C | Rotation matrix | m × m | Transformation matrix to produce rotated loading matrix |

64 Factor Equation

$$\Sigma = \Lambda\Lambda^T + \Psi,$$

where

- $\Sigma$ is the covariance matrix of the variables $x$,
- $\Lambda$ is the loading matrix,
- $\Psi$ is a diagonal matrix containing the unique variances.

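65 Factor Equation (cont.)

The communalities are the diagonal elements of $\Sigma - \Psi = \Lambda\Lambda^T$.

The covariances (correlations) between the variables and the factors are given by

$$E\big((x - \mu) f^T\big) = E\big((\Lambda f + \varepsilon) f^T\big) = \Lambda E(f f^T) + E(\varepsilon f^T) = \Lambda$$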

66 Solving the Factor Equation

How to solve $\Sigma - \Psi = \Lambda\Lambda^T$?

Use the Spectral Decomposition Theorem: any symmetric $p \times p$ matrix $A$ can be written as

$$A = \Gamma \Theta \Gamma^T,$$

where $\Theta$ is a diagonal matrix of the eigenvalues of $A$, and $\Gamma$ is an orthogonal matrix whose columns are standardized eigenvectors.

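67 Therefore

Provided $\Psi$ is known, we may write

$$\Sigma - \Psi = \Gamma \Theta \Gamma^T = (\Gamma \Theta^{1/2})(\Theta^{1/2} \Gamma^T).$$

Assume the first k eigenvalues are positive, $\theta_i > 0$, $i = 1, 2, \ldots, k$. Then we may write

$$\lambda_i = \theta_i^{1/2} \gamma_i,$$

and thus $\Lambda = \Gamma_1 \Theta_1^{1/2}$, where $\Gamma_1$ ($p \times k$) holds the first k eigenvectors and $\Theta_1$ is the $k \times k$ diagonal matrix of the corresponding eigenvalues.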
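The recipe $\lambda_1 = \theta_1^{1/2}\gamma_1$ can be tried on the one-factor example, where $\Sigma - \Psi$ has rank one. The sketch below assumes $\Psi$ is known, so that $\Sigma - \Psi$ can be built directly from the loadings, and uses a plain power iteration (a hypothetical helper, chosen only to avoid external libraries) to obtain the leading eigenpair.

```python
import math

# One-factor loadings (values taken from the slides); treated as the "truth".
lam_true = [0.80, 0.70, 0.90, 0.60, 0.50, 0.65]
p = len(lam_true)

# Assumption: Psi is known, so Sigma - Psi = lam lam^T is available.
A = [[lam_true[i] * lam_true[j] for j in range(p)] for i in range(p)]

def power_iteration(A, steps=50):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(steps):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue for the unit vector v.
    theta = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
    return theta, v

theta, gamma = power_iteration(A)

# Loadings recovered as lambda_1 = theta_1^{1/2} * gamma_1.
lam = [math.sqrt(theta) * g for g in gamma]
print([round(x, 4) for x in lam])
```

The recovered loadings coincide with the original ones here because the matrix is exactly of rank one; with estimated communalities the result would only be approximate.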

68 Thank you!


### Chapter 6. Orthogonality

6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

### Least-Squares Intersection of Lines

Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

### Lecture 7: Factor Analysis. Laura McAvinue School of Psychology Trinity College Dublin

Lecture 7: Factor Analysis Laura McAvinue School of Psychology Trinity College Dublin The Relationship between Variables Previous lectures Correlation Measure of strength of association between two variables

Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### Data analysis process

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

### Overview of Factor Analysis

Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

### Factor Analysis Example: SAS program (in blue) and output (in black) interleaved with comments (in red)

Factor Analysis Example: SAS program (in blue) and output (in black) interleaved with comments (in red) The following DATA procedure is to read input data. This will create a SAS dataset named CORRMATR

### Linear Algebra Review. Vectors

Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka kosecka@cs.gmu.edu http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length

### STA 4107/5107. Chapter 3

STA 4107/5107 Chapter 3 Factor Analysis 1 Key Terms Please review and learn these terms. 2 What is Factor Analysis? Factor analysis is an interdependence technique (see chapter 1) that primarily uses metric

### Factor Analysis: Statnotes, from North Carolina State University, Public Administration Program. Factor Analysis

Factor Analysis Overview Factor analysis is used to uncover the latent structure (dimensions) of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of

### A Introduction to Matrix Algebra and Principal Components Analysis

A Introduction to Matrix Algebra and Principal Components Analysis Multivariate Methods in Education ERSH 8350 Lecture #2 August 24, 2011 ERSH 8350: Lecture 2 Today s Class An introduction to matrix algebra

### DISCRIMINANT FUNCTION ANALYSIS (DA)

DISCRIMINANT FUNCTION ANALYSIS (DA) John Poulsen and Aaron French Key words: assumptions, further reading, computations, standardized coefficents, structure matrix, tests of signficance Introduction Discriminant

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Psychology 7291, Multivariate Analysis, Spring 2003. SAS PROC FACTOR: Suggestions on Use

: Suggestions on Use Background: Factor analysis requires several arbitrary decisions. The choices you make are the options that you must insert in the following SAS statements: PROC FACTOR METHOD=????

### Multidimensional data and factorial methods

Multidimensional data and factorial methods Bidimensional data x 5 4 3 4 X 3 6 X 3 5 4 3 3 3 4 5 6 x Cartesian plane Multidimensional data n X x x x n X x x x n X m x m x m x nm Factorial plane Interpretation

### Exploratory Factor Analysis

Exploratory Factor Analysis ( 探 索 的 因 子 分 析 ) Yasuyo Sawaki Waseda University JLTA2011 Workshop Momoyama Gakuin University October 28, 2011 1 Today s schedule Part 1: EFA basics Introduction to factor

### Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

### Principal Component Analysis Application to images

Principal Component Analysis Application to images Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception http://cmp.felk.cvut.cz/

### Introduction to Principal Component Analysis: Stock Market Values

Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from

### Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

### Solution based on matrix technique Rewrite. ) = 8x 2 1 4x 1x 2 + 5x x1 2x 2 2x 1 + 5x 2

8.2 Quadratic Forms Example 1 Consider the function q(x 1, x 2 ) = 8x 2 1 4x 1x 2 + 5x 2 2 Determine whether q(0, 0) is the global minimum. Solution based on matrix technique Rewrite q( x1 x 2 = x1 ) =

### Factor Rotations in Factor Analyses.

Factor Rotations in Factor Analyses. Hervé Abdi 1 The University of Texas at Dallas Introduction The different methods of factor analysis first extract a set a factors from a data set. These factors are

### Principal Component Analysis

Principal Component Analysis Principle Component Analysis: A statistical technique used to examine the interrelations among a set of variables in order to identify the underlying structure of those variables.

### 2 Robust Principal Component Analysis

Robust Multivariate Methods in Geostatistics Peter Filzmoser 1, Clemens Reimann 2 1 Department of Statistics, Probability Theory, and Actuarial Mathematics, Vienna University of Technology, A-1040 Vienna,

### Factor Analysis - 2 nd TUTORIAL

Factor Analysis - 2 nd TUTORIAL Subject marks File sub_marks.csv shows correlation coefficients between subject scores for a sample of 220 boys. sub_marks
