1 Lecture 7: Factor Analysis Laura McAvinue School of Psychology Trinity College Dublin
2 The Relationship between Variables Previous lectures Correlation Measure of strength of association between two variables Simple linear regression Describes the relationship between two variables by expressing one variable as a function of the other, enabling us to predict one variable on the basis of the other
3 The Relationship between Variables Multiple Regression Describes the relationship between several variables, expressing one variable as a function of several others, enabling us to predict this variable on the basis of the combination of the other variables Factor Analysis Also a tool used to investigate the relationship between several variables Investigates whether the pattern of correlations between a number of variables can be explained by any underlying dimensions, known as factors
4 Uses of Factor Analysis
Test / questionnaire construction
o For example, you wish to design an anxiety questionnaire
o Create 50 items, which you think measure anxiety
o Give your questionnaire to a large sample of people
o Calculate correlations between the 50 items & run a factor analysis on the correlation matrix
o If all 50 items are indeed measuring anxiety, all correlations will be high and there will be one underlying factor, anxiety
Verification of test / questionnaire structure
o Hospital Anxiety & Depression Scale
o Expect two factors, anxiety & depression
5 Uses of Factor Analysis Examining the structure of a psychological construct What is attention? A single ability? Several different abilities? Some neuropsychological evidence for the existence of different neural pathways for selective & sustained attention Administer tests measuring both aspects to a large sample & run factor analysis One underlying factor? Two?
6 An example Visual Imagery Ability Two kinds of measure Self-report questionnaires v Objective tests Is self-reported imagery related to imagery measured by objective tests? Do tests and questionnaires measure the same thing?
7 How does it work? Correlation Matrix Analyses the pattern of correlations between variables in the correlation matrix Which variables tend to correlate highly together? If variables are highly correlated, likely that they represent the same underlying dimension Factor analysis pinpoints the clusters of high correlations between variables and for each cluster, it will assign a factor
8 Correlation Matrix [Table: correlation matrix for items Q1 to Q6; the coefficient values were not preserved.] Q1-3 correlate strongly with each other and hardly at all with Q4-6. Q4-6 correlate strongly with each other and hardly at all with Q1-3. Two factors!
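The two-cluster pattern on this slide can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: the sample size, the 0.8 / 0.6 weights and the names `f1`, `f2` and `item` are hypothetical and do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n = 2000                         # hypothetical sample size

# Two independent underlying factors (hypothetical)
f1 = rng.standard_normal(n)
f2 = rng.standard_normal(n)

def item(factor):
    # Each item = 0.8 * factor + 0.6 * unique noise (variance of about 1)
    return 0.8 * factor + 0.6 * rng.standard_normal(n)

# Q1-Q3 are driven by factor 1, Q4-Q6 by factor 2
data = np.column_stack([item(f1), item(f1), item(f1),
                        item(f2), item(f2), item(f2)])

R = np.corrcoef(data, rowvar=False)   # 6 x 6 correlation matrix
print(np.round(R, 2))
```

In the printed matrix, the Q1-Q3 block and the Q4-Q6 block should show clearly higher correlations than the cells linking the two blocks.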
9 Factor Analysis Two main things you want to know How many factors underlie the correlations between the variables? What do these factors represent? Which variables belong to which factors?
10 Steps of Factor Analysis 1. Suitability of the Dataset 2. Choosing the method of extraction 3. Choosing the number of factors to extract 4. Interpreting the factor solution
11 1. Suitability of Dataset Selection of Variables Sample Characteristics Statistical Considerations
12 Selection of Variables Are the variables meaningful? Factor analysis can be run on any dataset Garbage in, garbage out (Cooper, 2002) Psychometrics The field of measurement of psychological constructs Good measurement is crucial in Psychology Indicator approach Measurement is often indirect Can't measure depression directly, so infer it on the basis of an indicator, such as a questionnaire Based on some theoretical / conceptual framework, what are these variables measuring?
13 Selection of Variables, Example Variables selected were measures of key aspects of imagery ability, according to theory Questionnaires (Richardson, 1994) Vividness Control Preference Objective Tests (Kosslyn, 1999) Generation Inspection Maintenance Transformation Visual STM
14 Sample Characteristics Size At least 100 participants Participant : Variable Ratio Estimates vary Minimum of 5 : 1, ideal of 10 : 1 Characteristics Representative of the population of interest? Contains different subgroups?
15 Sample Characteristics, Example Size 101 participants Participant : Variable Ratio 101 : 9, approximately 11 : 1 Characteristics Interested in imagery ability of general adult population so took a mixed sample of males and females, varying widely in age, educational and employment backgrounds
16 Statistical Considerations Assumptions of factor analysis regarding data Continuous Normally distributed Linear relationships These properties affect the correlations between variables Independence of variables Variables should not be calculated from each other e.g. Item 4 = Item
17 Statistical Considerations Are there enough significant correlations (> .3) between the variables to merit factor analysis? Bartlett Test of Sphericity Tests H0 that all correlations between variables = 0 If p < .05, reject H0 and conclude there are significant correlations between variables, so factor analysis is possible
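As a rough sketch of what the Bartlett test computes, the statistic can be obtained from the determinant of the correlation matrix. The function name `bartlett_sphericity` and the example matrix are hypothetical; the formula is the standard chi-square approximation, and the p value would then be read from a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Return (chi_square, df) for Bartlett's test of sphericity.

    R is a p x p correlation matrix, n is the number of participants.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    # log of det(R); slogdet is numerically safer than log(det(R))
    _sign, logdet = np.linalg.slogdet(R)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi_square, df

# Hypothetical example: 9 variables that all correlate .3, 101 participants
R_example = np.full((9, 9), 0.3)
np.fill_diagonal(R_example, 1.0)
chi_square, df = bartlett_sphericity(R_example, n=101)
print(round(chi_square, 1), df)
```

With nine variables the degrees of freedom are 9 x 8 / 2 = 36. If all correlations were exactly zero, the determinant would be 1, its log would be 0, and the statistic would be 0.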
18 Statistical Considerations Are there enough significant correlations (> .3) between the variables to merit factor analysis? Kaiser-Meyer-Olkin Measure of Sampling Adequacy Quantifies the degree of inter-correlations among variables Value from 0 to 1, 1 meaning that each variable is perfectly predicted by the others The closer to 1 the better If KMO > .6, conclude there is a sufficient number of correlations in the matrix to merit factor analysis
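A minimal sketch of the overall KMO value, assuming the usual definition: the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations, taken over the off-diagonal cells. The function name `kmo` and the example matrix are hypothetical.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    # Partial correlation between each pair, controlling for all other variables
    partial = -inv / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)   # mask for off-diagonal cells
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

# Hypothetical example: three variables that all correlate .5
R_example = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])
print(round(kmo(R_example), 3))
```

The value approaches 1 when the partial correlations are small relative to the ordinary correlations, which is the situation in which the variables share common variance.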
19 Statistical Considerations, Example All variables Continuous Normally Distributed Linear relationships Independent Enough correlations? Bartlett Test of Sphericity (χ² = ; df = 36; p < .001) KMO = .734
20 2. Choosing the method of extraction Two methods Factor Analysis Principal Components Analysis Differ in how they analyse the variance in the correlation matrix
21 Variable: three sources of variance
Specific Variance: variance unique to the variable itself
Error Variance: variance due to measurement error or some random, unknown source
Common Variance: variance that a variable shares with other variables in a matrix
When searching for the factors underlying the relationships between a set of variables, we are interested in detecting and explaining the common variance
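One way to write the three sources of variance, assuming a standardised variable (total variance of 1), is shown below. The symbols are a common convention rather than notation taken from the lecture: h² for the common variance (the communality), s² for the specific variance and e² for the error variance.

```latex
1 = h^{2} + s^{2} + e^{2}
```

On this reading, the common variance h² is the part that the factors are meant to explain.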
22 Principal Components Analysis v Factor Analysis
Principal Components Analysis: Ignores the distinction between the different sources of variance. Analyses total variance in the correlation matrix, assuming the components derived can explain all variance. Result: any component extracted will include a certain amount of error & specific variance
Factor Analysis: Separates specific & error variance from common variance. Attempts to estimate common variance and identify the factors underlying this
Which to choose? Different opinions. Theoretically, factor analysis is more sophisticated but statistical calculations are more complicated, often leading to impossible results. Often, both techniques yield similar solutions
23 2. Choosing the method of extraction, Example Tried both Chose Principal Components Analysis as Factor Analysis proved impossible (estimated communalities > 1)
24 3. Choosing the number of factors to extract Statistical Modelling You can create many solutions using different numbers of factors An important decision Aim is to determine the smallest number of factors that adequately explain the variance in the matrix Too few factors Second-order factors Too many factors Factors that explain little variance & may be meaningless
25 Criteria for determining Extraction Theory / past experience Latent Root Criterion Scree Test Percentage of Variance Explained by the factors
26 Latent Root Criterion (Kaiser-Guttman) Eigenvalues Expression of the amount of variance in the matrix that is explained by the factor Factors with eigenvalues > 1 are extracted Limitations Sensitive to the number of variables in the matrix More variables → eigenvalues inflated → overestimation of number of underlying factors
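The latent root criterion can be sketched as follows. The 6 x 6 correlation matrix is hypothetical (two clusters of three variables, correlating .6 within a cluster and .1 across clusters) and is not the lecture's data.

```python
import numpy as np

# Hypothetical correlation matrix: two clusters of three variables
R = np.full((6, 6), 0.1)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)

# Eigenvalues of the correlation matrix, largest first
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Latent root criterion: keep components with eigenvalue > 1
n_factors = int(np.sum(eigenvalues > 1))

print(np.round(eigenvalues, 2), n_factors)
```

Plotting `eigenvalues` against component number would give the scree plot for the same matrix.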
27 Scree Test (Cattell, 1966) Scree Plot Based on the relative values of the eigenvalues Plot the eigenvalues of the factors Cut-off point The last component before the slope of the line becomes flat (before the scree)
28 Scree Plot [Figure: scree plot of eigenvalue against component number, with the elbow in the graph marked.] Take the components above the elbow
29 Percentage of Variance Percentage of variance explained by the factors Convention Components should explain at least 60% of the variance in the matrix (Hair et al., 1995)
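A small sketch of the percentage-of-variance calculation. For a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that total gives the proportion of variance explained. The eigenvalues below are hypothetical, and `cumulative_percent` is an illustrative name.

```python
def cumulative_percent(eigenvalues):
    """Cumulative percentage of variance explained, component by component."""
    total = sum(eigenvalues)
    running = 0.0
    result = []
    for ev in eigenvalues:
        running += 100 * ev / total
        result.append(running)
    return result

# Hypothetical eigenvalues for six variables, largest first
eigenvalues = [2.5, 1.9, 0.4, 0.4, 0.4, 0.4]
cumulative = cumulative_percent(eigenvalues)

# Smallest number of components reaching the 60% convention
n_needed = next(i + 1 for i, c in enumerate(cumulative) if c >= 60)

print([round(c, 1) for c in cumulative], n_needed)
```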
30 3. Choosing the number of factors to extract, Example [Figure: scree plot of eigenvalue against component number.] Three components with eigenvalues > 1 Explained 67.26% of the variance
31 4. Interpreting the Factor Solution Factor Matrix Shows the loadings of each of the variables on the factors that you extracted Loadings are the correlations between the variables and the factors Loadings allow you to interpret the factors Sign indicates whether the variable has a positive or negative correlation with the factor Size of loading indicates whether a variable makes a significant contribution to a factor (a loading of about .3 or above)
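For principal components, loadings can be sketched as each eigenvector scaled by the square root of its eigenvalue. The correlation matrix below is hypothetical (two clusters of three variables), and the choice of two components is an assumption made for the illustration.

```python
import numpy as np

# Hypothetical correlation matrix: two clusters of three variables
R = np.full((6, 6), 0.1)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)

eigenvalues, eigenvectors = np.linalg.eigh(R)   # returned in ascending order
order = np.argsort(eigenvalues)[::-1]           # re-order, largest first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

k = 2  # number of components kept (assumed for this example)
# Loading = eigenvector weight * sqrt(eigenvalue):
# the correlation between each variable and each component
loadings = eigenvectors[:, :k] * np.sqrt(eigenvalues[:k])

print(np.round(loadings, 2))
```

The sign of a whole column is arbitrary, so a component may come out with all of its loadings reversed. This is an unrotated solution.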
32 [Table: loadings of the nine variables (Vividness Qu, Control Qu, Preference Qu, Generate Test, Inspect Test, Maintain Test, Transform (P&P) Test, Transform (Comp) Test, Visual STM Test) on Components 1 to 3; the loading values were not preserved.] Component 1: Visual imagery tests Component 2: Visual imagery questionnaires Component 3: ?
33 Factor Matrix Interpret the factors Communality of the variables Percentage of variance in each variable that can be explained by the factors Eigenvalues of the factors Helps us work out the percentage of variance in the correlation matrix that the factor explains
34 [Table: loadings of the nine variables on Components 1 to 3, with the communality of each variable; the individual table values were not preserved.]
Eigenvalues / % Variance: Component 1 = 37.3%, Component 2 = 18.6%, Component 3 = 11.3%
Communality of Variable 1 (Vividness Qu) = (-.198)² + (-.805)² + (.061)² = .69 or 69%
Eigenvalue of Comp 1 = (-.198)² + (.173)² + (.353)² + (-.444)² + (-.773)² + (.734)² + (.759)² + (-.792)² + (.792)² = 3.36; 3.36 / 9 = 37.3%
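The two worked calculations on this slide can be checked with a few lines of code. The loadings are the ones quoted in the calculations; the variable and function names are illustrative only.

```python
# Loadings of Vividness Qu on Components 1 to 3 (from the communality calculation)
vividness_loadings = [-0.198, -0.805, 0.061]

# Loadings of the nine variables on Component 1 (from the eigenvalue calculation)
component1_loadings = [-0.198, 0.173, 0.353, -0.444, -0.773,
                       0.734, 0.759, -0.792, 0.792]

def sum_of_squares(values):
    return sum(v ** 2 for v in values)

# Communality: squared loadings summed across components, for one variable
communality = sum_of_squares(vividness_loadings)

# Eigenvalue: squared loadings summed across variables, for one component
eigenvalue = sum_of_squares(component1_loadings)

# Percentage of variance: eigenvalue divided by the number of variables
percent = 100 * eigenvalue / len(component1_loadings)

print(round(communality, 2), round(eigenvalue, 2), round(percent, 1))
```

Communalities are summed along a row of the factor matrix and eigenvalues along a column.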
35 Factor Matrix Unrotated Solution Initial solution Can be difficult to interpret Factor axes are arbitrarily aligned with the variables Rotated Solution Easier to interpret Simple structure Maximises the number of high and low loadings on each factor
36 Factor Analysis through Geometry It is possible to represent correlation matrices geometrically Variables Represented by straight lines of equal length All start from the same point High correlation between variables, lines positioned close together Low correlation between variables, lines positioned further apart Correlation = Cosine of the angle between the lines
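37 [Figure: lines V1, V2 and V3 drawn from a common origin, with 30º between V1 and V2 and 60º between V2 and V3.] V1 & V2: 30º angle, cosine = .866, r = .866 V2 & V3: 60º angle, cosine = .5, r = .5 V1 & V3: 90º angle, cosine = 0, no relationship The smaller the angle, the bigger the cosine and the bigger the correlation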
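The angle-to-correlation rule can be checked directly. The function name `correlation_from_angle` is illustrative.

```python
import math

def correlation_from_angle(degrees):
    """Correlation represented by the angle between two variable lines."""
    return math.cos(math.radians(degrees))

for pair, angle in [("V1 & V2", 30), ("V2 & V3", 60), ("V1 & V3", 90)]:
    print(pair, angle, round(correlation_from_angle(angle), 3))
```

The same rule gives factor loadings, where the angle is the one between a variable and a factor.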
38 Factor Analysis [Figure: variable lines V1 to V6 with factor lines F1 and F2.] Fits a factor to each cluster of variables Passes a factor line through the groups of variables Factor Loading Cosine of the angle between each factor and the variable
39 Two Methods of fitting Factors [Figure: two panels, each showing variable lines V1 to V6 with factor lines F1 and F2.] Orthogonal Solution: factors are at right angles, uncorrelated Oblique Solution: factors are not at right angles, correlated
40 Two Step Process [Figure: two panels, each showing variable lines V1 to V6 with factor lines F1 and F2.] First, factors are fit arbitrarily Then, factors are rotated to fit the clusters of variables better
41 For example [Tables: loadings of the nine variables on components C1 to C3, once for the Unrotated Solution and once for the Solution following Orthogonal Rotation; the loading values were not preserved.]
42 Factor Rotation Changes the position of the factors so that the solution is easier to interpret Achieves simple structure Factor matrix where variables have either high or low loadings on factors rather than lots of moderate loadings
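A minimal sketch of an orthogonal rotation towards simple structure for two factors. It tries a grid of angles and keeps the one that maximises the variance of the squared loadings (the varimax criterion). This brute-force search is an assumption made for clarity, not the iterative routine that statistical packages use, and the loadings and function names are hypothetical.

```python
import numpy as np

def rotation_matrix(degrees):
    t = np.radians(degrees)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

def simplicity(loadings):
    # Varimax criterion: variance of the squared loadings, summed over factors
    return np.var(loadings ** 2, axis=0).sum()

def rotate_two_factors(loadings, step=0.1):
    """Return (rotated loadings, angle) for the best angle on the grid."""
    angles = np.arange(0.0, 90.0, step)
    scores = [simplicity(loadings @ rotation_matrix(a)) for a in angles]
    best = float(angles[int(np.argmax(scores))])
    return loadings @ rotation_matrix(best), best

# Hypothetical simple structure, deliberately misaligned by 30 degrees
simple = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                   [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
unrotated = simple @ rotation_matrix(30)

rotated, best_angle = rotate_two_factors(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2), best_angle)
```

After rotation each variable should load highly on one factor and close to zero on the other, while its communality (row sum of squared loadings) is unchanged. The order and sign of the factors may differ from the original simple structure.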
43 Evaluating your Factor Solution Is the solution interpretable? Should you re-run and extract a bigger or smaller number of factors? What percentage of variance is explained by the factors? >60%? Are all variables represented by the factors? If the communality of one variable is very low, suggests it is not related to the other variables, should re-run and exclude
44 For example [Tables: First Solution, with loadings of the nine variables on C1 to C3, and Second Solution, with loadings on Component 1 and Component 2; the loading values were not preserved.] Component 3 = ? C1 = Efficiency of objective visual imagery C2 = Self-reported imagery efficacy
45 References Cooper, C. (1998). Individual differences. London: Arnold. Kline, P. (1994). An easy guide to factor analysis. London: Routledge.