Statistics for Business Decision Making


1 Statistics for Business Decision Making
Faculty of Economics, University of Siena
2 You should be able to:
- summarize and uncover patterns in a set of multivariate data using the factor model (FM)
- apply factor analysis to business decision-making situations
- analyze and interpret the output of a factor analysis
3 Data reduction
- Factor analysis (FA) is a multivariate statistical technique of data reduction
- Starting point: a large dataset with many correlated variables X_1, X_2, ..., X_k, whose interdependence is explored. Because of the correlation, the information content of a given variable may overlap with that of other variables, producing a double counting of the same information in the original dataset
- Through FA a smaller set of new unobserved variables (the common factors) is identified that can be used to explain the interrelationships among the original variables
4 How do the factors explain the association among the original variables?
To say that the factors explain the associations among the original variables means that the original variables are assumed to be conditionally independent given the factors. In other words, any correlation between each pair of measured (manifest) variables arises because of their mutual association with the common factors.
5 Aim of FA
The definition and interpretation of a smaller number (m < k) of new variables F_1, F_2, ..., F_m (called factors, often to be thought of as latent constructs) that capture the statistical information contained in the original variables.
- Advantage: reduction in the complexity of the data, greater simplicity in describing the observed phenomenon
- Disadvantage: loss of information plus the introduction of an error component
Trade-off: how much loss of the original information are we willing to accept in order to achieve a more parsimonious data summary? Usually, the stronger the correlations among the original variables, the smaller the number of factors needed to adequately summarize the information.
6 Exploratory FA vs. Confirmatory FA
Exploratory: starts from observed data to identify unobservable underlying factors, unknown to the researcher but expected to exist from theory.
Confirmatory: the researcher wants to test one or more specific underlying structures, specified prior to the analysis. This is frequently the case in psychometric studies.
7 Latent Variable Models
FA may be classified within the framework of Latent Variable Models (LVM). LVM are used to represent the complex relations among several manifest variables by simple relations between the variables and an underlying latent structure. FA is a Latent Variable Model where both manifest and latent variables are measured on a metrical scale.
8 FA in marketing research
Many steps are involved:
1 Identify the main attributes used to evaluate a product/service (for a toothpaste these may be the benefits provided in preventing plaque and tartar, freshening the breath, keeping the gums healthy, keeping the mouth clean, etc.)
2 Collect data from a random sample of potential customers on their ratings of all the product attributes (for example on a Likert scale ranging from 1 to 5)
3 Run a factor analysis to find a set of underlying factors that summarize the respondents' attitudes towards that product/service
4 Use the new smaller set of factors either to construct perceptual maps and other product positioning tools or to simplify subsequent analysis of the data (through regression models or clustering methods)
9 Example 1: Attitude and consumer behaviour towards supermarkets
Original variables: items that measure consumers' attitudes towards supermarkets
- convenience in reaching the store
- product prices
- store location
- sales promotion
- width of aisles in the store
- store atmosphere and decoration
- store size
Aim:
1 to summarize the original dataset into a smaller number of dimensions (through FA)
2 to evaluate the effect of the summary dimensions on the choice of the preferred kind of supermarket (through logit regression). Since the factors are uncorrelated, multicollinearity is not a matter of concern
10 Example 2: Buying behaviour towards local products
Original variables: a set of attitudinal statements relating to different aspects of consumers' buying behaviour towards local products
- production methods
- appearance of a special label
- use of no chemical additives
- help to the local economy
- price, quality and nutrition value
- environmental and health protection
- external appearance
- attractiveness of packing
- freshness and taste
- prestige and curiosity
Aim:
1 to identify a smaller number of underlying factors that affect consumers' buying behaviour towards local products (through FA)
2 to use the new factors for grouping consumers with similar patterns into homogeneous clusters based on their buying behaviour (through cluster analysis)
11 Linear factor model
Each observed variable X_j is linearly related to m common factors F_1, F_2, ..., F_m and a unique component ε_j:
X_1 = γ_11 F_1 + γ_12 F_2 + ... + γ_1m F_m + ε_1
X_2 = γ_21 F_1 + γ_22 F_2 + ... + γ_2m F_m + ε_2
...
X_j = γ_j1 F_1 + γ_j2 F_2 + ... + γ_jm F_m + ε_j
...
X_k = γ_k1 F_1 + γ_k2 F_2 + ... + γ_km F_m + ε_k
where X_j (j = 1, 2, ..., k) is the original (standardized) variable, F_h (h = 1, 2, ..., m) denotes the unobserved common factor, γ_j1, γ_j2, ..., γ_jm are the factor loadings of X_j on the common factors, and ε_j is the residual or unique (as opposed to common) component. It measures the error committed when the original data are summarized by m factors.
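The model above can be checked with a small simulation: generate factors and unique components satisfying the assumptions below, build X from a loading matrix, and verify that the implied covariance structure emerges. This is a minimal sketch; the 4 x 2 loading matrix Gamma and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 5000, 4, 2

# Hypothetical loading matrix Gamma (k x m); values chosen for illustration only
Gamma = np.array([[0.8, 0.1],
                  [0.7, 0.2],
                  [0.1, 0.9],
                  [0.2, 0.6]])

F = rng.standard_normal((n, m))        # common factors: mean 0, variance 1, uncorrelated
psi = 1.0 - (Gamma ** 2).sum(axis=1)   # unique variances chosen so that Var(X_j) = 1
E = rng.standard_normal((n, k)) * np.sqrt(psi)   # unique components, independent of F

X = F @ Gamma.T + E                    # X = F Gamma' + E, one row per observation

# The sample var-cov matrix of X should be close to Gamma Gamma' + Psi
implied = Gamma @ Gamma.T + np.diag(psi)
observed = np.cov(X, rowvar=False)
max_err = np.max(np.abs(observed - implied))
```

With n = 5000 draws the largest deviation between observed and implied covariances is of the order of the sampling error, roughly 1/sqrt(n).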
12 Comments on the variables in the model
- The standardization of the original variables is needed when they are not measured in the same units (and also when they are on very different scales). If they were not standardized, the variables with the larger variances would carry a greater weight in the fitting of the factor model.
- The variables must be quantitative. For qualitative variables, different methods of data reduction must be applied (correspondence analysis, multidimensional scaling)
13 Model assumptions
- A1. Linearity of the relationship
- A2. E[F_h] = 0; Var[F_h] = 1; Cov(F_h, F_s) = 0 for h, s = 1, 2, ..., m; s ≠ h
- A3. E[ε_j] = 0; Cov(ε_j, ε_t) = 0 for j, t = 1, 2, ..., k; t ≠ j
- A4. Cov(ε_j, F_h) = 0 for j = 1, 2, ..., k; h = 1, 2, ..., m
14 Comments on assumptions
A1. Linear models are widely used in statistical data analysis.
A2. Since the factors are not observable, we might as well think of them as measured in standardized form. Being uncorrelated, each factor has its own information content that does not overlap with the information content of the other factors.
A3. The unique term can be considered as the error term in a linear regression model, since it represents the part of an observed variable not accounted for by the common factors. Homoskedasticity is not required.
A3 and A4 imply that the correlation between any two observed variables is due solely to the common factors.
15 Consequences of assumptions: variances
The variances of the observed variables are functions of:
- the factor loadings (the γ coefficients)
- the variances of the unique terms.
Var(X_j) = 1 = γ²_j1 Var(F_1) + ... + γ²_jm Var(F_m) + Var(ε_j)
         = γ²_j1 + ... + γ²_jm + Var(ε_j)
         = Σ_{h=1}^m γ²_jh [communality] + Var(ε_j) [uniqueness]   (1)
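Equation (1) splits each unit variance into communality plus uniqueness. A quick numerical illustration (the loadings of a single hypothetical variable on m = 3 factors are invented):

```python
import numpy as np

# Hypothetical loadings of one standardized variable X_j on m = 3 factors
gamma_j = np.array([0.70, 0.40, 0.10])

communality = np.sum(gamma_j ** 2)   # share of Var(X_j) explained by the common factors
uniqueness = 1.0 - communality       # Var(eps_j), since Var(X_j) = 1

print(communality, uniqueness)       # 0.66 and 0.34, up to floating point
```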
16 Communality and uniqueness
The communality of an observed variable is the proportion of its variance that is explained by the common factors. The larger the communality, the more successful the factor model is in explaining the variable. The uniqueness (or specific variance) is the part of the variance of X_j that is not accounted for by the common factors but is due to the unique component.
17 Consequences of assumptions: covariances
The covariances between the observed variables are functions of the factor loadings only:
Cov(X_j, X_t) = γ_j1 γ_t1 + γ_j2 γ_t2 + ... + γ_jm γ_tm = Σ_{h=1}^m γ_jh γ_th   (2)
The covariances between observed variables and factors are given by the factor loadings:
Cov(X_j, F_h) = γ_jh   (3)
18 FM in matrix form
X: (n × k) matrix of the k original variables
F: (n × m) matrix of the m factors
X = FΓ′ + E   (4)
Γ: (k × m) rectangular matrix of factor loadings, with generic element γ_jh (j = 1, ..., k; h = 1, ..., m)
E: (n × k) matrix of the k unique components
19 X, F and E matrices
X = (X_1 X_2 ... X_k) is the n × k matrix with generic element x_ij, whose columns are the observed variables   (5)
E = (ε_1 ε_2 ... ε_k) is the n × k matrix whose columns are the unique components   (6)
F = (F_1 F_2 ... F_m) is the n × m matrix with generic element F_ih, whose columns are the factors   (7)
20 Γ matrix
Γ is the k × m matrix of factor loadings, with element γ_jh in row j and column h   (8)
γ_jh (j = 1, ..., k; h = 1, ..., m) is the loading of X_j on F_h. It is a measure of the correlation between the jth variable and the hth factor. The Γ matrix tells us which variables are mainly related to the different factors, by detecting the strength and the sign of these links.
21 Communalities as sums of the squared factor loadings
Arrange the squared loadings γ²_jh in a table with one row per variable X_1, ..., X_k and one column per factor F_1, ..., F_m. The sum by row gives the communality: with reference to the jth row,
Σ_{h=1}^m γ²_jh
is the communality of X_j, that is, the share of the variance of X_j explained by all the m factors.
22 Theoretical variance-covariance matrices
In the light of the model assumptions,
Σ = Var(X) = ΓΓ′ + Ψ   (9)
Σ: (k × k) var-cov matrix of the original variables; symmetric, with unit variances on the main diagonal and covariances off-diagonal
Var(X_j) = 1 = Σ_{h=1}^m γ²_jh + Var(ε_j)   (10)
Cov(X_j, X_t) = Σ_{h=1}^m γ_jh γ_th   (11)
Ψ: (k × k) var-cov matrix of the unique components; diagonal, with the variances on the main diagonal and zero covariances
23 Observed vs. theoretical variances
On the one hand we have the observed variances and covariances of the X variables. The observed var-cov matrix contains k(k − 1)/2 distinct values (the elements above the diagonal). On the other hand, there are the variances and covariances implied by the factor model. The theoretical var-cov matrix contains km parameters (only the factor loadings, since the specific variances are functions of them). The model is useful for reducing the complexity if km < k(k − 1)/2, that is, if m < (k − 1)/2
24 Three stages of FA
1 estimating the factor loadings γ_jh (initial solution) as well as the communalities
2 trying to simplify the initial solution through a process known as factor rotation. After the rotation, the final factor solution is supposed to be more easily interpreted. Interpretation is useful to derive a meaningful label for each of the factors
3 estimating the factor scores so that they can be used in subsequent analyses in place of the original variables
25 Estimation: first stage
If the model's assumptions are true, we should be able to estimate the loadings γ_jh and the communalities so that the resulting estimates of the theoretical variances and covariances are close to the observed ones. Most common methods:
- Principal components method
- Maximum likelihood method
26 Principal components
The principal component variables y_1, y_2, ..., y_k are defined to be linear combinations of the original variables X_1, X_2, ..., X_k that are uncorrelated and account for maximal proportions of the variation in the original data. That is, y_1 accounts for the maximum amount of the variance among all possible linear combinations of X_1, X_2, ..., X_k (it conveys the maximum informative contribution about the original variables), y_2 accounts for the maximum of the remaining variance subject to being uncorrelated with y_1, and so on.
27 Principal components method
Given X, the (n × k) matrix of the k original variables, and Σ, the (k × k) var-cov matrix of the original variables, the first principal component to be extracted is a linear combination of the X_j of the following kind:
y_1 = v_11 X_1 + v_12 X_2 + ... + v_1k X_k   (12)
or
y_1 = X v_1   (13)
where y_1 is the (n × 1) vector of the values of the first principal component and v_1 = (v_11, ..., v_1k)′ is the (k × 1) vector of the coefficients of the linear combination. v_1 has to be estimated in such a way that Var(y_1) = max under the constraint v_1′ v_1 = 1
28 First principal component
The solution of the constrained maximization problem (that is, the vector v_1 that maximizes the variance of the first principal component subject to the constraint) is the first eigenvector of the Σ matrix. Moreover, Var(y_1) = λ_1, where λ_1 is the first eigenvalue of Σ. It holds that
Σ v_1 = λ_1 v_1   (14)
Since the total variability of the original variables (i.e. the sum of their variances) is equal to k (remember: they are standardized variables, each with variance equal to one), the ratio λ_1/k gives the share of total variability that is explained by the first principal component
29 Second principal component
The second principal component is y_2 = X v_2, where v_2 is estimated in such a way that Var(y_2) = max under the constraints v_2′ v_2 = 1 and Cov(y_1, y_2) = 0. v_2 is the second eigenvector of the Σ matrix. Moreover, Var(y_2) = λ_2, where λ_2 is the second eigenvalue of Σ. The ratio λ_2/k gives the share of total variability that is explained by the second principal component
30 ith principal component
The ith principal component is y_i = X v_i, where v_i is estimated in such a way that Var(y_i) = max under the constraints v_i′ v_i = 1 and Cov(y_i, y_l) = 0 (l = 1, 2, ..., i − 1). v_i is the ith eigenvector of the Σ matrix, and for the corresponding eigenvalue λ_i it holds that Var(y_i) = λ_i. The ratio λ_i/k gives the share of total variability that is explained by the ith principal component. The cumulative ratio (λ_1 + λ_2 + ... + λ_i)/k measures the share of total variability that is explained by the principal components up to the ith
31 Extraction of all the principal components
The method could in principle continue until the number of extracted components equals the number of initial variables:
Y = XV   (15)
where Y is the (n × k) matrix of principal components, Y = (y_1 y_2 ... y_k), and V is the (k × k) matrix of eigenvectors of Σ, V = (v_1 v_2 ... v_k)
32 Covariance matrix of the principal components
L = Cov(Y) = diag(λ_1, λ_2, ..., λ_k)   (16)
where λ_1 ≥ λ_2 ≥ ... ≥ λ_k and Σ_{i=1}^k λ_i = k.
y_1 has the greatest information content, y_2 the second greatest, and so on: each principal component brings an information content that is not greater than the one brought by the previous principal component. The k principal components explain 100% of the original variability. However, for the method to actually produce a data reduction, the number of extracted components should be less than the original data dimension (m < k).
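The extraction described on the last few slides can be reproduced with an eigendecomposition of the correlation matrix. A minimal sketch on simulated data (the data-generating step is invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 1000, 3
Z = rng.standard_normal((n, 2))
raw = np.column_stack([Z[:, 0], 0.8 * Z[:, 0] + 0.6 * Z[:, 1], Z[:, 1]])

X = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # standardized variables
Sigma = X.T @ X / n                              # correlation matrix (unit diagonal)

lam, V = np.linalg.eigh(Sigma)                   # eigenvalues/eigenvectors of Sigma
order = np.argsort(lam)[::-1]                    # sort in decreasing order
lam, V = lam[order], V[:, order]

Y = X @ V                                        # Y = XV, the principal components (15)
CovY = Y.T @ Y / n                               # equals diag(lam_1, ..., lam_k), eq. (16)

shares = lam / k                                 # share of variability per component
```

Note that `numpy.linalg.eigh` returns eigenvalues in ascending order, hence the explicit re-sorting.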
33 The choice of the number of components to be retained
The number of principal components can be either directly specified or determined through a statistical/heuristic criterion. In the former case, the analysis can be repeated with a different number of components, and the solutions can then be compared according to goodness-of-fit statistics in order to choose the one that best describes the data.
34 The choice of the number of components to be retained
In the latter case, examples of heuristic criteria are:
1 to extract and retain only those components whose associated eigenvalues exceed one (one is the mean value of the eigenvalues)
2 to retain those components that explain a given share, usually higher than 70-75%, of the original variability (a 30% loss of variability can usually be accepted in exchange for a reduction in the data dimensions)
3 to use the scree plot (the plot of the eigenvalues, on the y axis, against the order of extraction, on the x axis); the extraction should be stopped when the plot becomes flat (the elbow rule)
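Criteria 1 and 2 above can be sketched in a few lines (the eigenvalues below are hypothetical, chosen to sum to k = 7):

```python
import numpy as np

# Hypothetical eigenvalues of a 7-variable correlation matrix (they sum to k = 7)
lam = np.array([3.1, 1.6, 1.1, 0.5, 0.4, 0.2, 0.1])

# Criterion 1: retain components with eigenvalue above the mean (which is one)
m_kaiser = int(np.sum(lam > 1.0))

# Criterion 2: smallest m whose cumulative explained share reaches 75%
cum_share = np.cumsum(lam) / lam.sum()
m_share = int(np.argmax(cum_share >= 0.75)) + 1

print(m_kaiser, m_share)   # both criteria give m = 3 here
```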
35 Reading the FA output
The output table reports, for each factor: the eigenvalue λ_i, the difference λ_i − λ_{i+1}, the proportion λ_i/k, and the cumulative proportion (Σ_{j=1}^i λ_j)/k (numeric values not reproduced here). Based on the rule of eigenvalues greater than the average, three factors may be retained. The cumulative proportion of variance explained by three factors is 82.6%.
36 Scree plot (figure)
37 From principal components to factor loadings
Once we have retained the first m principal components,
Y: (n × m) matrix of the m retained principal components, Y = (y_1 y_2 ... y_m)
V: (k × m) matrix of the corresponding eigenvectors of Σ, V = (v_1 v_2 ... v_m)
L = Cov(Y) = diag(λ_1, λ_2, ..., λ_m)
The matrix of (initial) factor loadings is Γ = ΣVL^(−1/2)
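Since ΣV = VL, the formula Γ = ΣVL^(−1/2) is equivalent to scaling each retained eigenvector by the square root of its eigenvalue, Γ = VL^(1/2). A short sketch with an invented 3 × 3 correlation matrix:

```python
import numpy as np

# Hypothetical correlation matrix of k = 3 standardized variables
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.0]])

lam, V = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

m = 2                                    # retain the first two components
Gamma = V[:, :m] * np.sqrt(lam[:m])      # Gamma = V L^{1/2} = Sigma V L^{-1/2}

# Each loading is a variable-factor correlation, so the communalities
# (row sums of squared loadings) cannot exceed one
communalities = (Gamma ** 2).sum(axis=1)
```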
38 Interpretation of the factor solution
Factors are artificial constructs. Meaning is assigned to a factor through the subset of observed variables that have high loadings on that factor. The interpretation of the factors would be an easy task if every one of them were strongly correlated with a limited number of original variables and weakly correlated with the remaining variables (the higher the loadings of a few variables on one factor, the more interpretable the factor).
39 Statistical relevance of a factor loading
Rule of thumb: with a sample size of n = 200 units, a reasonable threshold for a factor loading to be relevant is about 0.40. It rises to 0.55 with n = 100 and to 0.75 with n = 50. Usually the initial factors show average correlations with many original variables. The initial factor solution can then be rotated with the purpose of creating new factors that are associated with few original variables and are for this reason more interpretable than the initial ones.
40 Aim of the rotation
Factor rotation takes advantage of a property of the factor model: there exists an infinite number of sets of values for the factor loadings yielding the same covariance matrix as the original model. Any new set of loadings is produced by a rotation of the initial solution. Let the initial factor solution represent an m-dimensional space: each original variable corresponds to a point whose coordinates are its loadings on the m factors. With the purpose of getting more interpretable factors, the aim of the rotation is to find new coordinate axes such that every point-variable is as close as possible to one of the new axes.
41 Types of rotation: orthogonal vs. oblique
- Orthogonal rotation methods: the factors remain mutually uncorrelated
- Oblique rotation methods: the factors become correlated
42 Orthogonal rotation methods
- Varimax method: ensures that only one or a few observed variables have large loadings on any given factor. The aim is to maximize the variability of the columns of the initial loading matrix. The rotated factor loadings will be very close either to one (in absolute value) or to zero, which facilitates the matching of the variables to a given factor
- Quartimax method: ensures that each variable has large loadings on only one or a few factors. The objective is to maximize the variability of the rows of the initial loading matrix. Several variables may turn out to be strongly related to the same factor
- Equamax method: something in between the two previous methods
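The varimax criterion can be implemented directly with the standard SVD-based iteration (a sketch, not the slides' software; Kaiser normalization is omitted and the 4 × 2 initial loading matrix is invented):

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a k x m loading matrix L."""
    k, m = L.shape
    R = np.eye(m)            # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # gradient of the varimax criterion at the current rotation
        G = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / k)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt           # nearest orthogonal matrix to the gradient
        d_new = s.sum()
        if d_new <= d * (1.0 + tol):
            break
        d = d_new
    return L @ R, R

# Hypothetical initial solution: every variable loads on both factors
L0 = np.array([[0.7,  0.5],
               [0.6,  0.6],
               [0.5, -0.7],
               [0.6, -0.6]])
L1, R = varimax(L0)

# The rotation is orthogonal, so the communalities are unchanged
comm_before = (L0 ** 2).sum(axis=1)
comm_after = (L1 ** 2).sum(axis=1)
```

Statistical libraries expose the same rotation ready-made (for instance scikit-learn's FactorAnalysis accepts rotation='varimax'), so hand-rolling it is only useful for understanding the mechanics.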
43 From factor loadings to factor scores
Let Γ_0 = ΣV_0 L^(−1/2) denote the rotated loading matrix. The matrix of factor scores is then derived as F = XV_0 L^(−1/2). The principal components after the rotation are rescaled in order for them to have unit variance
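The scores have unit variance because F′F/n = L^(−1/2) V′ Σ V L^(−1/2) = I. A sketch on simulated data, without rotation for brevity (with rotation, V_0 replaces V; the data-generating step is invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 2000, 4, 2
raw = rng.standard_normal((n, k)) @ rng.standard_normal((k, k))  # correlated data
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)                   # standardize

Sigma = X.T @ X / n
lam, V = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, V = lam[order][:m], V[:, order][:, :m]      # keep the first m components

F = X @ V / np.sqrt(lam)                         # factor scores F = X V L^{-1/2}
CovF = F.T @ F / n                               # identity: scores have unit variance
```

The columns of F can then replace the original variables in a subsequent regression or cluster analysis, as in the examples that follow.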
44 Example 1, Supermarkets: rotated loadings
Items (rows): convenience in reaching the store, product price, store location, sales promotion, width of aisle in the store, store atmosphere and decoration, store size. Columns: F_1 (setting), F_2 (position), F_3 (price) and the communality, followed by the % of variance and the cumulative % of variance explained by each factor (numeric values not reproduced here).
45 Example 1: use of FA and results
The three factor scores resulting from the factor analysis are then used as independent variables for a logit regression analysis. Dependent variable: store preference (binary choice, e.g. supermarkets in a department store vs. stand-alone supermarkets). The results can be used to elaborate management strategies: when interested in expanding supermarket outlets in department stores, the factors which most influence the probability of preferring the department stores should be the primary focus.
46 Example 2, Local products: FA output
The output table reports the eigenvalues λ_i, the differences λ_i − λ_{i+1}, the proportions λ_i/k and the cumulative proportions (numeric values not reproduced here). Five factors were extracted, explaining 66.8% of the total variance, which represent the key consumption dimensions.
47 Example 2: rotated loadings
Factor 1 (Topicality): production methods, appearance of a special label, products with chemical additives, help to the local economy, price, high value.
Factor 2 (Quality and health issues): quality, health protection, environmental protection, nutrition value.
(Numeric loadings not reproduced here.)
48 Example 2: rotated loadings (continued)
Factor 3 (Appearance): appearance, attractiveness of the product's packing.
Factor 4 (Freshness and taste issues): freshness of the product, taste of the product, interest in the product being clean.
Factor 5 (Curiosity and prestige): curiosity, prestige.
(Numeric loadings not reproduced here.)
49 Example 2: input for a segmentation analysis
By replacing the original 17 variables with the 5 factors, a segmentation analysis has been performed (through cluster analysis) with the aim of identifying homogeneous groups of consumers. Two groups result, which have been named according to their behaviour patterns towards local products as:
- consumers influenced by curiosity, prestige and freshness of the product as well as by marketing issues (attractiveness of the packing of the product, the appearance of the product generally)
- consumers interested in the topicality of the product, in the product's certification and in environmental protection. They pay attention to the ingredients of the product as well as to its price
50 Observed data: beach resorts
The following examples are based on: Bracalente B., Cossignani M., Mulas A. (2009), Statistica aziendale, McGraw-Hill. On a sample of beach resorts, the prices of several beach facilities have been observed:
- bed_d: bed per day
- chair_d: chair per day
- umb2beds_d: umbrella and two beds per day
- bed_a: bed (afternoon only)
- bed_w: bed per week
- umb+2beds_w: umbrella and two beds per week
- paddle_h: paddle boat per hour
51 FA output
The output table reports, for factors F_1, ..., F_7, the eigenvalues λ_i, the differences λ_i − λ_{i+1}, the proportions λ_i/k and the cumulative proportions (numeric values not reproduced here). The first two eigenvalues are greater than one. The corresponding factors explain 77.4% of the original variability. Two factors are extracted.
52 Scree plot (figure)
53 Loading matrix, initial solution
For each variable (bed_d, chair_d, umb2beds_d, bed_a, bed_w, umb+2beds_w, paddle_h) the table reports the loadings on F_1 and F_2 and the communality, computed as the sum of the squared loadings (numeric values not reproduced here). For all the observed variables, the proportion of variance accounted for by the common factors (the communality) is very high, from 59.3% to 93.1%. The first factor is positively related to the prices of beds, umbrellas and chairs. The second factor accounts for the price of the paddle boat.
54 Loading plot, initial solution (figure)
55 Loading matrix after rotation
The table reports, for each variable, the rotated loadings on F_1 and F_2 (numeric values not reproduced here). After the rotation, the first factor shows strong positive correlations with the first six original variables. The second factor is strongly associated with the last variable.
56 Loading plot after rotation (figure)
57 Retailer customers
A retailer asks a sample of customers about their monthly income and consumption expenditure (in thousands of euro) and their opinion (score from 0 to 10) on three sections of the store (meat, fish and frozen food). Can the five original variables be summarized by a smaller number of factors? How many factors are needed, and what percentage of the original variability do they explain? How can the resulting factors be interpreted? The output table reports the eigenvalues, differences, proportions and cumulative proportions for F_1, ..., F_5 (numeric values not reproduced here).
58 Loading matrix, initial solution
For each variable (income, consumption, q_meat, q_fish, q_froz) the table reports the loadings on F_1 and F_2, the communality and the unexplained variance (numeric values not reproduced here).
59 Loading plot, initial solution (figure)
60 Loading matrix after rotation
For each variable (income, consumption, q_meat, q_fish, q_froz) the table reports the rotated loadings on F_1 and F_2, the communality and the unexplained variance (numeric values not reproduced here).
61 Loading plot after rotation (figure)
62 References
Bartholomew D.J. (1987), Latent Variable Models and Factor Analysis, Charles Griffin & Company Ltd., London.
Bracalente B., Cossignani M., Mulas A. (2009), Statistica aziendale, McGraw-Hill.
Tryfos P. (1998), Methods for Business Analysis and Forecasting: Text and Cases, John Wiley & Sons.
More informationOverview of Factor Analysis
Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 354870348 Phone: (205) 3484431 Fax: (205) 3488648 August 1,
More informationHow to report the percentage of explained common variance in exploratory factor analysis
UNIVERSITAT ROVIRA I VIRGILI How to report the percentage of explained common variance in exploratory factor analysis Tarragona 2013 Please reference this document as: LorenzoSeva, U. (2013). How to report
More informationNotes for STA 437/1005 Methods for Multivariate Data
Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.
More informationTtest & factor analysis
Parametric tests Ttest & factor analysis Better than non parametric tests Stringent assumptions More strings attached Assumes population distribution of sample is normal Major problem Alternatives Continue
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationIntroduction to Principal Component Analysis: Stock Market Values
Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from
More informationData analysis process
Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis
More informationPRINCIPAL COMPONENT ANALYSIS
1 Chapter 1 PRINCIPAL COMPONENT ANALYSIS Introduction: The Basics of Principal Component Analysis........................... 2 A Variable Reduction Procedure.......................................... 2
More informationSPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011
SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011 Statistical techniques to be covered Explore relationships among variables Correlation Regression/Multiple regression Logistic regression Factor analysis
More informationExploratory Factor Analysis: rotation. Psychology 588: Covariance structure and factor models
Exploratory Factor Analysis: rotation Psychology 588: Covariance structure and factor models Rotational indeterminacy Given an initial (orthogonal) solution (i.e., Φ = I), there exist infinite pairs of
More informationCHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.
CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In
More informationIntroduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra  1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
More informationRevenue Management with Correlated Demand Forecasting
Revenue Management with Correlated Demand Forecasting Catalina Stefanescu Victor DeMiguel Kristin Fridgeirsdottir Stefanos Zenios 1 Introduction Many airlines are struggling to survive in today's economy.
More informationPrinciple Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
More informationComponent Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
More information5.2 Customers Types for Grocery Shopping Scenario
 CHAPTER 5: RESULTS AND ANALYSIS 
More informationMultiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationA Brief Introduction to Factor Analysis
1. Introduction A Brief Introduction to Factor Analysis Factor analysis attempts to represent a set of observed variables X 1, X 2. X n in terms of a number of 'common' factors plus a factor which is unique
More informationDoptimal plans in observational studies
Doptimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationPRINCIPAL COMPONENTS AND THE MAXIMUM LIKELIHOOD METHODS AS TOOLS TO ANALYZE LARGE DATA WITH A PSYCHOLOGICAL TESTING EXAMPLE
PRINCIPAL COMPONENTS AND THE MAXIMUM LIKELIHOOD METHODS AS TOOLS TO ANALYZE LARGE DATA WITH A PSYCHOLOGICAL TESTING EXAMPLE Markela Muca Llukan Puka Klodiana Bani Department of Mathematics, Faculty of
More informationQuadratic forms Cochran s theorem, degrees of freedom, and all that
Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us
More informationproblem arises when only a nonrandom sample is available differs from censored regression model in that x i is also unobserved
4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a nonrandom
More informationRandom Vectors and the Variance Covariance Matrix
Random Vectors and the Variance Covariance Matrix Definition 1. A random vector X is a vector (X 1, X 2,..., X p ) of jointly distributed random variables. As is customary in linear algebra, we will write
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationA Brief Introduction to SPSS Factor Analysis
A Brief Introduction to SPSS Factor Analysis SPSS has a procedure that conducts exploratory factor analysis. Before launching into a step by step example of how to use this procedure, it is recommended
More information1 Example of Time Series Analysis by SSA 1
1 Example of Time Series Analysis by SSA 1 Let us illustrate the 'Caterpillar'SSA technique [1] by the example of time series analysis. Consider the time series FORT (monthly volumes of fortied wine sales
More informationLeastSquares Intersection of Lines
LeastSquares Intersection of Lines Johannes Traa  UIUC 2013 This writeup derives the leastsquares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
More informationCanonical Correlation
Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present
More informationMehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics
INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationIBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
More informationMULTIPLEOBJECTIVE DECISION MAKING TECHNIQUE Analytical Hierarchy Process
MULTIPLEOBJECTIVE DECISION MAKING TECHNIQUE Analytical Hierarchy Process Business Intelligence and Decision Making Professor Jason Chen The analytical hierarchy process (AHP) is a systematic procedure
More informationPsychology 7291, Multivariate Analysis, Spring 2003. SAS PROC FACTOR: Suggestions on Use
: Suggestions on Use Background: Factor analysis requires several arbitrary decisions. The choices you make are the options that you must insert in the following SAS statements: PROC FACTOR METHOD=????
More informationIBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
More informationTo do a factor analysis, we need to select an extraction method and a rotation method. Hit the Extraction button to specify your extraction method.
Factor Analysis in SPSS To conduct a Factor Analysis, start from the Analyze menu. This procedure is intended to reduce the complexity in a set of data, so we choose Data Reduction from the menu. And the
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationMultivariate Analysis
Table Of Contents Multivariate Analysis... 1 Overview... 1 Principal Components... 2 Factor Analysis... 5 Cluster Observations... 12 Cluster Variables... 17 Cluster KMeans... 20 Discriminant Analysis...
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More informationEigenvalues, Eigenvectors, Matrix Factoring, and Principal Components
Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they
More informationThe Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon ABSTRACT Effective business development strategies often begin with market segmentation,
More informationThe aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree
PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and
More informationANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES
Advances in Information Mining ISSN: 0975 3265 & EISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp2632 Available online at http://www.bioinfo.in/contents.php?id=32 ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES
More informationChapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem
Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become
More informationA Solution Manual and Notes for: Exploratory Data Analysis with MATLAB by Wendy L. Martinez and Angel R. Martinez.
A Solution Manual and Notes for: Exploratory Data Analysis with MATLAB by Wendy L. Martinez and Angel R. Martinez. John L. Weatherwax May 7, 9 Introduction Here you ll find various notes and derivations
More informationExploratory Factor Analysis
Exploratory Factor Analysis Definition Exploratory factor analysis (EFA) is a procedure for learning the extent to which k observed variables might measure m abstract variables, wherein m is less than
More informationLinear Algebra Review. Vectors
Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka kosecka@cs.gmu.edu http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length
More informationOctober 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix
Linear Algebra & Properties of the Covariance Matrix October 3rd, 2012 Estimation of r and C Let rn 1, rn, t..., rn T be the historical return rates on the n th asset. rn 1 rṇ 2 r n =. r T n n = 1, 2,...,
More informationUsing the Singular Value Decomposition
Using the Singular Value Decomposition Emmett J. Ientilucci Chester F. Carlson Center for Imaging Science Rochester Institute of Technology emmett@cis.rit.edu May 9, 003 Abstract This report introduces
More informationCHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS
Examples: Exploratory Factor Analysis CHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS Exploratory factor analysis (EFA) is used to determine the number of continuous latent variables that are needed to
More informationA Beginner s Guide to Factor Analysis: Focusing on Exploratory Factor Analysis
Tutorials in Quantitative Methods for Psychology 2013, Vol. 9(2), p. 7994. A Beginner s Guide to Factor Analysis: Focusing on Exploratory Factor Analysis An Gie Yong and Sean Pearce University of Ottawa
More informationMultivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #47/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationStep 5: Conduct Analysis. The CCA Algorithm
Model Parameterization: Step 5: Conduct Analysis P Dropped species with fewer than 5 occurrences P Logtransformed species abundances P Rownormalized species log abundances (chord distance) P Selected
More informationPartial Least Squares (PLS) Regression.
Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component
More informationSubspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft
More informationSteven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 306022501
PRINCIPAL COMPONENTS ANALYSIS (PCA) Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 306022501 May 2008 Introduction Suppose we had measured two variables, length and width, and
More informationMultivariate Analysis of Variance (MANOVA)
Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various
More informationFactorial Invariance in Student Ratings of Instruction
Factorial Invariance in Student Ratings of Instruction Isaac I. Bejar Educational Testing Service Kenneth O. Doyle University of Minnesota The factorial invariance of student ratings of instruction across
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
More informationExtending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances?
1 Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances? André Beauducel 1 & Norbert Hilger University of Bonn,
More informationFitting Subjectspecific Curves to Grouped Longitudinal Data
Fitting Subjectspecific Curves to Grouped Longitudinal Data Djeundje, Viani HeriotWatt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK Email: vad5@hw.ac.uk Currie,
More informationExploratory Factor Analysis
Exploratory Factor Analysis ( 探 索 的 因 子 分 析 ) Yasuyo Sawaki Waseda University JLTA2011 Workshop Momoyama Gakuin University October 28, 2011 1 Today s schedule Part 1: EFA basics Introduction to factor
More information