Special Chapters on Artificial Intelligence
1 Special Chapters on Artificial Intelligence Lecture 8. Principal component analysis Cristian Gatu 1 Faculty of Computer Science Alexandru Ioan Cuza University of Iaşi, Romania MCO, MDS,
2 Principal Component Analysis (PCA) A technique for transforming the original variables into new ones that are uncorrelated and account for decreasing proportions of the variance in the data. The new variables are linear combinations of the old ones. The transformation is a rotation/reflection of the original points, so no essential statistical information is lost (or created). We can assess the importance of the individual new components and how many are needed (scree plots etc.), and the importance of the original variables (examination of loadings).
3 Objective of PCA The objective of PCA is to transform a set of interrelated variables into a set of uncorrelated linear combinations of those variables. The linear combinations are chosen so that each accounts for a decreasing proportion of the variance in the original variables.
4 PCA Given the n variables x_1,...,x_n and X = (x_1,...,x_n), the objective is to find a linear transformation X → Y = (y_1,...,y_n) such that: the 1st component y_1 is the most interesting; the 2nd component y_2 is the 2nd most interesting; the 3rd component y_3 is the 3rd most interesting, etc. That is, we want to choose a new co-ordinate system so that the data, when referred to this new system Y, are such that the 1st component contains the most information, the 2nd component contains the next most information, etc.
5 PCA The hope is that the first few (2, 3 or 4, say) components contain nearly all the information in the data, while the remaining components contain relatively little information and can be discarded; i.e. the statistical analysis can be concentrated on just the first few components (much easier). A linear transformation X → Y is given by Y = XQ, where Q is an n × n non-singular matrix. If Q happens to be an orthogonal matrix, i.e. Q^T Q = I_n, then the transformation X → Y is an orthogonal transformation.
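The claim that an orthogonal transformation loses no statistical information can be checked numerically. The sketch below is illustrative (the data matrix X is synthetic, not from the lecture): it builds an arbitrary orthogonal Q and verifies that Y = XQ leaves the total variance unchanged.

```python
import numpy as np

# Synthetic data: 100 observations on n = 3 variables (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Any orthogonal matrix will do; take the Q factor of a QR factorisation.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
assert np.allclose(Q.T @ Q, np.eye(3))          # Q^T Q = I_n

Y = X @ Q                                       # the transformed data
# Total variance = trace of the covariance matrix; it is preserved,
# because Var(Y) = Q^T Var(X) Q and the trace is rotation-invariant.
total_x = np.trace(np.cov(X, rowvar=False))
total_y = np.trace(np.cov(Y, rowvar=False))
assert np.allclose(total_x, total_y)
```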
6 PCA The basic idea is to find a set of orthogonal coordinates such that the sample variances of the data with respect to these coordinates are in decreasing order of magnitude, i.e. the projection of the points onto the 1st principal component has maximal variance among all such linear projections, the projection onto the 2nd has maximal variance subject to orthogonality with the first, the projection onto the 3rd has maximal variance subject to orthogonality with the first two, etc. Here: most interesting ≡ most information ≡ maximum variance.
7 PCA: eigenanalysis This objective can be achieved by an eigenanalysis of the variance matrix S of the data, i.e. the matrix of all pairwise covariances between the variables, with the variances along the diagonal. It can be shown that, if we transform the original data to principal components, then: 1. the sample variances of the data on successive components are equal to the eigenvalues of the variance matrix S of the data; 2. the total variation is exactly the same on the complete set of principal components as on the original variables, so no information is lost; it is just rearranged into order.
8 Spectral decomposition Let S denote the n × n variance-covariance matrix of the n variables x_1,...,x_n. E.g. let X = (x_1 x_2 x_3) and Var(X) = S = ( 1 −2 0 ; −2 5 0 ; 0 0 2 ). Since S is a symmetric positive definite matrix, it has real eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n > 0.
9 Spectral decomposition The spectral decomposition of S is given by: Q^T S Q = Λ = diag(λ_1, λ_2, ..., λ_n). E.g. for the matrix S above, Q^T S Q = diag(5.83, 2, 0.17).
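This decomposition is easy to reproduce numerically. In the sketch below, S is the variance matrix of the running example (its entries are the variances 1, 5, 2 and the covariance Cov(x_1, x_2) = −2 used in the verification slides):

```python
import numpy as np

# Variance matrix of the running example (x1, x2, x3).
S = np.array([[ 1.0, -2.0, 0.0],
              [-2.0,  5.0, 0.0],
              [ 0.0,  0.0, 2.0]])

# eigh returns eigenvalues of a symmetric matrix in ascending order;
# reverse them to match the PCA convention lambda_1 >= ... >= lambda_n.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam, Q = eigvals[order], eigvecs[:, order]

print(np.round(lam, 2))                          # approx [5.83 2. 0.17]
assert np.allclose(Q.T @ S @ Q, np.diag(lam))    # the spectral decomposition
```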
10 Spectral decomposition Let y_i denote a transformed variable, where Y = (y_1,...,y_n) and Y = XQ. That is, (y_1 y_2 y_3) = (x_1 x_2 x_3) Q, or y_1 = −0.38 x_1 + 0.92 x_2, y_2 = x_3, y_3 = 0.92 x_1 + 0.38 x_2.
11 Spectral decomposition The variance-covariance matrix of Y is given by: Var(Y) = Var(XQ) = Q^T Var(X) Q (where Var(X) = S) = Q^T S Q = Λ = diag(λ_1, ..., λ_n).
12 Verification Note that Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X,Y). Var(y_1) = Var(−0.38 x_1 + 0.92 x_2) = (−0.38)² Var(x_1) + (0.92)² Var(x_2) + 2(−0.38)(0.92) Cov(x_1,x_2) ≈ 0.15(1) + 0.85(5) − 0.7(−2) = 5.8 ≈ λ_1. Var(y_2) = Var(x_3) = 2 = λ_2. Var(y_3) = Var(0.92 x_1 + 0.38 x_2) = (0.92)² Var(x_1) + (0.38)² Var(x_2) + 2(0.92)(0.38) Cov(x_1,x_2) = 0.8464(1) + 0.1444(5) + 0.6992(−2) = 0.17 = λ_3.
13 Verification Cov(y_1, y_2) = Cov(−0.38 x_1 + 0.92 x_2, x_3) = −0.38 Cov(x_1,x_3) + 0.92 Cov(x_2,x_3) = −0.38(0) + 0.92(0) = 0. Cov(y_2, y_3) = Cov(x_3, 0.92 x_1 + 0.38 x_2) = 0.92 Cov(x_3,x_1) + 0.38 Cov(x_3,x_2) = 0. Cov(y_1, y_3) = Cov(−0.38 x_1 + 0.92 x_2, 0.92 x_1 + 0.38 x_2) = (−0.38)(0.92) Var(x_1) + (−0.38)(0.38) Cov(x_1,x_2) + (0.92)(0.92) Cov(x_2,x_1) + (0.92)(0.38) Var(x_2) = −0.3496 + 0.2888 − 1.6928 + 1.7480 ≈ 0 (up to rounding of the weights).
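The whole hand verification collapses into one matrix product: with the (rounded) weights from the slides collected into Q, the matrix Q^T S Q should be diagonal up to rounding error, with the eigenvalues on the diagonal.

```python
import numpy as np

S = np.array([[ 1.0, -2.0, 0.0],
              [-2.0,  5.0, 0.0],
              [ 0.0,  0.0, 2.0]])
Q = np.array([[-0.38, 0.0, 0.92],   # columns: weights of y1, y2, y3
              [ 0.92, 0.0, 0.38],
              [ 0.00, 1.0, 0.00]])

V = Q.T @ S @ Q                     # variance matrix of Y = XQ
print(np.round(np.diag(V), 2))      # approx [5.77 2. 0.17]
# Off-diagonal covariances vanish up to rounding of the two-decimal weights.
assert np.abs(V - np.diag(np.diag(V))).max() < 0.02
```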
14 PCA y_1,...,y_n are the principal components of x_1,...,x_n. y_i has variance λ_i. y_i is uncorrelated with y_j (i ≠ j), since Λ, i.e. the covariance matrix of Y, is diagonal. From λ_1 ≥ λ_2 ≥ ... ≥ λ_n it follows that y_1 has the largest variance λ_1, y_2 has the second largest variance λ_2, and so on. In the example, λ_1 = 5.83, λ_2 = 2, λ_3 = 0.17.
15 PCA The total variation in the original data is the sum of the variances of the original variables x_1,...,x_n. That is, S_11 + S_22 + ... + S_nn = trace(S). Notice that: trace(S) = trace(Q^T Λ Q) = trace(Λ Q Q^T) (since trace(AB) = trace(BA)) = trace(Λ) (since Q Q^T = I_n) = λ_1 + λ_2 + ... + λ_n, which is the sum of the variances of the n PCs y_1,...,y_n. In the example, S_11 + S_22 + S_33 = 1 + 5 + 2 = 8 = λ_1 + λ_2 + λ_3.
16 PCA Thus, the total variation is exactly the same on the complete set of principal components as on the original variables; no information is lost, it is just rearranged into order. The sum of the variances Σ_{i=1}^n λ_i of the n PCs y_1,...,y_n is the same as the sum of the variances Σ_{i=1}^n S_ii of the original variables x_1,...,x_n. The components with smaller variances could be ignored without significantly affecting the total variance, thereby reducing the number of variables from n. The total variation of the PCs is Σ_{i=1}^n λ_i = Σ_{i=1}^n S_ii = trace(S), so we can interpret λ_1 / Σ_{i=1}^n λ_i as the proportion of the total variation explained by the first principal component. In the example, λ_1 / Σ λ_i = 5.83/8 = 0.73.
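A quick numerical check of the trace identity and of the 0.73 proportion for the example:

```python
import numpy as np

S = np.array([[ 1.0, -2.0, 0.0],
              [-2.0,  5.0, 0.0],
              [ 0.0,  0.0, 2.0]])
lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues, descending

total = np.trace(S)                          # S11 + S22 + S33 = 8
assert np.isclose(total, lam.sum())          # trace(S) = sum of eigenvalues
print(round(lam[0] / total, 2))              # proportion explained by PC1: 0.73
```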
17 Proportion of variation The proportion of the total variation explained by the first two PCs is given by (λ_1 + λ_2) / Σ_{i=1}^n λ_i. In the example, (λ_1 + λ_2) / Σ λ_i = (5.83 + 2)/8 = 0.98. In general, the proportion of the total variation explained by the first k PCs is given by (Σ_{i=1}^k λ_i) / (Σ_{i=1}^n λ_i).
18 Proportion of variation If the first few PCs explain most of the variation in the data, then the later PCs are redundant and little information is lost if they are discarded (or ignored). E.g. if (Σ_{i=1}^k λ_i) / (Σ_{i=1}^n λ_i) = 80+%, say, then the (k+1)th,...,nth components contain relatively little information and the dimensionality of the data can be reduced from n to k with little loss of information. Useful if k = 1, 2, 3, 4?, 5??? The figure of 80% is quite arbitrary and really depends on the type of data being analyzed, particularly on the application area. Some areas might conventionally be satisfied if 40% of the variation can be explained by a few PCs; others might require 90%.
19 Scree-plot A figure (percentage) needs to be chosen as a trade-off between the convenience of a small value of k and a large value of the cumulative relative proportion of variance explained. If n is large, an informal way of choosing a good k is graphical, with a scree-plot. That is, plot (Σ_{i=1}^k λ_i) / (Σ_{i=1}^n λ_i) vs k. The graph (scree-plot) is necessarily monotonically increasing and, since the λ_i are decreasing, concave. Typically it will increase steeply for the first few values of k (i.e. the first few PCs) and then begin to level off. The point where it starts leveling off is the point beyond which bringing in more PCs brings smaller returns in terms of variance explained.
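An informal threshold rule is easy to automate. The helper below is an illustrative sketch (not from the lecture): it returns the smallest k whose cumulative proportion of explained variance reaches a chosen threshold.

```python
import numpy as np

def choose_k(eigenvalues, threshold=0.8):
    """Smallest k with cumulative explained-variance proportion >= threshold."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)

# Eigenvalues of the running example: PC1 alone covers 5.83/8 = 73%,
# the first two cover (5.83 + 2)/8 = 98%.
print(choose_k([5.83, 2.0, 0.17], threshold=0.8))   # 2
print(choose_k([5.83, 2.0, 0.17], threshold=0.7))   # 1
```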
20 Scree-plot k i=1 n i=1 λ i λ i kink or elbow graph suddenly flattens, take k = n k
21 Analytical Approach Given n variables, the objective of principal components is to form n linear combinations: y_1 = w_11 x_1 + w_12 x_2 + ... + w_1n x_n, y_2 = w_21 x_1 + w_22 x_2 + ... + w_2n x_n, ..., y_n = w_n1 x_1 + w_n2 x_2 + ... + w_nn x_n.
22 Analytical Approach Here y_1, y_2,...,y_n are the n principal components. w_ij is the weight of the jth variable for the ith principal component. Var(y_1) > Var(y_2) > ... > Var(y_n). Σ_{k=1}^n w_ik² = w_i1² + ... + w_in² = 1. Σ_{k=1}^n w_ik w_jk = w_i1 w_j1 + ... + w_in w_jn = 0 (i ≠ j). Example: y_1 = 0.728 x_1 + 0.685 x_2 and y_2 = −0.685 x_1 + 0.728 x_2. Then 0.728² + 0.685² = 1 and 0.728(−0.685) + 0.685(0.728) = 0.
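Both constraints can be verified for the two-variable example. The placement of the minus sign in y_2 is an assumption here (signs were lost in the transcription); any placement that makes the two weight vectors orthogonal gives the same check.

```python
import numpy as np

w1 = np.array([0.728, 0.685])     # weights of the first PC
w2 = np.array([-0.685, 0.728])    # weights of the second PC (sign assumed)

print(round(w1 @ w1, 2))          # unit length: 1.0
print(round(w1 @ w2, 2))          # orthogonality: 0.0
```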
23 Scores and Loadings The principal component scores are the values (output) of the principal component variables. The loading is the simple correlation between the original and the new (principal component) variables. This is an indication of the extent to which the original variables are influential or important in forming the principal components. That is, the higher the loading the more influential the variable is in forming the principal component scores and vice-versa.
24 Scores and Loadings The loadings can be obtained from the relationship: l_ij = w_ij √λ_i / σ̂_j, where 1. l_ij is the loading of the jth variable on the ith principal component; 2. w_ij is the weight of the jth variable in the ith principal component; 3. σ̂_j is the standard deviation of the jth variable; 4. λ_i is the eigenvalue (i.e. variance) of the ith principal component.
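The formula can be cross-checked against the definition of a loading as the correlation between y_i and x_j, using Cov(x_j, y_i) = (SQ)_{ji} = λ_i w_ij, on the running three-variable example:

```python
import numpy as np

S = np.array([[ 1.0, -2.0, 0.0],
              [-2.0,  5.0, 0.0],
              [ 0.0,  0.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam, Q = eigvals[order], eigvecs[:, order]     # columns of Q = weight vectors

sigma = np.sqrt(np.diag(S))                    # std. deviations of the x_j
# Loadings via the formula l_ij = w_ij * sqrt(lambda_i) / sigma_j
# (rows: components i, columns: variables j).
L_formula = (Q * np.sqrt(lam)).T / sigma
# Loadings as correlations: Corr(y_i, x_j) = (SQ)_{ji} / (sigma_j sqrt(lambda_i)).
L_corr = (S @ Q / sigma[:, None] / np.sqrt(lam)).T
assert np.allclose(L_formula, L_corr)
```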
25 Example: Financial ratios X 1 and X 2 The table presents a small data set consisting of 12 observations on 2 variables X_1 and X_2 (financial ratios). The table also gives the mean-corrected data (denoted by X_1* and X_2*), the SSCP matrix, the covariance matrix S and the correlation matrix R. The mean-corrected variables are transformed (rotated) using an orthogonal (rotation) matrix: ( P_1 P_2 ) = ( X_1* X_2* ) ( cos(θ) −sin(θ) ; sin(θ) cos(θ) ). E.g. if θ = …, then cos(θ) = … and sin(θ) = ….
26 Example: Financial ratios X 1 and X 2 [Table: the original values of X_1 and X_2, the mean-corrected values X_1* and X_2*, and the rotated values P_1 and P_2, together with the mean and variance of each column; the numerical entries were not preserved.]
27 Example: Financial ratios X 1 and X 2 [Original variables: the SSCP matrix, the covariance matrix S and the correlation matrix R. New variables: the SSCP matrix, S and R. The numerical entries were not preserved.]
28 Example: Financial ratios X 1 and X 2 [Table: the total variance and the percentage of variance accounted for by P_1 for various rotation angles θ; the numerical entries were not preserved.]
29 Example: Financial ratios X 1 and X 2 [Figure: percentage of the total variance accounted for by P_1, plotted against the rotation angle θ.]
30 Computer Output: Financial ratios X 1 and X 2 [Output: simple statistics (mean and standard deviation of X_1 and X_2); importance of components (standard deviation, proportion of variance and cumulative proportion for PC1 and PC2); the covariance matrix; correlations among X_1, X_2, PC1 and PC2 (the PC1-PC2 correlation is zero to machine precision, of order 1e-17); the total variance; and the eigenvectors. The numerical entries were not preserved.]
31 Computer Output: Financial ratios X 1 and X 2 [Output: the scores of PC1 and PC2 for each observation, with simple statistics showing that the PC means are zero to machine precision. The numerical entries were not preserved.]
32 Summary The total variance of X_1 and X_2 is 44.18. The variables X_1 and X_2 have correlation coefficient …. The percentages of the total variance accounted for by X_1 and X_2 are, respectively, 52.26% and 47.74%. Each of the new variables (i.e. the principal components P_1 and P_2) is a linear combination of the original variables and remains mean corrected; that is, their means are zero. The total SS (Sum of Squares) for P_1 and P_2 is the same as the total SS for the original variables (= 486). The variances of P_1 and P_2 are, respectively, 38.58 and 5.61. The total variance of the principal components is 44.18, the same as the total variance of the original variables X_1 and X_2.
33 Summary The percentages of the total variance accounted for by P_1 and P_2 are, respectively, 87.31% (= 38.58/44.18) and 12.69% (= 5.61/44.18). The variance accounted for by the first principal component P_1 is greater than the variance accounted for by any one of the original variables. The second principal component P_2 accounts for variance that has not been accounted for by P_1. The two principal components together account for all the variance in the data. The correlation between the principal components is zero; that is, P_1 and P_2 are uncorrelated.
34 Effect of type of data on PCA Consider the data below, which show the Estimated Retail Prices by Cities, March 1973, U.S. Department of Labor, Bureau of Labor Statistics, pp. 1-8. [Table: average price (in cents per pound) of Bread, Burger, Milk, Oranges and Tomatoes for 23 cities: Atlanta, Baltimore, Boston, Buffalo, Chicago, Cincinnati, Cleveland, Dallas, Detroit, Honolulu, Houston, Kansas City, Los Angeles, Milwaukee, Minneapolis, New York, Philadelphia, Pittsburgh, St. Louis, San Diego, San Francisco, Seattle, Washington DC; with the mean, variance, percentage of total variance and total variance. The numerical entries were not preserved.]
35 Effect of type of data on PCA The objective is to form a measure of the Consumer Price Index (CPI). That is, we would like to form a weighted sum of the various food prices that summarizes how expensive or cheap a given city's food items are. PCA would be an appropriate technique for developing such an index. Principal components analysis can be done either on mean-corrected or on standardized data. Each could give a different solution, depending upon the extent to which the variances of the variables differ. That is, the variances of the variables can have an effect on the PCA.
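The effect can be demonstrated on synthetic data (illustrative only, not the BLS price data): give one variable a much larger variance and compare the first eigenvector of the covariance matrix with that of the correlation matrix.

```python
import numpy as np

# Three correlated variables built from a common factor f; the third is
# scaled by 20, so its variance is 400 times larger than the others'.
rng = np.random.default_rng(1)
f = rng.normal(size=500)
e = rng.normal(size=(500, 3))
X = np.column_stack([f + e[:, 0], f + e[:, 1], 20 * (f + e[:, 2])])

def first_pc(M):
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, -1]                  # eigenvector of the largest eigenvalue

w_cov = first_pc(np.cov(X, rowvar=False))         # mean-corrected PCA
w_corr = first_pc(np.corrcoef(X, rowvar=False))   # standardized PCA

assert np.abs(w_cov)[2] > 0.99      # covariance PCA: third variable dominates
assert np.abs(w_corr).max() < 0.8   # correlation PCA: weights are spread out
```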
36 Computer output [Output for the mean-corrected data: simple statistics (mean and standard deviation of Bread, Burger, Milk, Oranges and Tomatoes), the covariance matrix, and the importance of components (standard deviation, proportion of variance and cumulative proportion for PC1-PC5). The numerical entries were not preserved.]
37 Computer output [Output: the eigenvectors of PC1-PC5 and the correlation coefficients (loadings) between the five food items and the first three PCs. The numerical entries were not preserved.]
38 Computer output [Output: the scores of PC1-PC5 for a selection of cities: Baltimore, Los Angeles, Atlanta, Washington DC, Seattle, Dallas, New York, Pittsburgh, Buffalo, Honolulu. The numerical entries were not preserved.]
39 Summary The first principal component PC1 is given by: PC1 = 0.028 Bread + 0.200 Burger + 0.042 Milk + … Oranges + … Tomatoes (the last two weights were not preserved; the weight of Oranges is by far the largest). The variance of PC1 accounts for 58.8% of the total variance of the original data. PC1 is a sum of all food prices and is very much affected by the price of oranges. Since all the weights of PC1 are positive, a high score implies that the food prices are high, and vice-versa. Thus the scores of PC1 suggest that Honolulu is the most expensive city and Baltimore the least expensive.
40 Summary The main reason the price of oranges dominates the formation of PC1 is that there is a wide variation in the price of oranges across the cities. That is, the variance of the price of oranges is very high compared to the variances of the prices of the other food items. In general, the weight assigned to a variable is affected by the relative variance of that variable. If we do not want the relative variances to affect the weights, then the data should be standardized so that the variance of each variable is the same (i.e. one).
41 PCA after standardizing the data [Output: the correlation matrix of the five food items and the importance of components (standard deviation, proportion of variance and cumulative proportion for PC1-PC5). The numerical entries were not preserved.]
42 PCA after standardizing the data [Output: the eigenvectors of PC1-PC5 and the correlation coefficients (loadings) between the five food items and the first two PCs. The numerical entries were not preserved.]
43 PCA after standardizing the data [Output: the scores of PC1-PC5 for a selection of cities: Seattle, San Diego, Houston, Cleveland, Los Angeles, Pittsburgh, Philadelphia, Boston, New York, Honolulu. The numerical entries were not preserved.]
44 Summary Since the data are standardized, the variance of each variable is one and each variable accounts for 20% of the total variance. The first principal component accounts for 48.84% (= 2.442/5) of the total variance and is given by: PC1 = 0.496 Bread + 0.576 Burger + 0.340 Milk + … Oranges + … Tomatoes. The second principal component accounts for 22.1% (= 1.105/5) of the total variance and is given by: PC2 = 0.309 Bread + 0.044 Burger + 0.431 Milk + … Oranges + … Tomatoes (some weights, and the signs in PC2, were not preserved).
45 Summary PC1 is a weighted sum of all the food prices, and no one food item dominates the formation of the score. The values of PC1 suggest that Honolulu is the most expensive city and Seattle is now the least expensive, as compared with Baltimore when the data were not standardized. Therefore, the weights used to form the CPI are affected by the relative variances of the variables. The decision of how many principal components to retain depends on how much information (i.e. unaccounted variance) one is willing to sacrifice, which is a judgment question.
46 Summary Two alternatives are: Use the scree plot and look for an elbow; this rule can be used for both mean-corrected and standardized data. In the case of standardized data, retain only those components whose eigenvalues (variances) are greater than one; this is referred to as the eigenvalue-greater-than-one rule. This rule is the default option in most statistical packages (SAS and SPSS). The rationale for this rule is that, for standardized data, the amount of variance extracted by each retained component should at a minimum be equal to the variance of at least one variable.
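The eigenvalue-greater-than-one rule is a one-liner. The sketch below applies it to an equi-correlated correlation matrix, whose eigenvalues are known in closed form (for 5 variables with common correlation ρ they are 1 + 4ρ and 1 − ρ with multiplicity 4).

```python
import numpy as np

def eigenvalue_greater_than_one(correlation_matrix):
    """Number of components retained under the eigenvalue-greater-than-one rule."""
    lam = np.linalg.eigvalsh(correlation_matrix)
    return int(np.sum(lam > 1.0))

# Equi-correlated 5-variable example with rho = 0.5: eigenvalues are
# 1 + 4*0.5 = 3 and 1 - 0.5 = 0.5 (four times), so one component is kept.
R = np.full((5, 5), 0.5) + 0.5 * np.eye(5)
print(eigenvalue_greater_than_one(R))    # 1
```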
47 Variance plot [Figure: variances of the principal components Comp.1-Comp.5.]
48 Scree Plot
49 Cumulative Scree Plot
50 Interpreting Principal Components Since the PCs are linear combinations of the original variables, it is often necessary to interpret or provide a meaning to the linear combinations. As mentioned earlier, one can use loadings for interpreting the PCs. Consider the loadings for the first 2 PCs when standardized data are used: [Table: correlation coefficients (loadings) of Bread, Burger, Milk, Oranges and Tomatoes on PC1 and PC2; the numerical entries were not preserved.]
51 Interpreting Principal Components The higher the loading of a variable, the more influence it has in the formation of the principal component score, and vice-versa. Therefore, one can use the loadings to determine which variables are influential in the formation of the PCs, and one can then assign a meaning or label to each PC. How high should a loading be before we can say that a given variable is influential in the formation of a PC score? Unfortunately, there are no firm guidelines for establishing how high is high. Traditionally, researchers have used a loading of 0.5 or above as the cutoff point.
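The 0.5 cutoff is easy to apply programmatically. The loadings below are hypothetical (the actual values were not preserved in the slides), chosen to match the interpretation of PC1 as a non-fruit index and PC2 as an oranges index.

```python
import numpy as np

def influential(loadings, names, cutoff=0.5):
    """For each PC, list the variables whose absolute loading reaches the cutoff."""
    return [[n for n, l in zip(names, row) if abs(l) >= cutoff]
            for row in np.asarray(loadings)]

names = ["Bread", "Burger", "Milk", "Oranges", "Tomatoes"]
L = [[ 0.78,  0.83,  0.61, 0.22, 0.67],    # hypothetical PC1 loadings
     [-0.22, -0.05, -0.33, 0.93, 0.26]]    # hypothetical PC2 loadings
print(influential(L, names))
# [['Bread', 'Burger', 'Milk', 'Tomatoes'], ['Oranges']]
```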
52 Interpreting Principal Components If we use 0.5 as the cutoff value, then the loadings for the first 2 PCs show that the first PC represents a price index for the non-fruit items, while the second PC represents the price of the fruit item (i.e. oranges). That is, PC1 is a measure of the prices of bread, hamburger, milk and tomatoes across the cities, while PC2 is a measure of the price of oranges across the cities. We can label PC1 as the CPI of non-fruit items and PC2 as the CPI of fruit items.
53 Biplot [Biplot of the 23 cities on the first two principal components, with vectors for the five food items: Bread, Burger, Milk and Tomatoes point mainly along PC1, while Oranges points along PC2.]
Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us
More informationForm LM-3 Common Reporting Errors
OLMS COMPLIANCE TIP Form LM-3 Common Reporting Errors The Office of Labor-Management Standards (OLMS) enforces certain provisions of the Labor- Management Reporting and Disclosure Act (LMRDA), including
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationRISK ANALYSIS ON THE LEASING MARKET
The Academy of Economic Studies Master DAFI RISK ANALYSIS ON THE LEASING MARKET Coordinator: Prof.Dr. Radu Radut Student: Carla Biclesanu Bucharest, 2008 Table of contents Introduction Chapter I Financial
More informationTAMPA INTERNATIONAL AIRPORT
Albany Southwest Airlines 73W 332 0900 1150 1,130 143...7 Albany Southwest Airlines 73W 1811 0900 1150 1,130 143 12345.. Albany Southwest Airlines 73W 6066 1110 1400 1,130 143...6. Atlanta Delta Air Lines
More informationThe Most Affordable Cities For Individuals to Buy Health Insurance
The Most Affordable Cities For Individuals to Buy Health Insurance Focusing on Health Insurance Solutions for Millions of Americans Copyright 2005, ehealthinsurance. All rights reserved. Introduction:
More informationFactor analysis. Angela Montanari
Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number
More informationFactor Analysis. Factor Analysis
Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we
More informationBehavioral Entropy of a Cellular Phone User
Behavioral Entropy of a Cellular Phone User Santi Phithakkitnukoon 1, Husain Husna, and Ram Dantu 3 1 santi@unt.edu, Department of Comp. Sci. & Eng., University of North Texas hjh36@unt.edu, Department
More informationHouston Economic Outlook. Presented by Patrick Jankowski Vice President, Research
Houston Economic Outlook Presented by Patrick Jankowski Vice President, Research www.houston.org Follow me on Twitter @pnjankowski Read my blog: wwwhouston.org/economy/blog Connect with me: www.linkedincom/in/pnjankowski
More informationState of Stress at Point
State of Stress at Point Einstein Notation The basic idea of Einstein notation is that a covector and a vector can form a scalar: This is typically written as an explicit sum: According to this convention,
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More information4. Matrix Methods for Analysis of Structure in Data Sets:
ATM 552 Notes: Matrix Methods: EOF, SVD, ETC. D.L.Hartmann Page 64 4. Matrix Methods for Analysis of Structure in Data Sets: Empirical Orthogonal Functions, Principal Component Analysis, Singular Value
More informationPrincipal components analysis
CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationTHE FEDERAL BUREAU OF INVESTIGATION FINANCIAL INSTITUTION FRAUD AND FAILURE REPORT
U.S. Department of Justice Federal Bureau of Investigation FINANCIAL INSTITUTION FRAUD UNIT FINANCIAL CRIMES SECTION FINANCIAL INSTITUTION FRAUD AND FAILURE REPORT FISCAL YEAR 2002 THE FEDERAL BUREAU OF
More informationComponent Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
More informationTrends. Trends in Office Buildings Operations, 2011
Trends Trends in Office Buildings Operations, 2011 THE SAMPLE This 2012 edition represents 2011 data collection from nearly 2,700 private-sector buildings across the United States and Canada. This year
More informationExploratory Factor Analysis of Demographic Characteristics of Antenatal Clinic Attendees and their Association with HIV Risk
Doi:10.5901/mjss.2014.v5n20p303 Abstract Exploratory Factor Analysis of Demographic Characteristics of Antenatal Clinic Attendees and their Association with HIV Risk Wilbert Sibanda Philip D. Pretorius
More informationSections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationExploratory Factor Analysis
Exploratory Factor Analysis Definition Exploratory factor analysis (EFA) is a procedure for learning the extent to which k observed variables might measure m abstract variables, wherein m is less than
More informationExploratory Factor Analysis: rotation. Psychology 588: Covariance structure and factor models
Exploratory Factor Analysis: rotation Psychology 588: Covariance structure and factor models Rotational indeterminacy Given an initial (orthogonal) solution (i.e., Φ = I), there exist infinite pairs of
More informationOverview of Factor Analysis
Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,
More informationPROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION
PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION Chin-Diew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,
More informationRisk Decomposition of Investment Portfolios. Dan dibartolomeo Northfield Webinar January 2014
Risk Decomposition of Investment Portfolios Dan dibartolomeo Northfield Webinar January 2014 Main Concepts for Today Investment practitioners rely on a decomposition of portfolio risk into factors to guide
More informationTorgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances
Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances It is possible to construct a matrix X of Cartesian coordinates of points in Euclidean space when we know the Euclidean
More informationAtlanta Rankings 2014
Atlanta Rankings Major National Magazine and Study Rankings BUSINESS FACILITIES Metro Business Rankings Lowest Cost of Doing Business 2. Orlando, FL 3. Charlotte, NC 4. San Antonio, TX 5. Tampa, FL 6.
More informationZillow Negative Equity Report
Overview The housing market is finally showing signs of life, with many metropolitan areas having hit the elusive bottom and seeing home value appreciation, however negative equity remains a drag on the
More informationManifold Learning Examples PCA, LLE and ISOMAP
Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
More informationCorrelation key concepts:
CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)
More information10. Comparing Means Using Repeated Measures ANOVA
10. Comparing Means Using Repeated Measures ANOVA Objectives Calculate repeated measures ANOVAs Calculate effect size Conduct multiple comparisons Graphically illustrate mean differences Repeated measures
More informationMehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics
INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree
More informationPartial Least Squares (PLS) Regression.
Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component
More informationSolution to Homework 2
Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if
More informationprovides it s customers with a single, reliable source for their wiring construction needs.
provides it s customers with a single, reliable source for their wiring construction needs. Focused on Telephone, Cable Television, Data, Power and Security Wiring, C3 provides the construction management
More informationCovariance and Correlation
Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationImproving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
More informationA Brief Introduction to Factor Analysis
1. Introduction A Brief Introduction to Factor Analysis Factor analysis attempts to represent a set of observed variables X 1, X 2. X n in terms of a number of 'common' factors plus a factor which is unique
More informationCHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.
CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In
More informationSPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011
SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011 Statistical techniques to be covered Explore relationships among variables Correlation Regression/Multiple regression Logistic regression Factor analysis
More informationPsychology 7291, Multivariate Analysis, Spring 2003. SAS PROC FACTOR: Suggestions on Use
: Suggestions on Use Background: Factor analysis requires several arbitrary decisions. The choices you make are the options that you must insert in the following SAS statements: PROC FACTOR METHOD=????
More informationNotes on Determinant
ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without
More information