Special Chapters on Artificial Intelligence


1 Special Chapters on Artificial Intelligence. Lecture 8: Principal component analysis. Cristian Gatu, Faculty of Computer Science, Alexandru Ioan Cuza University of Iaşi, Romania.

2 Principal Component Analysis (PCA) A technique for transforming the original variables into new ones which are uncorrelated and account for decreasing proportions of the variance in the data. The new variables are linear combinations of the old ones. The transformation is a rotation/reflection of the original points, so no essential statistical information is lost (or created). We can assess the importance of the individual new components and how many are needed (scree plots etc.), and assess the importance of the original variables (examination of loadings).

3 Objective of PCA The objective of the PCA is to transform a set of interrelated variables into a set of unrelated linear combinations of variables. The set of linear combinations is chosen so that each of the linear combinations accounts for a decreasing proportion of the variance in the original variables.

4 PCA Given the n variables x 1,...,x n and X = (x 1,...,x n ), the objective is to find a linear transformation of X into Y = (y 1,...,y n ) such that: the 1st component y 1 is the most interesting; the 2nd component y 2 is the 2nd most interesting; the 3rd component y 3 is the 3rd most interesting, etc. That is, we want to choose a new co-ordinate system so that the data, when referred to this new system Y, are such that the 1st component contains the most information, the 2nd component contains the next most information, etc.

5 PCA The hope is that the first few (2, 3 or 4, say) components contain nearly all the information in the data, while the remaining n-2, n-3 or n-4 components contain relatively little information and can be discarded. I.e. the statistical analysis can be concentrated on just the first few components (much easier). A linear transformation X → Y is given by Y = XQ, where Q is an n × n non-singular matrix. If Q happens to be an orthogonal matrix, i.e. Q T Q = I n, then the transformation X → Y is an orthogonal transformation.

6 PCA The basic idea is to find a set of orthogonal coordinates such that the sample variances of the data with respect to these coordinates are in decreasing order of magnitude, i.e. the projection of the points onto the 1st principal component has maximal variance among all such linear projections; the projection onto the 2nd has maximal variance subject to orthogonality with the first; the projection onto the 3rd has maximal variance subject to orthogonality with the first two, etc. Most interesting ≡ most information ≡ maximum variance.

7 PCA: eigenanalysis This objective can be achieved by an eigenanalysis of the variance matrix S of the data, i.e. the matrix of all two-way covariances between the variables and variances along the diagonal. It can be shown that if we transform the original data to principal components then 1. the sample variances of the data on successive components are equal to the eigenvalues of the variance matrix S of the data; 2. the total variation is exactly the same on the complete set of principal components as on the original variables, so no information is lost. It is just rearranged into order.
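The two facts above can be sketched numerically. The following is a minimal NumPy illustration on simulated data (not the lecture's): the sample variances of the principal component scores equal the eigenvalues of S, and the total variation equals trace(S).

```python
import numpy as np

# Simulated data (not the lecture's): 200 observations of 3 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[1.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 1.0]])

S = np.cov(X, rowvar=False)        # sample variance (covariance) matrix
lam, Q = np.linalg.eigh(S)         # eigh: suited to symmetric matrices
order = np.argsort(lam)[::-1]      # sort eigenvalues in decreasing order
lam, Q = lam[order], Q[:, order]

Y = (X - X.mean(axis=0)) @ Q       # principal component scores
pc_vars = Y.var(axis=0, ddof=1)    # sample variances of the PCs
```

Here `pc_vars` matches `lam` component by component, and `lam.sum()` matches `np.trace(S)`.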

8 Spectral decomposition Let S denote the n × n variance-covariance matrix of the n variables x 1,...,x n. E.g. let X = (x 1, x 2, x 3 ) and

Var(X) = S = [  1  -2   0
               -2   5   0
                0   0   2 ].

Since S is a symmetric and positive definite matrix, it has real eigenvalues λ 1 ≥ λ 2 ≥ ... ≥ λ n > 0.

9 Spectral decomposition The spectral decomposition of S is given by:

Q T S Q = Λ = diag(λ 1, λ 2, ..., λ n ).

E.g. for the matrix S above,

Q T S Q = diag(5.83, 2, 0.17),  with  Q = [ -0.38   0   0.92
                                             0.92   0   0.38
                                             0      1   0    ].

10 Spectral decomposition Let y i denote a transformed variable, where Y = (y 1,...,y n ) and Y = XQ. That is,

(y 1, y 2, y 3 ) = (x 1, x 2, x 3 ) Q,

or

y 1 = -0.38 x 1 + 0.92 x 2,
y 2 = x 3,
y 3 = 0.92 x 1 + 0.38 x 2.

11 Spectral decomposition The variance-covariance matrix of Y is given by:

Var(Y) = Var(XQ) = Q T Var(X) Q    (where Var(X) = S)
       = Q T S Q
       = Λ = diag(λ 1, λ 2, ..., λ n ).

12 Verification Note that Var(aX + bY) = a 2 Var(X) + b 2 Var(Y) + 2ab Cov(X,Y).

Var(y 1 ) = Var(-0.38 x 1 + 0.92 x 2 )
          = (-0.38) 2 Var(x 1 ) + (0.92) 2 Var(x 2 ) + 2(-0.38)(0.92) Cov(x 1, x 2 )
          = 0.15(1) + 0.85(5) - 0.7(-2) = 5.8 ≈ λ 1,

Var(y 2 ) = Var(x 3 ) = 2 = λ 2,

Var(y 3 ) = Var(0.92 x 1 + 0.38 x 2 )
          = 0.85 Var(x 1 ) + 0.15 Var(x 2 ) + 0.7 Cov(x 1, x 2 )
          = 0.85(1) + 0.15(5) + 0.7(-2) = 0.2 ≈ λ 3

(the small discrepancies from λ 1 = 5.83 and λ 3 = 0.17 are due to rounding the weights to two decimals).

13 Verification
Cov(y 1, y 2 ) = Cov(-0.38 x 1 + 0.92 x 2, x 3 )
              = -0.38 Cov(x 1, x 3 ) + 0.92 Cov(x 2, x 3 )
              = -0.38(0) + 0.92(0) = 0.

Cov(y 2, y 3 ) = Cov(x 3, 0.92 x 1 + 0.38 x 2 )
              = 0.92 Cov(x 3, x 1 ) + 0.38 Cov(x 3, x 2 ) = 0.

Cov(y 1, y 3 ) = Cov(-0.38 x 1 + 0.92 x 2, 0.92 x 1 + 0.38 x 2 )
              = (-0.38)(0.92) Var(x 1 ) + (0.92)(0.38) Var(x 2 )
                + [(-0.38)(0.38) + (0.92)(0.92)] Cov(x 1, x 2 )
              = -0.35 + 1.75 - 1.40 ≈ 0.
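The whole worked example can be checked in a few lines of NumPy; the matrix S below is the example covariance matrix introduced on slide 8.

```python
import numpy as np

# Example covariance matrix: Var(x1)=1, Var(x2)=5, Var(x3)=2, Cov(x1,x2)=-2.
S = np.array([[ 1., -2., 0.],
              [-2.,  5., 0.],
              [ 0.,  0., 2.]])

lam, Q = np.linalg.eigh(S)       # eigh returns eigenvalues in ascending order
lam, Q = lam[::-1], Q[:, ::-1]   # reorder so lam[0] >= lam[1] >= lam[2]

Lambda = Q.T @ S @ Q             # Var(Y) = Q^T S Q, should be diag(lam)
```

The eigenvalues come out as 5.83, 2 and 0.17 (to two decimals), `Lambda` is diagonal, and the eigenvalues sum to trace(S) = 8.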

14 PCA y 1,...,y n are the principal components of x 1,...,x n. y i has variance λ i. y i is uncorrelated with y j (i ≠ j), since Λ, i.e. the covariance matrix of Y, is diagonal. From λ 1 ≥ λ 2 ≥ ... ≥ λ n it follows that y 1 has the largest variance λ 1, y 2 has the second largest variance λ 2, and so on. In the example, λ 1 = 5.83, λ 2 = 2, λ 3 = 0.17.

15 PCA The total variation in the original data is the sum of the variances of the original variables x 1,...,x n. That is,

S 11 + S 22 + ... + S nn = trace(S).

Since Q T S Q = Λ, we have S = Q Λ Q T, and notice that:

trace(S) = trace(Q Λ Q T )
         = trace(Λ Q T Q)    (since trace(AB) = trace(BA))
         = trace(Λ)          (since Q T Q = I n )
         = λ 1 + λ 2 + ... + λ n,

which is the sum of the variances of the n PCs y 1,...,y n. In the example, S 11 + S 22 + S 33 = 8 = λ 1 + λ 2 + λ 3.

16 PCA Thus, the total variation is exactly the same on the complete set of principal components as on the original variables. So no information is lost; it is just rearranged into order. The sum λ 1 + ... + λ n of the variances of the n PCs y 1,...,y n is the same as the sum S 11 + ... + S nn of the variances of the original variables x 1,...,x n. The components with smaller variances can be ignored without significantly affecting the total variance, thereby reducing the number of variables from n. The total variation of the PCs is λ 1 + ... + λ n = S 11 + ... + S nn = trace(S). So we can interpret λ 1 /(λ 1 + ... + λ n ) as the proportion of the total variation explained by the first principal component. In the example λ 1 / Σ λ i = 5.83/8 = 0.73.

17 Proportion of variation The proportion of the total variation explained by the first two PCs is given by (λ 1 + λ 2 )/(λ 1 + ... + λ n ). In the example, (λ 1 + λ 2 )/ Σ λ i = (5.83 + 2)/8 = 0.98. In general, the proportion of the total variation explained by the first k PCs is given by (λ 1 + ... + λ k )/(λ 1 + ... + λ n ).

18 Proportion of variation If the first few PCs explain most of the variation in the data, then the later PCs are redundant and little information is lost if they are discarded (or ignored). E.g. if (λ 1 + ... + λ k )/(λ 1 + ... + λ n ) = say 80+%, then the (k+1)th,...,nth components contain relatively little information and the dimensionality of the data can be reduced from n to k with little loss of information. Useful if k = 1, 2, 3, 4?, 5??? The figure of 80% is quite arbitrary and depends really on the type of data being analyzed, particularly on the application area. Some areas might conventionally be satisfied if 40% of the variation can be explained in a few PCs; others might require 90%.
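A small sketch of this computation, using the eigenvalues from the running example; the 80% threshold is the arbitrary figure mentioned above.

```python
import numpy as np

lam = np.array([5.83, 2.0, 0.17])       # eigenvalues from the example
prop = np.cumsum(lam) / lam.sum()       # cumulative proportion explained
# smallest k whose first k PCs explain at least 80% of the variation
k = int(np.searchsorted(prop, 0.80)) + 1
```

For the example, the first PC alone explains 0.73 of the variation and the first two explain 0.98, so k = 2 at the 80% threshold.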

19 Scree-plot A figure (percentage) needs to be chosen as a trade-off between the convenience of a small value of k and a large value of the cumulative relative proportion of variance explained. If n is large, an informal way of choosing a good k is graphically, with a scree-plot. That is, plot (λ 1 + ... + λ k )/(λ 1 + ... + λ n ) vs k. The graph (scree-plot) is necessarily monotonic and concave, since the increments λ k are decreasing. Typically it will increase steeply for the first few values of k (i.e. the first few PCs) and then begin to level off. The point where it starts leveling off is the point where bringing in more PCs brings less return in terms of variance explained.

20 Scree-plot [Figure: (λ 1 + ... + λ k )/(λ 1 + ... + λ n ) plotted against k; a kink or elbow, where the graph suddenly flattens, marks the k to take.]

21 Analytical Approach Given n variables, the objective of principal components is to form n linear combinations:

y 1 = w 11 x 1 + w 12 x 2 + ... + w 1n x n
y 2 = w 21 x 1 + w 22 x 2 + ... + w 2n x n
...
y n = w n1 x 1 + w n2 x 2 + ... + w nn x n.

22 Analytical Approach Here y 1, y 2,..., y n are the n principal components. w ij is the weight of the jth variable for the ith principal component. Var(y 1 ) > Var(y 2 ) > ... > Var(y n ). The weights satisfy w i1 2 + ... + w in 2 = 1 and w i1 w j1 + ... + w in w jn = 0 (i ≠ j). Example: y 1 = 0.728 x 1 + 0.685 x 2 and y 2 = -0.685 x 1 + 0.728 x 2, with 0.728 2 + 0.685 2 = 1 and 0.728(-0.685) + 0.685(0.728) = 0.
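These two constraints (unit length and mutual orthogonality of the weight vectors) are easy to check for the example weights; the sign of the y 2 weights follows the rotation-matrix convention used later in the lecture.

```python
import numpy as np

w1 = np.array([0.728, 0.685])    # weights of y1 from the example
w2 = np.array([-0.685, 0.728])   # weights of y2 (rotation-matrix sign)

unit = w1 @ w1     # w11^2 + w12^2, should be ~1 (up to rounding)
ortho = w1 @ w2    # w11*w21 + w12*w22, should be 0
```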

23 Scores and Loadings The principal component scores are the values (output) of the principal component variables. The loading is the simple correlation between the original and the new (principal component) variables. This is an indication of the extent to which the original variables are influential or important in forming the principal components. That is, the higher the loading the more influential the variable is in forming the principal component scores and vice-versa.

24 Scores and Loadings The loadings can be obtained from the relationship

l ij = w ij √λ i / σ̂ j,

where:
1. l ij is the loading of the jth variable on the ith principal component.
2. w ij is the weight of the jth variable for the ith principal component.
3. σ̂ j is the standard deviation of the jth variable.
4. λ i is the eigenvalue (i.e. variance) of the ith principal component.
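This formula can be verified numerically: for simulated data (not the lecture's), w ij √λ i / σ̂ j reproduces the sample correlation between each PC and each original variable.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.8], [0.0, 1.0]])
Xc = X - X.mean(axis=0)

S = np.cov(Xc, rowvar=False)
lam, Q = np.linalg.eigh(S)
lam, Q = lam[::-1], Q[:, ::-1]      # decreasing eigenvalue order
Y = Xc @ Q                          # principal component scores

sigma = np.sqrt(np.diag(S))         # standard deviations of the variables
# loading l_ij = w_ij * sqrt(lambda_i) / sigma_j, where w_ij = Q[j, i]
L = (Q * np.sqrt(lam)).T / sigma    # row i holds the loadings of PC i

# check: loadings equal the correlations between the PCs and the variables
C = np.corrcoef(Y.T, Xc.T)[:2, 2:]  # C[i, j] = corr(y_i, x_j)
```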

25 Example: Financial ratios X 1 and X 2 The table presents a small data set consisting of 12 observations and 2 variables X 1 and X 2 (financial ratios). The table also gives the mean-corrected data (denoted by X 1* and X 2*), the SSCP, the covariance matrix S and the correlation matrix R. The mean-corrected variables are transformed (rotated) using the orthogonal (rotation) matrix:

(P 1, P 2 ) = (X 1*, X 2* ) [ cos(θ)  -sin(θ)
                              sin(θ)   cos(θ) ].

E.g. if θ = 43.261°, then cos(θ) = 0.728 and sin(θ) = 0.685, so that P 1 = 0.728 X 1* + 0.685 X 2* and P 2 = -0.685 X 1* + 0.728 X 2*.
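The rotation view can be sketched as follows. The data below are simulated stand-ins for the financial ratios (the lecture's 12 observations were lost in transcription); scanning the rotation angle shows that the variance of P 1 peaks exactly at the PCA rotation, where it equals the largest eigenvalue of S.

```python
import numpy as np

def rotate(Xc, theta_deg):
    """Rotate mean-corrected two-column data by theta degrees."""
    t = np.radians(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return Xc @ R

# Simulated stand-in for the mean-corrected financial ratios.
rng = np.random.default_rng(2)
X = rng.multivariate_normal([0., 0.], [[4.0, 2.5], [2.5, 3.0]], size=300)
Xc = X - X.mean(axis=0)

# Scan rotation angles 0..179 degrees and record Var(P1) at each angle.
angles = np.arange(180)
var_p1 = np.array([rotate(Xc, a)[:, 0].var(ddof=1) for a in angles])
lam_max = np.linalg.eigvalsh(np.cov(Xc, rowvar=False)).max()
```

The maximum of `var_p1` over the grid agrees with `lam_max` up to the 1° grid resolution.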

26 Example: Financial ratios X 1 and X 2 [Table: the 12 observations of X 1 and X 2, the mean-corrected values X 1* and X 2*, and the rotated values P 1 and P 2 (rotation by θ = 43.261°), together with the mean and variance of each column; the numeric entries were lost in transcription.]

27 Example: Financial ratios X 1 and X 2 [Tables: the SSCP, covariance matrix S and correlation matrix R for the original variables, and the same three matrices for the new variables P 1 and P 2; the numeric entries were lost in transcription.]

28 Example: Financial ratios X 1 and X 2 [Table: the variance accounted for by the new variables P 1 and P 2 for various rotation angles θ: angle, total variance, variance of P 1, and percent of total variance accounted for by P 1; the numeric entries were lost in transcription.]

29 Example: Financial ratios X 1 and X 2 [Figure: percentage of the total variance accounted for by P 1, plotted against the rotation angle θ.]

30 Computer Output: Financial ratios X 1 and X 2 [Computer output: simple statistics (mean, standard deviation) for X 1 and X 2; importance of components (standard deviation, proportion of variance, cumulative proportion) for PC1 and PC2; the covariance matrix; the correlation coefficients among X 1, X 2, PC1 and PC2, with the correlation between PC1 and PC2 being zero up to machine precision (order 1e-17); the total variance (44.18); and the eigenvectors. The numeric values were lost in transcription.]

31 Computer Output: Financial ratios X 1 and X 2 [Computer output: the PC1 and PC2 scores for each observation, and simple statistics (mean, standard deviation) for X 1, X 2, PC1 and PC2; the means of PC1 and PC2 are zero up to machine precision. The numeric values were lost in transcription.]

32 Summary The total variance of X 1 and X 2 is 44.18. The variables X 1 and X 2 are correlated. The percentages of the total variance accounted for by X 1 and X 2 are, respectively, 52.26% and 47.74%. Each of the new variables (i.e. the principal components P 1 and P 2 ) is a linear combination of the original variables and remains mean-corrected; that is, their means are zero. The total SS (Sum of Squares) for P 1 and P 2 is the same as the total SS for the original variables (= 486). The variances of P 1 and P 2 are, respectively, 38.58 and 5.61. The total variance of the principal components is 44.18 and is the same as the total of the variances of the original variables X 1 and X 2.

33 Summary The percentages of the total variance accounted for by P 1 and P 2 are, respectively, 87.31% (= 38.58/44.18) and 12.69% (= 5.61/44.18). The variance accounted for by the first principal component P 1 is greater than the variance accounted for by any one of the original variables. The second principal component P 2 accounts for variance that has not been accounted for by P 1. The two principal components account for all the variance in the data. The correlation between the principal components is zero; that is, P 1 and P 2 are uncorrelated.

34 Effect of type of data on PCA Consider the data below, which show the Estimated Retail Prices by Cities, March 1973, U.S. Department of Labor, Bureau of Labor Statistics, pp. 1-8. [Table: average price (in cents per pound) of Bread, Burger, Milk, Oranges and Tomatoes for 23 U.S. cities: Atlanta, Baltimore, Boston, Buffalo, Chicago, Cincinnati, Cleveland, Dallas, Detroit, Honolulu, Houston, Kansas City, Los Angeles, Milwaukee, Minneapolis, New York, Philadelphia, Pittsburgh, St. Louis, San Diego, San Francisco, Seattle, Washington DC; together with each column's mean, variance, percent of total variance, and the total variance. The numeric entries were lost in transcription.]

35 Effect of type of data on PCA The objective is to form a measure of the Consumer Price Index (CPI). That is, we would like to form a weighted sum of the various food prices that summarizes how expensive or cheap a given city's food items are. PCA would be an appropriate technique for developing such an index. Principal components analysis can be done on either mean-corrected or standardized data. Each data set could give a different solution depending upon the extent to which the variances of the variables differ. That is, the variances of the variables can have an effect on the PCA.

36 Computer output [Computer output for the mean-corrected data: simple statistics (mean, standard deviation) for the five food items; the covariance matrix; and the importance of components (standard deviation, proportion of variance, cumulative proportion) for PC1-PC5. The numeric values were lost in transcription.]

37 Computer output [Computer output: the eigenvectors (weights) of PC1-PC5 for the five food items, and the correlation coefficients (loadings) between the food items and the first three PCs. The numeric values were lost in transcription.]

38 Computer output [Computer output: the PC1-PC5 scores for a selection of cities: Baltimore, Los Angeles, Atlanta, Washington DC, Seattle, Dallas, New York, Pittsburgh, Buffalo, Honolulu. The numeric values were lost in transcription.]

39 Summary The first principal component PC1 is given by:

PC1 = 0.028 Bread + 0.2 Burger + 0.042 Milk + (...) Oranges + (...) Tomatoes

(the Oranges and Tomatoes weights were lost in transcription). The variance of PC1 accounts for 58.8% of the total variance of the original data. PC1 is a sum of all the food prices and is very much affected by the price of oranges. Since all the weights of PC1 are positive, a high score implies that the food prices are high and vice-versa. Thus, the scores (values) of PC1 suggest that Honolulu is the most expensive city and Baltimore the least expensive city.

40 Summary The main reason the price of oranges dominates the formation of PC1 is that there exists a wide variation in the price of oranges across the cities. That is, the variance of the price of oranges is very high compared to the variances of the prices of the other food items. In general, the weight assigned to a variable is affected by the relative variance of the variable. If we do not want the relative variances to affect the weights, then the data should be standardized so that the variance of each variable is the same (i.e. one).
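This effect can be demonstrated with simulated data (hypothetical, standing in for the food prices): PCA on the covariance matrix lets the high-variance variable dominate PC1, while PCA on the correlation matrix (i.e. on standardized data) balances the weights.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(0.0, 1.0, size=400)              # low-variance variable
x2 = 5.0 * x1 + rng.normal(0.0, 10.0, size=400)  # high-variance variable
X = np.column_stack([x1, x2])

def first_pc(M):
    """Eigenvector associated with the largest eigenvalue of M."""
    lam, Q = np.linalg.eigh(M)
    return Q[:, -1]

w_cov = first_pc(np.cov(X, rowvar=False))        # mean-corrected PCA
w_corr = first_pc(np.corrcoef(X, rowvar=False))  # standardized PCA

# Covariance-based PC1 is dominated by x2; correlation-based PC1 weights
# the two variables equally (|w1| = |w2| for any 2x2 correlation matrix).
```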

41 PCA after standardizing the data [Computer output for the standardized data: the correlation matrix of the five food items and the importance of components (standard deviation, proportion of variance, cumulative proportion) for PC1-PC5. The numeric values were lost in transcription.]

42 PCA after standardizing the data [Computer output: the eigenvectors (weights) of PC1-PC5 for the five food items, and the correlation coefficients (loadings) between the food items and the first two PCs. The numeric values were lost in transcription.]

43 PCA after standardizing the data [Computer output: the PC1-PC5 scores for a selection of cities: Seattle, San Diego, Houston, Cleveland, Los Angeles, Pittsburgh, Philadelphia, Boston, New York, Honolulu. The numeric values were lost in transcription.]

44 Summary Since the data are standardized, the variance of each variable is one and each variable accounts for 20% of the total variance. The first principal component accounts for 48.84% (= 2.442/5) of the total variance and is given by:

PC1 = 0.496 Bread + 0.576 Burger + 0.340 Milk + (...) Oranges + (...) Tomatoes.

The second principal component accounts for 22.1% (= 1.105/5) of the total variance and is given by:

PC2 = 0.309 Bread 0.044 Burger 0.431 Milk (...) Oranges (...) Tomatoes

(the Oranges and Tomatoes weights, and the signs of the PC2 weights, were lost in transcription).

45 Summary PC1 is a weighted sum of all the food prices, and no one food item dominates the formation of the score. The values of PC1 suggest that Honolulu is the most expensive city and Seattle is now the least expensive city, as compared to Baltimore when the data were not standardized. Therefore, the weights that are used to form the CPI are affected by the relative variances of the variables. The decision of how many principal components to retain depends on how much information (i.e. unaccounted variance) one is willing to sacrifice, which is a judgment question.

46 Summary Two alternatives are: (1) Use the scree plot and look for an elbow; this rule can be used for both mean-corrected and standardized data. (2) In the case of standardized data, retain only those components whose eigenvalues (variances) are greater than one; this is referred to as the eigenvalue-greater-than-one rule. This rule is the default option in most statistical packages (SAS and SPSS). The rationale for this rule is that, for standardized data, the amount of variance extracted by each retained component should at a minimum be equal to the variance of at least one variable.
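A sketch of the eigenvalue-greater-than-one rule, applied to a hypothetical 5×5 equicorrelation matrix (r = 0.5) whose eigenvalues are known in closed form: 1 + 4r = 3 once and 1 - r = 0.5 four times, so exactly one component is retained.

```python
import numpy as np

def kaiser_k(R):
    """Number of components with eigenvalue > 1 (for standardized data)."""
    lam = np.linalg.eigvalsh(R)
    return int((lam > 1.0).sum())

p, r = 5, 0.5
R = np.full((p, p), r) + (1.0 - r) * np.eye(p)  # equicorrelation matrix
k = kaiser_k(R)                                  # retains 1 component here
```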

47 Variance plot [Figure: bar plot of the variances of the principal components Comp.1 to Comp.5.]

48 Scree Plot

49 Cumulative Scree Plot

50 Interpreting Principal Components Since the PCs are linear combinations of the original variables, it is often necessary to interpret or provide a meaning to the linear combinations. As mentioned earlier, one can use the loadings for interpreting the PCs. Consider the loadings for the first 2 PCs when the standardized data are used: [Table: correlation coefficients (loadings) between the five food items and PC1, PC2; the numeric values were lost in transcription.]

51 Interpreting Principal Components The higher the loading of a variable, the more influence it has in the formation of the principal component score, and vice-versa. Therefore, one can use the loadings to determine which variables are influential in the formation of the PCs, and one can then assign a meaning or label to each PC. How high should a loading be before we can say that a given variable is influential in the formation of a PC score? Unfortunately, there are no guidelines to help us in establishing how high is high. Traditionally, researchers have used a loading of 0.5 or above as the cutoff point.

52 Interpreting Principal Components If we use 0.5 as the cutoff value, then the loadings table for the first 2 PCs [numeric values lost in transcription] shows that the first PC represents the price index for the non-fruit items, while the second PC represents the price of the fruit item (i.e. oranges). That is, PC1 is a measure of the prices of bread, hamburger, milk and tomatoes across the cities, while PC2 is a measure of the price of oranges across the cities. We can label PC1 as the CPI of non-fruit items and PC2 as the CPI of fruit items.
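The cutoff procedure can be sketched as follows; the loading values below are illustrative placeholders (the lecture's actual numbers were lost in transcription), chosen so that the non-fruit items load on PC1 and oranges on PC2.

```python
import numpy as np

items = ["Bread", "Burger", "Milk", "Oranges", "Tomatoes"]
# Illustrative loadings, NOT the lecture's output.
loadings = np.array([[0.84,  0.87,  0.76, 0.33, 0.71],    # PC1
                     [-0.25, -0.04, -0.43, 0.93, 0.26]])  # PC2

cutoff = 0.5
# keep, for each PC, the items whose absolute loading meets the cutoff
influential = [[item for item, l in zip(items, row) if abs(l) >= cutoff]
               for row in loadings]
```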

53 Biplot [Figure: biplot of the first two principal components (PC1 vs PC2); the cities are plotted as points and the food items (Bread, Burger, Milk, Oranges, Tomatoes) as vectors.]

2015 NFL Annual Selection Meeting R P O CLUB PLAYER POS COLLEGE ROUND 2

2015 NFL Annual Selection Meeting R P O CLUB PLAYER POS COLLEGE ROUND 2 ROUND 2 2 1 33 TENNESSEE 2 2 34 TAMPA BAY 2 3 35 OAKLAND 2 4 36 JACKSONVILLE 2 5 37 NEW YORK JETS 2 6 38 WASHINGTON 2 7 39 CHICAGO 2 8 40 NEW YORK GIANTS 2 9 41 ST. LOUIS 2 10 42 ATLANTA 2 11 43 CLEVELAND

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

More information

Factor Analysis. Chapter 420. Introduction

Factor Analysis. Chapter 420. Introduction Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Principle Component Analysis: A statistical technique used to examine the interrelations among a set of variables in order to identify the underlying structure of those variables.

More information

FACTOR ANALYSIS NASC

FACTOR ANALYSIS NASC FACTOR ANALYSIS NASC Factor Analysis A data reduction technique designed to represent a wide range of attributes on a smaller number of dimensions. Aim is to identify groups of variables which are relatively

More information

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501 PRINCIPAL COMPONENTS ANALYSIS (PCA) Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 30602-2501 May 2008 Introduction Suppose we had measured two variables, length and width, and

More information

Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

More information

Multivariate Analysis (Slides 13)

Multivariate Analysis (Slides 13) Multivariate Analysis (Slides 13) The final topic we consider is Factor Analysis. A Factor Analysis is a mathematical approach for attempting to explain the correlation between a large set of variables

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Exploratory Factor Analysis

Exploratory Factor Analysis Introduction Principal components: explain many variables using few new variables. Not many assumptions attached. Exploratory Factor Analysis Exploratory factor analysis: similar idea, but based on model.

More information

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA PROC FACTOR: How to Interpret the Output of a Real-World Example Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA ABSTRACT THE METHOD This paper summarizes a real-world example of a factor

More information

Trade Show Labor Rate Benchmarking Survey

Trade Show Labor Rate Benchmarking Survey 2011 Trade Show Labor Rate Benchmarking Survey EXHIBITOR COSTS IN 41 U.S. CITIES labor drayage audio visual exhibitor services eventmarketing.com eventmarketing.com Produced by in association with eventmarketing.com

More information

The ith principal component (PC) is the line that follows the eigenvector associated with the ith largest eigenvalue.

The ith principal component (PC) is the line that follows the eigenvector associated with the ith largest eigenvalue. More Principal Components Summary Principal Components (PCs) are associated with the eigenvectors of either the covariance or correlation matrix of the data. The ith principal component (PC) is the line

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

ASSESSING RISK OF SENIOR LIVING OVER-SUPPLY A LONG-TERM PERSPECTIVE

ASSESSING RISK OF SENIOR LIVING OVER-SUPPLY A LONG-TERM PERSPECTIVE TOPICS ASSESSING RISK OF SENIOR LIVING OVER-SUPPLY A LONG-TERM PERSPECTIVE The ratio of new openings to existing inventory ratio (the new openings ratio ) in combination with the ratio of units currently

More information

13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.

13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions. 3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in three-space, we write a vector in terms

More information

Introduction to Principal Component Analysis: Stock Market Values

Introduction to Principal Component Analysis: Stock Market Values Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from

More information

Statistics for Business Decision Making

Statistics for Business Decision Making Statistics for Business Decision Making Faculty of Economics University of Siena 1 / 62 You should be able to: ˆ Summarize and uncover any patterns in a set of multivariate data using the (FM) ˆ Apply

More information

FACTOR ANALYSIS. Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables.

FACTOR ANALYSIS. Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables. FACTOR ANALYSIS Introduction Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables Both methods differ from regression in that they don t have

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

More information

Factor Analysis. Advanced Financial Accounting II Åbo Akademi School of Business

Factor Analysis. Advanced Financial Accounting II Åbo Akademi School of Business Factor Analysis Advanced Financial Accounting II Åbo Akademi School of Business Factor analysis A statistical method used to describe variability among observed variables in terms of fewer unobserved variables

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

To do a factor analysis, we need to select an extraction method and a rotation method. Hit the Extraction button to specify your extraction method.

To do a factor analysis, we need to select an extraction method and a rotation method. Hit the Extraction button to specify your extraction method. Factor Analysis in SPSS To conduct a Factor Analysis, start from the Analyze menu. This procedure is intended to reduce the complexity in a set of data, so we choose Data Reduction from the menu. And the

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Factor Rotations in Factor Analyses.

Factor Rotations in Factor Analyses. Factor Rotations in Factor Analyses. Hervé Abdi 1 The University of Texas at Dallas Introduction The different methods of factor analysis first extract a set a factors from a data set. These factors are

More information

The Strategic Assessment of the St. Louis Region

The Strategic Assessment of the St. Louis Region The Strategic Assessment of the St. Louis Region 7th Edition, 2015 WHERE The 7th Edition of Where We Stand (WWS) presents 222 rankings comparing St. Louis to the 50 most populated metropolitan areas in

More information

Solution Let us regress percentage of games versus total payroll.

Solution Let us regress percentage of games versus total payroll. Assignment 3, MATH 2560, Due November 16th Question 1: all graphs and calculations have to be done using the computer The following table gives the 1999 payroll (rounded to the nearest million dolars)

More information

4. There are no dependent variables specified... Instead, the model is: VAR 1. Or, in terms of basic measurement theory, we could model it as:

4. There are no dependent variables specified... Instead, the model is: VAR 1. Or, in terms of basic measurement theory, we could model it as: 1 Neuendorf Factor Analysis Assumptions: 1. Metric (interval/ratio) data 2. Linearity (in the relationships among the variables--factors are linear constructions of the set of variables; the critical source

More information

Common factor analysis

Common factor analysis Common factor analysis This is what people generally mean when they say "factor analysis" This family of techniques uses an estimate of common variance among the original variables to generate the factor

More information

Least-Squares Intersection of Lines
