# Principal Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression


Saikat Maitra and Jun Yan

Abstract: Dimension reduction is one of the major tasks of multivariate analysis, and it is especially critical for multivariate regressions in many P&C insurance-related applications. In this paper, we present two methodologies, principal component analysis (PCA) and partial least squares (PLS), for dimension reduction in the case where the independent variables used in a regression are highly correlated. PCA, as a dimension reduction methodology, is applied without regard to the correlation between the dependent variable and the independent variables, while PLS is applied based on that correlation. We therefore call PCA an unsupervised, and PLS a supervised, dimension reduction methodology. We describe the algorithms of PCA and PLS and compare their performance in multivariate regressions using simulated data.

Key Words: PCA, PLS, SAS, GLM, regression, variance-covariance matrix, Jordan decomposition, eigen values, eigen vectors.

## Introduction

In large-scale data mining and predictive modeling, especially in multivariate regression exercises, we often start with a large number of possible explanatory/predictive variables, so variable selection and dimension reduction are major tasks for multivariate statistical analysis. A well-known dimension reduction method in regression analysis is the stepwise regression algorithm, which is implemented in many statistical software packages such as SAS and SPSS. One major limitation of this algorithm is that when several of the predictive variables are highly correlated, the tests of statistical significance that the stepwise method relies on are not sound, because independence is one of the primary assumptions of those tests.
Often, many variables used as independent variables in a regression display a high degree of correlation, because those variables may be measuring the same characteristics. For example, demographic variables measuring population density characteristics, or weather characteristics, are often highly correlated. A high degree of correlation among the predictive variables increases the variance of the estimates of the regression parameters. This problem is known as multicollinearity in the regression literature (Kleinbaum et al. [4]). The parameter estimates in a regression equation may change with a slight change in the data and hence are not stable for predicting the future. In this paper, we will describe two methodologies, principal component analysis (PCA) and partial least squares (PLS), for dimension reduction in regression analysis when some of the independent variables are correlated. We will describe what algorithm is used in each methodology and what the major differences are between the two.

Casualty Actuarial Society, 2008 Discussion Paper Program 79

## Principal Component Analysis

PCA is a traditional multivariate statistical method commonly used to reduce the number of predictive variables and to address the multicollinearity problem (Bair et al. [3]). Principal component analysis looks for a few linear combinations of the variables that can be used to summarize the data without losing too much information in the process. This method of dimension reduction is also known as parsimonious summarization (Rosipal and Krämer [6]) of the data. We will now formally define principal components and describe how they can be derived, introducing a few terms along the way for the sake of completeness.

### Data Matrix

Let X(n×p) denote the matrix of predictive variables (henceforth referred to as the data matrix), where each row is an observation on p different predictive variables X1, X2, ..., Xp. We will denote a random observation from this matrix by x(1×p). The problem at hand is to construct from the above columns a smaller set of variables that holds most of the information.

### Variance-Covariance Matrix

Let σij denote the covariance between Xi and Xj in the above data matrix, and let Σ denote the matrix ((σij)). Note that the diagonal elements of Σ are the variances of the Xi. In actual calculations, the σij may be estimated by their sample counterparts sij, the sample covariances calculated from the data; the matrix ((sij)) will be denoted by S. Both Σ and S are p×p square, symmetric matrices.

### Linear Combination

A linear combination of a set of vectors (X1, X2, ..., Xp) is an expression of the type Σ αi Xi (i = 1 to p), where the αi are scalars. A linear combination is said to be normalized or standardized if Σ |αi| = 1 (the sum of the absolute values of the coefficients equals one).
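These definitions are easy to make concrete. The sketch below (illustrative NumPy on randomly generated data, not the paper's BOP data) computes the sample variance-covariance matrix S and one standardized linear combination:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.normal(size=(n, p))      # data matrix: n observations on p variables

# Sample variance-covariance matrix ((s_ij)); the diagonal holds the variances
S = np.cov(X, rowvar=False)

# A standardized linear combination (SLC): scale the coefficients so the
# sum of their absolute values equals 1
alpha = np.array([2.0, -1.0, 0.5, 0.5])
alpha = alpha / np.abs(alpha).sum()
slc = X @ alpha                  # one SLC value per observation

print(S.shape)                   # (4, 4)
print(round(np.abs(alpha).sum(), 10))   # 1.0
```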
In the rest of the article, we will refer to a standardized linear combination as an SLC.

### Linear Independence

A set of vectors is said to be linearly independent if none of them can be written as a linear combination of the others. In other words, a set of vectors (X1, X2, ..., Xp) is linearly independent if the expression Σ αi Xi = 0 implies αi = 0 for all i. A set of vectors that is not linearly independent is said to be linearly dependent. Statistically, correlation is a measure of linear dependence among variables, and the presence of highly correlated variables indicates a linear dependence among them.

### Rank of a Matrix

The rank of a matrix is the maximum number of linearly independent rows or columns of the matrix. As our data matrix will contain many correlated variables that we seek to reduce, the rank of the data matrix X(n×p) is less than or equal to p.

### Jordan Decomposition of a Matrix

The Jordan decomposition, or spectral decomposition, of a symmetric matrix is defined as follows. Any symmetric matrix A(p×p) can be written as

A = Γ Λ Γᵀ = Σ λi γ(i) γ(i)ᵀ,

where Λ(p×p) is a diagonal matrix (all elements 0 except the diagonal) and Γ(p×p) is an orthonormal matrix, i.e., Γᵀ Γ = I (the identity matrix). The diagonal elements of Λ are denoted by λi (i = 1 to p) and the columns of Γ by γ(i) (i = 1 to p). In matrix algebra, the λi are called the eigen values of A and the γ(i) the corresponding eigen vectors. If A is not of full rank, i.e., rank(A) = r < p, then there are only r non-zero eigen values in the above Jordan decomposition, the remaining eigen values being equal to 0.

### Principal Components

In principal component analysis, we try to arrive at suitable SLCs of the data matrix X based on the Jordan decomposition of the variance-covariance matrix Σ of X, or equivalently of the correlation matrix Φ of X. We denote the mean of the observations by μ(1×p). Let x(1×p) = (x1, x2, ..., xp) denote a random vector observation in the data matrix (i.e., the transpose of any row of the n×p data matrix), with mean μ(1×p) and covariance matrix Σ.
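For a symmetric matrix, this decomposition is exactly what `numpy.linalg.eigh` computes; a minimal check on a random symmetric matrix (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
A = B @ B.T                              # a symmetric p x p matrix

lam, Gamma = np.linalg.eigh(A)           # eigen values and orthonormal eigen vectors
lam, Gamma = lam[::-1], Gamma[:, ::-1]   # reorder so lambda_1 >= ... >= lambda_p

# Jordan decomposition: A = Gamma Lambda Gamma^T, with Gamma^T Gamma = I
assert np.allclose(Gamma @ np.diag(lam) @ Gamma.T, A)
assert np.allclose(Gamma.T @ Gamma, np.eye(5))
```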
A principal component is the transformation x(1×p) → y(1×p) = (x − μ)(1×p) Γ(p×p), where Γ is obtained from the Jordan decomposition of Σ, i.e., Γᵀ Σ Γ = Λ = diag(λ1, λ2, ..., λp), with the λi being the eigen values of the decomposition. Each element of y(1×p) is a linear combination of the elements of x(1×p), and each element of y is independent of the others. Thus we obtain p independent principal components corresponding to the p eigen values of the Jordan decomposition of Σ. Generally, we will only use the first few of these principal components in a regression.

In the next section, we list the major properties of the principal components obtained above. These properties help explain why the first few principal components may hold the majority of the information, and thus allow us to reduce the dimension of a regression without losing too much information.

### Properties of Principal Components (Anderson [2])

The following results justify the use of PCA as a valid variable reduction technique in regression problems, where the first few principal components are used as predictive variables. Let x be a random p-dimensional vector with mean μ and covariance matrix Σ, and let y be the vector of principal components as defined above. Then the following hold:

(i) E(yi) = 0
(ii) Var(yi) = λi
(iii) Cov(yi, yj) = 0 for i ≠ j
(iv) Var(y1) ≥ Var(y2) ≥ ... ≥ Var(yp)
(v) No SLC of x has variance larger than λ1, the variance of the first principal component.
(vi) If z = Σ αi xi is an SLC of x that is uncorrelated with the first k principal components, then the variance of z is maximized when z equals the (k+1)th principal component.

Item (iii) justifies why using principal components instead of the raw predictive variables removes the problem of multicollinearity. Items (iv), (v), and (vi) indicate that the principal components successively capture the maximum of the variance of x, and that no SLC can capture maximal variance without being one of the principal components. When there is a high degree of correlation among the original predictive variables, only the first few principal components are likely to capture the majority of the variance of the original predictive variables.
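Properties (i) through (iv) can be verified numerically. A sketch on illustrative simulated data (not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma_true = np.array([[4.0, 2.0, 1.0],
                       [2.0, 3.0, 1.5],
                       [1.0, 1.5, 2.0]])
X = rng.multivariate_normal([1.0, 2.0, 3.0], Sigma_true, size=5000)

Sigma = np.cov(X, rowvar=False)              # sample covariance matrix
lam, Gamma = np.linalg.eigh(Sigma)
lam, Gamma = lam[::-1], Gamma[:, ::-1]       # descending eigen values

Y = (X - X.mean(axis=0)) @ Gamma             # principal components y = (x - mu) Gamma

assert np.allclose(Y.mean(axis=0), 0.0)                      # (i) zero means
assert np.allclose(np.cov(Y, rowvar=False), np.diag(lam))    # (ii) Var(y_i) = lambda_i
                                                             # (iii) zero covariances
assert np.all(lam[:-1] >= lam[1:])                           # (iv) decreasing variance
```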
The magnitudes of the λi measure the variance captured by the corresponding principal components and should be used to select the first few components for a regression.

## A Numerical Example of PCA

In this section we describe the process of building principal components in a multivariate regression setup, using simulated data for the business owners policies (BOP) line of business. The simulated data was used for the 2006 and 2007 CAS Limited Attendance Predictive Modeling Seminars.

### Description of the Data

The simulated data is at the policy-year level; that is, each record contains the information of a twelve-month BOP policy. In the data, we simulated a claim frequency (claim count per \$1,000 of premium) and six correlated policy variables:

- fireprot: Fire Protection Class
- numbldg: Number of Buildings in Policy
- numloc: Number of Locations in Policy
- bldgage: Maximum Building Age
- bldgcontents: Building Coverage Indicator
- polage: Policy Age

All the predictive variables are treated as continuous, including the bldgcontents variable. Both of the multivariate techniques described in this paper work only with continuous and ordinal variables; categorical variables cannot be directly analyzed by these methods for variable reduction.

The correlation matrix of the above predictive variables (rows and columns ordered fireprot, numbldg, numloc, bldgage, bldgcontents, polage) is given in the original exhibit.

In this correlation matrix, numbldg and numloc are highly correlated, and the variable fireprot has significant correlation with two other variables.

### Principal Components Regression

As described earlier, the principal components are obtained by an eigen value decomposition of the covariance or correlation matrix of the predictive variables under consideration. Most statistical software can compute the principal components once we specify the data set and the variables from which to construct them. SAS, for example, outputs the linear coefficients (eigen vectors) along with the mean and standard deviation of each predictive variable; these can be used to compute the principal components on the data set for regression.

Step 1: Compute the Jordan decomposition of the correlation matrix and obtain the eigen vector Γ(i) = {γ1i, γ2i, ..., γpi} corresponding to each eigen value λi. The six eigen values, each one's proportion of the total, and the cumulative proportion of the total are tabulated in the original exhibit; the first four eigen values capture about 90% of the information in the correlation matrix. The eigen vectors (the columns of the matrix Γ in the Jordan decomposition) corresponding to these eigen values are likewise tabulated in the exhibit.
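Step 1 can be sketched as follows. The simulated data below, with three variables sharing a latent driver, stands in for the paper's correlated policy variables; it is not the paper's actual matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
latent = rng.normal(size=n)
# six variables; the first three share a latent driver, so they are correlated
X = np.column_stack([latent + 0.4 * rng.normal(size=n) for _ in range(3)] +
                    [rng.normal(size=n) for _ in range(3)])

R = np.corrcoef(X, rowvar=False)            # 6 x 6 correlation matrix
lam = np.linalg.eigvalsh(R)[::-1]           # eigen values, descending
prop = lam / lam.sum()                      # proportion of total
cum = np.cumsum(prop)                       # cumulative proportion of total

# smallest number of components capturing at least 90% of the information
k = int(np.searchsorted(cum, 0.90) + 1)
print(cum.round(3), k)
```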

Step 2: Construct the principal component corresponding to each eigen value by linearly combining the standardized predictive variables using the corresponding eigen vector. The first principal component is computed as

PrinComp1 = γ11 (fireprot − mean1)/sd1 + γ21 (numbldg − mean2)/sd2 + γ31 (numloc − mean3)/sd3 + γ41 (bldgage − mean4)/sd4 + γ51 (bldgcontents − mean5)/sd5 + γ61 (polage − mean6)/sd6,

where γi1 is the i-th element of the first eigen vector and meani and sdi are the mean and standard deviation of the i-th predictive variable. Note that each variable is standardized while computing the principal components.

We now use the principal components constructed above in a generalized linear model (GLM) regression. There are many papers and presentations on GLMs ([1], [5]), so we will not describe the related concepts and details here. The only two characteristics of a GLM that we would like to mention are the error distribution and the link function. Unlike traditional ordinary regression, a GLM can select any distribution within the exponential family as the model for the distribution of the target variable. A GLM also allows a non-linear link function, which lets us incorporate a non-linear relationship between the target variable and the predictive variables. For example, when fitting a severity curve, the log of the loss value can often be modeled more easily in a linear model than the actual loss value; a GLM accomplishes this by specifying LOG as the link function. Note, however, that a GLM is still linear in the regression parameters.

In this numerical example for PCA, we choose the Poisson distribution for the regression error and IDENTITY as the link function. We used the claim frequency (claim count per \$1,000 of premium) as the dependent variable and the principal components constructed above as the independent variables. The summary of the regression is displayed below:
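Step 2 above can be sketched in code: standardize each variable, then apply the loadings of the first eigen vector. The data below are simulated stand-ins for the six policy variables, since the paper's fitted coefficient exhibit is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(4)
# simulated stand-ins for fireprot, numbldg, numloc, bldgage, bldgcontents, polage
X = rng.normal(loc=[7.0, 1.5, 1.2, 40.0, 0.6, 5.0],
               scale=[2.0, 0.8, 0.5, 15.0, 0.3, 3.0],
               size=(1000, 6))

mean = X.mean(axis=0)
sd = X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

lam, Gamma = np.linalg.eigh(R)
gamma1 = Gamma[:, -1]                    # eigen vector of the largest eigen value

# PrinComp1 = sum_i gamma_1i * (x_i - mean_i) / sd_i
prin_comp1 = ((X - mean) / sd) @ gamma1

# its sample variance equals the largest eigen value of R
assert np.isclose(np.var(prin_comp1, ddof=1), lam[-1])
```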

The p-values and chi-square statistics demonstrate that the first three principal components explain about 75% of the predictive power of the original six policy variables. However, we also notice that the ranking of predictive power does not line up with the order of the principal components: for example, the first principal component is less explanatory for the target than the second, even though the first principal component contains more information about the six original policy variables. In the next section, we describe another dimension reduction technique, partial least squares (PLS), which can be used to address this problem.

## Partial Least Squares

In the last section we discussed applying PCA in regression, both as a dimension reduction technique and as a way to deal with multicollinearity problems. One drawback of the PCA technique in its original form is that it arrives at SLCs that capture only the characteristics of the X-vector of predictive variables; no importance is given to how each predictive variable may be related to the dependent (target) variable. In this sense it is an unsupervised dimension reduction technique. When our key application is multivariate regression, there may be considerable improvement if we build SLCs of the predictive variables that capture not only the information in the raw predictive variables but also the relation between the predictive and target variables. Partial least squares (PLS) achieves this balance and provides an alternative to the PCA technique. Partial least squares has been very popular in areas like chemical engineering, where predictive variables often consist of many different measurements in an experiment and the relationships among these variables are ill-understood (Kleinbaum et al. [4]). These measurements often relate to a few underlying latent factors that remain unobserved.
In this section, we describe the PLS technique and discuss how it can be applied in regression problems by demonstrating it on our sample data.

### Description of the Technique

Assume X is an n×p matrix and Y is an n×q matrix. The PLS technique works by successively extracting factors from both X and Y such that the covariance between the extracted factors is maximized. The PLS method can work with multivariate response variables (i.e., when Y is n×q with q > 1); however, for our purpose we will assume a single response (target) variable, so Y is n×1 and X is n×p, as before. The PLS technique finds a linear decomposition of X and Y such that

X = T Pᵀ + E and Y = U Qᵀ + F,   (1)

where
- T (n×r) = X-scores, U (n×r) = Y-scores
- P (p×r) = X-loadings, Q (1×r) = Y-loadings
- E (n×p) = X-residuals, F (n×1) = Y-residuals

The decomposition is chosen so as to maximize the covariance between T and U. There are multiple algorithms available to solve the PLS problem, but all follow an iterative process to extract the X-scores and Y-scores. The factors (scores) for X and Y are extracted successively, and the number of factors extracted (r) depends on the ranks of X and Y. In our case, Y is a vector and all possible X-factors will be extracted.

### Eigen Value Decomposition Algorithm

Each extracted X-score is a linear combination of the columns of X. For example, the first extracted X-score t is of the form t = Xw, where w is the eigen vector corresponding to the first (largest) eigen value of XᵀYYᵀX. Similarly, the first Y-score is u = Yc, where c is the eigen vector corresponding to the first eigen value of YᵀXXᵀY. Note that XᵀY denotes the covariance of X and Y. Once the first factors have been extracted, we deflate the original values of X and Y as

X1 = X − t tᵀ X and Y1 = Y − t tᵀ Y.   (2)

The above process is then repeated on the deflated matrices to extract the second PLS factors, and continues until all possible latent factors t and u have been extracted, i.e., until X is reduced to a null matrix. The number of latent factors extracted depends on the rank of X.

## A Numerical Example for PLS

In this section we illustrate how to use the PLS technique to obtain X-scores, which are then used in a regression. The data used for this numerical example is the same as in the numerical example for PCA; the target variable and all the predictive variables are also the same.
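Under the stated assumptions (single response, columns of X and Y centered), one pass of this algorithm takes only a few lines. An illustrative sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 300, 6
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5, 0.2]) + rng.normal(size=n)

X = X - X.mean(axis=0)                     # center X
Y = (y - y.mean()).reshape(-1, 1)          # center Y (n x 1)

M = X.T @ Y @ Y.T @ X                      # the p x p matrix X^T Y Y^T X
lam, vecs = np.linalg.eigh(M)
w = vecs[:, -1]                            # eigen vector of the largest eigen value
t = X @ w                                  # first PLS X-score t = Xw
t = t / np.linalg.norm(t)                  # scale so t^T t = 1, as deflation assumes

# deflation, formula (2): X1 = X - t t^T X,  Y1 = Y - t t^T Y
X1 = X - np.outer(t, t @ X)
Y1 = Y - np.outer(t, t @ Y)

# the extracted score carries no remaining information in the deflated matrices
assert np.allclose(t @ X1, 0.0)
assert np.allclose(t @ Y1, 0.0)
```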

### Partial Least Squares

As described in the last section, PLS finds a linear decomposition of X and Y such that X = T Pᵀ + E and Y = U Qᵀ + F, with T the X-scores, P the X-loadings, E the X-residuals, U the Y-scores, Q the Y-loadings, and F the Y-residuals, and the decomposition chosen to maximize the covariance between T and U. The PLS algorithm works in the same fashion whether Y is a single response or multi-response. Note that the PLS algorithm automatically predicts Y using the extracted Y-scores (U); our aim here, however, is just to obtain the X-scores (T) from the PLS decomposition and use them separately in a regression to predict Y. This gives us the flexibility to use PLS to extract orthogonal factors from X without restricting ourselves to the original PLS prediction model.

Unlike PCA factors, PLS factors can be extracted by several different algorithms, all based on iterative calculations. Using the eigen value decomposition algorithm discussed earlier, the first step is to compute the covariance vector XᵀY between the six predictive variables and the target; its entries are given in the original exhibit (the last entry is (2,001.69)). The first PLS factor is then computed from the eigen value decomposition of the matrix XᵀYYᵀX, which is:

| 4,878,441 | 19,965,005 | 20,977,251 | 4,591,748 | 6,314,657 | (4,421,174) |
|---|---|---|---|---|---|
| 19,965,005 | ,067,728 | 85,849,344 | 18,791,718 | 25,842,715 | (18,093,644) |
| 20,977,251 | 85,849,344 | 90,201,995 | 19,744,478 | 27,152,967 | (19,011,011) |
| 4,591,748 | 18,791,718 | 19,744,478 | 4,321,904 | 5,943,562 | (4,161,355) |
| 6,314,657 | 25,842,715 | 27,152,967 | 5,943,562 | 8,173,695 | (5,722,771) |
| (4,421,174) | (18,093,644) | (19,011,011) | (4,161,355) | (5,722,771) | 4,006,769 |
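A noteworthy simplification in the single-response case: XᵀYYᵀX = (XᵀY)(XᵀY)ᵀ is a rank-one matrix, so its first eigen vector is simply the covariance vector XᵀY scaled to unit length. A quick numerical check on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 6))
X = X - X.mean(axis=0)                        # centered predictors
Y = (X @ rng.normal(size=6) + rng.normal(size=200)).reshape(-1, 1)
Y = Y - Y.mean()                              # centered single response

c = X.T @ Y                        # p x 1 covariance vector X^T Y
M = c @ c.T                        # X^T Y Y^T X, a rank-one p x p matrix

lam, vecs = np.linalg.eigh(M)
w = vecs[:, -1]                    # eigen vector of the largest eigen value

# w is X^T Y normalized to unit length (up to sign)
w_expected = (c / np.linalg.norm(c)).ravel()
assert np.allclose(np.abs(w), np.abs(w_expected))
# all other eigen values of a rank-one matrix are 0
assert np.allclose(lam[:-1], 0.0, atol=1e-8 * lam[-1])
```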

The first eigen vector of the eigen value decomposition of the above matrix, {w1, w2, ..., w6} (numeric values in the original exhibit), determines the first PLS X-score, obtained by linearly combining the standardized predictive variables:

Xscr1 = w1 (fireprot − mean1)/sd1 + w2 (numbldg − mean2)/sd2 + w3 (numloc − mean3)/sd3 + w4 (bldgage − mean4)/sd4 + w5 (bldgcontents − mean5)/sd5 + w6 (polage − mean6)/sd6.

Once the first factor has been extracted, the original X and Y are deflated by an amount (Xscr1 Xscr1ᵀ) times the original X and Y values. The eigen value decomposition is then performed on the deflated values, and the process repeats until all factors have been extracted (refer to formula (2)).

We next fit a GLM using the same claim frequency as the dependent variable and the six PLS components, Xscr1 through Xscr6, as independent variables. As in the numerical example for PCA, we again choose the Poisson distribution for the error term and an IDENTITY link function. The regression statistics are displayed below. Comparing these to the ChiSq statistics derived from the GLM using PCA, we can see that the PLS factors are extracted in order of significance and predictive power.

## Further Comparison of PCA and PLS

In this section, we report a simulation study comparing the principal components method against the partial least squares method as a variable reduction technique in regression. A number of simulated data sets were created by re-sampling from the original data. PCA and PLS analyses were performed on these samples, and the ChiSq statistics of the extracted PCA factors and PLS factors were compared. The exhibit below shows the results on three such samples.

In each sample, the chi-square statistics of the first two PLS factors are larger than those of the corresponding PCA factors, indicating that the leading PLS factors capture more predictive information.

## Summary

PCA and PLS serve two purposes in regression analysis. First, both techniques convert a set of highly correlated variables into a set of independent variables by linear transformations. Second, both techniques are used for variable reduction. When a dependent variable for a regression is specified, the PLS technique is more efficient than the PCA technique for dimension reduction because of the supervised nature of its algorithm.

## References

[1] Anderson, D., et al., "A Practitioner's Guide to Generalized Linear Models," CAS Discussion Paper Program (Arlington, Va.: Casualty Actuarial Society, 2004).

[2] Anderson, T. W., An Introduction to Multivariate Statistical Analysis, 2nd Edition (New York: John Wiley & Sons, 1984).

[3] Bair, Eric, Trevor Hastie, Debashis Paul, and Robert Tibshirani, "Prediction by Supervised Principal Components," Journal of the American Statistical Association 101, no. 473, 2006.

[4] Kleinbaum, David G., et al., Applied Regression Analysis and Multivariable Methods, 3rd Edition (Pacific Grove, Ca.: Brooks/Cole Publishing Company, 1998).

[5] McCullagh, P., and J.A. Nelder, Generalized Linear Models, 2nd Edition (London: Chapman and Hall, 1989).
[6] Rosipal, Roman, and Nicole Krämer, "Overview and Recent Advances in Partial Least Squares," in Saunders, C., et al. (eds.), Subspace, Latent Structure and Feature Selection, Lecture Notes in Computer Science, vol. 3940 (Heidelberg: Springer-Verlag, 2006).


### Similar matrices and Jordan form

Similar matrices and Jordan form We ve nearly covered the entire heart of linear algebra once we ve finished singular value decompositions we ll have seen all the most central topics. A T A is positive

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### Multivariate Analysis (Slides 13)

Multivariate Analysis (Slides 13) The final topic we consider is Factor Analysis. A Factor Analysis is a mathematical approach for attempting to explain the correlation between a large set of variables

### Using the Singular Value Decomposition

Using the Singular Value Decomposition Emmett J. Ientilucci Chester F. Carlson Center for Imaging Science Rochester Institute of Technology emmett@cis.rit.edu May 9, 003 Abstract This report introduces

### Recall that two vectors in are perpendicular or orthogonal provided that their dot

Orthogonal Complements and Projections Recall that two vectors in are perpendicular or orthogonal provided that their dot product vanishes That is, if and only if Example 1 The vectors in are orthogonal

Factor Analysis Advanced Financial Accounting II Åbo Akademi School of Business Factor analysis A statistical method used to describe variability among observed variables in terms of fewer unobserved variables

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### Nonlinear Iterative Partial Least Squares Method

Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### 5.2 Customers Types for Grocery Shopping Scenario

------------------------------------------------------------------------------------------------------- CHAPTER 5: RESULTS AND ANALYSIS -------------------------------------------------------------------------------------------------------

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### MATH 304 Linear Algebra Lecture 8: Inverse matrix (continued). Elementary matrices. Transpose of a matrix.

MATH 304 Linear Algebra Lecture 8: Inverse matrix (continued). Elementary matrices. Transpose of a matrix. Inverse matrix Definition. Let A be an n n matrix. The inverse of A is an n n matrix, denoted

### (a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular.

Theorem.7.: (Properties of Triangular Matrices) (a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular. (b) The product

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### How to report the percentage of explained common variance in exploratory factor analysis

UNIVERSITAT ROVIRA I VIRGILI How to report the percentage of explained common variance in exploratory factor analysis Tarragona 2013 Please reference this document as: Lorenzo-Seva, U. (2013). How to report

### Canonical Correlation

Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present

### A Brief Introduction to SPSS Factor Analysis

A Brief Introduction to SPSS Factor Analysis SPSS has a procedure that conducts exploratory factor analysis. Before launching into a step by step example of how to use this procedure, it is recommended

### Exploratory Factor Analysis Brian Habing - University of South Carolina - October 15, 2003

Exploratory Factor Analysis Brian Habing - University of South Carolina - October 15, 2003 FA is not worth the time necessary to understand it and carry it out. -Hills, 1977 Factor analysis should not

### 13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.

3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in three-space, we write a vector in terms

### Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

### Applied Linear Algebra I Review page 1

Applied Linear Algebra Review 1 I. Determinants A. Definition of a determinant 1. Using sum a. Permutations i. Sign of a permutation ii. Cycle 2. Uniqueness of the determinant function in terms of properties

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Department of Economics

Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

### LINEAR ALGEBRA. September 23, 2010

LINEAR ALGEBRA September 3, 00 Contents 0. LU-decomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................

### MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix.

MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix. Nullspace Let A = (a ij ) be an m n matrix. Definition. The nullspace of the matrix A, denoted N(A), is the set of all n-dimensional column

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Face Recognition using Principle Component Analysis

Face Recognition using Principle Component Analysis Kyungnam Kim Department of Computer Science University of Maryland, College Park MD 20742, USA Summary This is the summary of the basic idea about PCA

### by the matrix A results in a vector which is a reflection of the given

Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that

### Overview of Factor Analysis

Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

### DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

### T-test & factor analysis

Parametric tests T-test & factor analysis Better than non parametric tests Stringent assumptions More strings attached Assumes population distribution of sample is normal Major problem Alternatives Continue

### D-optimal plans in observational studies

D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

### Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

### Linear Models and Conjoint Analysis with Nonlinear Spline Transformations

Linear Models and Conjoint Analysis with Nonlinear Spline Transformations Warren F. Kuhfeld Mark Garratt Abstract Many common data analysis models are based on the general linear univariate model, including

### Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

### An Application of Linear Algebra to Image Compression

An Application of Linear Algebra to Image Compression Paul Dostert July 2, 2009 1 / 16 Image Compression There are hundreds of ways to compress images. Some basic ways use singular value decomposition

### DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### DISCRIMINANT FUNCTION ANALYSIS (DA)

DISCRIMINANT FUNCTION ANALYSIS (DA) John Poulsen and Aaron French Key words: assumptions, further reading, computations, standardized coefficents, structure matrix, tests of signficance Introduction Discriminant

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

### Understanding and Using Factor Scores: Considerations for the Applied Researcher

A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to

### The Method of Least Squares

Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this

### GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

### A Comparison of Variable Selection Techniques for Credit Scoring

1 A Comparison of Variable Selection Techniques for Credit Scoring K. Leung and F. Cheong and C. Cheong School of Business Information Technology, RMIT University, Melbourne, Victoria, Australia E-mail:

### Data analysis process

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

### Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Chapter 3 Linear Least Squares Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

### 2.1: MATRIX OPERATIONS

.: MATRIX OPERATIONS What are diagonal entries and the main diagonal of a matrix? What is a diagonal matrix? When are matrices equal? Scalar Multiplication 45 Matrix Addition Theorem (pg 0) Let A, B, and

### SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016

and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings

### Extensions of the Partial Least Squares approach for the analysis of biomolecular interactions. Nina Kirschbaum

Dissertation am Fachbereich Statistik der Universität Dortmund Extensions of the Partial Least Squares approach for the analysis of biomolecular interactions Nina Kirschbaum Erstgutachter: Prof. Dr. W.

### ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES

Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-26-32 Available online at http://www.bioinfo.in/contents.php?id=32 ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES

### Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications

Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications Ching-Wa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural