Canonical Correlation Analysis


1 Canonical Correlation Analysis
Lecture 11, August 4, 2011
Advanced Multivariate Statistical Methods, ICPSR Summer Session #2
Lecture #11-8/4/2011, Slide 1 of 39

2 Today's Lecture
Canonical correlation analysis:
- What it is
- How it works
- How to do such an analysis
- Examples of uses of canonical correlations

3 Purpose
In general, when we have data there are times when we would like to measure the linear relationship between things. The simplest case is when we have two variables and all we are interested in is measuring their linear relationship; here we would just use the bivariate correlation. Another case is multiple regression, where we have several independent variables and one dependent variable; in this case we use the multiple correlation coefficient R (usually reported as its square, R^2). So it would be nice if we could expand the idea used in these cases to a situation where we have several y variables and several x variables.

4 Concept
From Webster's Dictionary, canonical: "reduced to the simplest or clearest schema possible."
In describing canonical correlation, we will start with the basic case where we have only two variables and build on it until we get to canonical correlations:
1. First we will look at the bivariate correlation
2. Then we will see what was done to generalize bivariate correlation to the multiple correlation coefficient
3. Finally, these discussions will lead us right to what happens in canonical correlation analysis

5 Bivariate Correlation
Begin by thinking of just two variables, y and x. In this case the correlation describes the extent to which one variable relates to (can predict) the other. That is, the stronger the correlation, the more we will know about y just by knowing x.
[Scatterplots: no relationship vs. strong positive relationship]
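The lecture's examples use SAS; as a quick numerical illustration of the same idea, here is a short Python/NumPy sketch (simulated data, not the lecture's) showing that a strong linear relationship pushes the Pearson correlation toward 1, while unrelated variables give a correlation near 0:

```python
import numpy as np

# Simulated data: y_strong depends linearly on x, y_none does not.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_strong = 0.8 * x + rng.normal(scale=0.6, size=200)  # strong positive relationship
y_none = rng.normal(size=200)                         # no relationship

r_strong = np.corrcoef(x, y_strong)[0, 1]  # Pearson correlation, near 0.8
r_none = np.corrcoef(x, y_none)[0, 1]      # near 0
```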

6 Multiple Correlation
On the other hand, if we have one y and multiple x variables, we can no longer look at a simple relationship between two variables. But we can look at how well the set of x variables predicts y by computing the regression line. Using the regression line we compute the predicted values ŷ = x'b and compare them to y. Specifically, we now have only two variables, y and ŷ, so we can compute a simple correlation. Note: we started with something more complicated (many x variables) and changed it into something for which we can compute a simple correlation (between y and ŷ).
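The two-step recipe above (fit the regression, then correlate y with ŷ) can be sketched in Python/NumPy, with simulated data standing in for the x variables and y:

```python
import numpy as np

# Simulated stand-ins for four predictors and one outcome.
rng = np.random.default_rng(1)
n = 27
X = rng.normal(size=(n, 4))
y = X @ np.array([1.5, 0.5, 2.0, -1.0]) + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), X])      # design matrix with intercept
b, *_ = np.linalg.lstsq(X1, y, rcond=None) # least-squares coefficients
yhat = X1 @ b                              # predicted values

R = np.corrcoef(y, yhat)[0, 1]             # multiple correlation = corr(y, yhat)
R2 = R ** 2                                # squared multiple correlation
```

With an intercept in the model, R^2 computed this way matches 1 - SSE/SST from the regression.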

7 Multiple Correlation Example
From Weisberg (1985, p. 240). Property taxes on a house are supposedly dependent on the current market value of the house. Since houses actually sell only rarely, the sale price of each house must be estimated every year when property taxes are set. Regression methods are sometimes used to construct a prediction function. We have data for 27 houses sold in the mid-1970s in Erie, Pennsylvania:
x1: current taxes (local, school, and county) / 100 (dollars)
x2: number of bathrooms
x3: living space / 1000 (square feet)
x4: age of house (years)
y: actual sale price / 1000 (dollars)

8 Multiple Correlation Example
To compute the multiple correlation of x1, x2, x3, and x4 with y, first compute the multiple regression of y on all the x variables:

proc reg data=house;
  model y = x1-x4;
  output out=newdata p=yhat;
run;

Then take the predicted values given by the model, ŷ, and correlate them with y:

proc corr data=newdata;
  var yhat y;
run;

9 Multiple Correlation Example
[SAS output omitted]

10 Multiple Correlation Example
[SAS output omitted]
The output above gives the multiple correlation between x1, x2, x3, x4 and y.

11 Canonical Correlation
Canonical correlation seeks to find the correlation between multiple x variables and multiple y variables. Now we have several y variables and several x variables, so neither of our previous two examples applies directly, but we can take the points from the previous cases and use them for this new case. We could look at how well the set of x variables predicts the set of y variables, but in doing this we still cannot compute a simple correlation. On the other hand, in multiple regression we found a linear combination of the variables, b'x, to get a single variable. In our case we have two sets of variables, so it makes sense to define two linear combinations: one for the x variables (b1) and one for the y variables (a1).

12 Canonical Correlation
In the simple case where we have only a single linear combination for each set of variables, we can compute the simple correlation between these two linear combinations. The first canonical correlation is the correlation between these two new variables (b1'x and a1'y). So how do we pick the linear transformations? The weights b1 and a1 are chosen so that the correlation between the two new variables is maximized. Notice that this idea is really no different from what we did in multiple regression. It also sounds similar to something we have done in PCA.

13 Canonical Correlation
One last thing: think back to PCA, where we said that a single linear combination did not account for all of the information present in a data set. There we determined how many linear combinations were needed to capture more information (where the linear combinations were all uncorrelated). We can do the same thing here. We define more pairs of linear combinations (bi and ai, i = 1, ..., s, where s = min(p, q), p is the number of x variables, and q is the number of y variables). Each pair maximizes the correlation between the new variables under the constraint that they are uncorrelated with all previous linear combinations.

14 To show how to compute canonical correlations, first consider our original covariance matrix from our example, with rows and columns for x1, x2, x3, x4, and y:
[5 x 5 covariance matrix; numerical entries not preserved in the transcription]

15 From this matrix, we define four new sub-matrices, from which we will calculate our correlations:

S = [ S_xx  S_xy ]
    [ S_yx  S_yy ]

Here S_xx is the covariance matrix of the x variables (x1-x4), S_yy is the variance of y, and S_xy = S_yx' contains the covariances between the x variables and y.

16 So how do we compute the canonical correlations?
To begin, note that we could define the squared multiple correlation R^2_M as

R^2_M = S_yx S_xx^-1 S_xy / S_yy

which can be rewritten as

R^2_M = S_yy^-1 S_yx S_xx^-1 S_xy

For canonical correlations, however, we will focus on the matrix formed by this last expression (note that it was just a scalar when y has only one variable).

17 We first compute the square roots of the eigenvalues (r1, r2, ..., rs) and the eigenvectors (a1, a2, ..., as) of

S_yy^-1 S_yx S_xx^-1 S_xy

Then we compute the square roots of the eigenvalues and the eigenvectors (b1, b2, ..., bs) of

S_xx^-1 S_xy S_yy^-1 S_yx

Conveniently, the eigenvalues of both matrices are equal (and lie between zero and one)! The square roots of the eigenvalues are the successive canonical correlations between the successive pairs of linear combinations, and the eigenvectors give the linear transformations for the new linear combinations.
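A numerical sketch of this recipe in Python/NumPy (simulated data, not the fitness example): form the covariance submatrices, take the eigenvalues of both matrices above, and confirm they agree and lie between zero and one:

```python
import numpy as np

# Simulated x and y sets with a real linear relationship between them.
rng = np.random.default_rng(2)
n, p, q = 200, 3, 3
X = rng.normal(size=(n, p))
Y = 0.5 * (X @ rng.normal(size=(p, q))) + rng.normal(size=(n, q))

S = np.cov(np.hstack([X, Y]), rowvar=False)
Sxx, Sxy = S[:p, :p], S[:p, p:]
Syx, Syy = S[p:, :p], S[p:, p:]

# The two matrices from the slide: Syy^-1 Syx Sxx^-1 Sxy and Sxx^-1 Sxy Syy^-1 Syx.
M_y = np.linalg.solve(Syy, Syx) @ np.linalg.solve(Sxx, Sxy)
M_x = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Syx)

ev_y = np.sort(np.linalg.eigvals(M_y).real)[::-1]  # squared canonical correlations
ev_x = np.sort(np.linalg.eigvals(M_x).real)[::-1]  # same eigenvalues, as claimed

r = np.sqrt(np.clip(ev_y, 0, 1))  # canonical correlations r1 >= r2 >= ... >= rs
```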

18 Example #2
To illustrate canonical correlations, consider the following analysis. Three physiological and three exercise variables are measured on 27 middle-aged men in a fitness club. The variables collected are:
- Weight (in pounds, x1)
- Waist size (in inches, x2)
- Pulse rate (in beats per minute, x3)
- Number of chin-ups performed (y1)
- Number of sit-ups performed (y2)
- Number of jumping jacks performed (y3)
The goal of the analysis is to determine the relationship between the physiological measurements and the exercises.

19 Example #2
To run a canonical correlation analysis, use the following code:

proc cancorr data=fit all
    vprefix=physiological vname='Physiological Measurements'
    wprefix=exercises wname='Exercises';
  var Weight Waist Pulse;
  with Chins Situps Jumps;
run;

20-21 Example #2
[SAS output omitted]

22 Standardized Weights
Just like in PCA and factor analysis, we are interested in interpreting the weights of the linear combinations. However, if our variables are on different scales, the weights are difficult to interpret. So we can standardize them, which is the same as computing the canonical correlations and linear combinations from the correlation matrix instead of the variance/covariance matrix. We can also compute the standardized coefficients (c and d) directly:

c = diag(S_yy)^(1/2) a  and  d = diag(S_xx)^(1/2) b
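As a tiny illustration with made-up numbers (the raw weights and variances below are hypothetical, not from the example), standardizing just multiplies each raw weight by the standard deviation of its variable:

```python
import numpy as np

Syy_diag = np.array([4.0, 9.0, 1.0])  # hypothetical variances of y1, y2, y3
a = np.array([0.2, -0.1, 0.5])        # hypothetical raw canonical weights

c = np.sqrt(Syy_diag) * a             # standardized weights: diag(Syy)^(1/2) a
```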

23 Example #2
[SAS output omitted]

24 Canonical Correlation Properties
1. Canonical correlations are invariant. This means that, like any correlation, scale changes (such as standardizing) will not change the correlations. However, they will change the eigenvectors.
2. The first canonical correlation is the best we can do with associations: it is at least as large as any of the simple correlations or any multiple correlation among the variables under study.
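Property 1 can be checked numerically. In this Python/NumPy sketch (simulated data), rescaling the x variables and shifting the y variables leaves the canonical correlations unchanged:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
Y = 0.7 * X + rng.normal(size=(100, 2))

def canon_corrs(X, Y):
    # Canonical correlations from the covariance submatrices.
    p = X.shape[1]
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    Sxx, Sxy = S[:p, :p], S[:p, p:]
    Syx, Syy = S[p:, :p], S[p:, p:]
    ev = np.linalg.eigvals(np.linalg.solve(Syy, Syx) @ np.linalg.solve(Sxx, Sxy)).real
    return np.sort(np.sqrt(np.clip(ev, 0, 1)))[::-1]

r1 = canon_corrs(X, Y)
r2 = canon_corrs(X * np.array([100.0, 0.01]), Y + 5.0)  # rescale x, shift y
```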

25 Hypothesis Test for the Correlations
We begin by testing that at least the first (the largest) canonical correlation is significantly different from zero: if we cannot get a significant relationship out of the optimal linear combination of variables, no other combination will yield one. This is the same as testing H0: Σ_xy = 0 (or B1 = 0). The test uses Wilks' Lambda:

Λ_1 = |S| / (|S_yy| |S_xx|)

or, equivalently (where r_i^2 is an eigenvalue of the matrix formed from the submatrices of the covariance matrix):

Λ_1 = ∏_{i=1}^{s} (1 - r_i^2)
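The equivalence of the two forms of Λ_1 can be verified numerically; this Python/NumPy sketch (simulated data) computes both and confirms they agree:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 100, 2, 2
X = rng.normal(size=(n, p))
Y = 0.6 * X + rng.normal(size=(n, q))

S = np.cov(np.hstack([X, Y]), rowvar=False)
Sxx, Sxy = S[:p, :p], S[:p, p:]
Syx, Syy = S[p:, :p], S[p:, p:]

# Determinant form: Lambda_1 = |S| / (|Syy| |Sxx|)
lam_det = np.linalg.det(S) / (np.linalg.det(Syy) * np.linalg.det(Sxx))

# Eigenvalue form: Lambda_1 = prod(1 - r_i^2)
ev = np.linalg.eigvals(np.linalg.solve(Syy, Syx) @ np.linalg.solve(Sxx, Sxy)).real
lam_eig = np.prod(1 - ev)
```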

26 The Rest
In this case

Λ_1 = ∏_{i=1}^{s} (1 - r_i^2)

which can be compared to Λ_{α, p, q, n-1-q} (or, equivalently, to Λ_{α, q, p, n-1-p}). In general, to test whether correlations k through s are zero, we can compute

Λ_k = ∏_{i=k}^{s} (1 - r_i^2)

which can be compared to Λ_{α, p-k+1, q-k+1, n-k-q} (or to Λ_{α, q-k+1, p-k+1, n-k-p}).

27 Example #2
[SAS output omitted]

28 Because in many ways a canonical correlation analysis is similar to what we discussed in PCA, the interpretation methods are also similar. Specifically, we will discuss four methods that are used to interpret the results:
1. Standardized coefficients
2. Correlations between the canonical variates (the linear combinations) and each variable
3. Rotation
4. Redundancy analysis

29 Standardized Coefficients
Because the standardized coefficients are on the same scale, they can be directly compared. The variables most important to the association are the ones with the largest absolute weights (i.e., the weights determine importance). To interpret what the linear combination is capturing, we also consider the sign of each weight.

30 Correlation of Linear Combination with Variables
This was mentioned in PCA and EFA: we compute our linear combinations and then compute the correlation between each linear combination (canonical variate) and each of the original variables. These correlations are typically called loadings or structure coefficients. As was the case in PCA, this ignores the overall multidimensional structure, so it is not a recommended basis for interpretation.

31 Rotation
We could try rotating the weights of the analysis to provide a more interpretable result. For this we rely on the spatial representation of what is going on with the data: every linear combination projects our observations onto a different dimension. Sometimes these dimensions are difficult to interpret (i.e., based on the sign and magnitude of the weights). Sometimes we can rotate these dimensions so that the weights are easier to interpret: some become large and some small. Rotations in CCA are not recommended, however, because we lose the optimality of the analysis.

32 Redundancy
Another method for interpretation is a redundancy analysis (this, again, is often disliked by statisticians because it only summarizes univariate relationships).

33-34 Redundancy
[SAS output omitted]

35 In a study of social support and mental health, measures of the following seven variables were taken on 405 subjects:
- Total social support
- Family social support
- Friend social support
- Significant other social support
- Depression
- Loneliness
- Stress
The researchers were interested in determining the relationship between social support and mental health. How about using a canonical correlation analysis?

36 *SAS Example #3;

data depress (type=corr);
  _type_='corr';
  input _name_ $ v1-v7;
  label v1='total social support'
        v2='family social support'
        v3='friend social support'
        v4='significant other social support'
        v5='depression'
        v6='loneliness'
        v7='stress';
datalines;
[correlation matrix entries for v1-v7 not preserved in the transcription]
;

proc cancorr data=depress all corr edf=404
    vprefix=mental_health vname='Mental Health'
    wprefix=social_support wname='Social Support';
  var v1-v4;
  with v5-v7;
run;

37-38 [SAS output omitted]

39 In general, the results from a canonical correlation routine are related to:
1. Regression
2. Discriminant analysis (we will learn this next week)
3. MANOVA
However, the goals of canonical correlation also overlap with the information provided by a confirmatory factor analysis or structural equation model.

40 Final Thought
The midterm was accomplished using MANOVA and MANCOVA. Canonical correlation analysis is a complicated analysis that provides many results of interest to researchers. Perhaps because of its complicated nature, canonical correlation analysis is not often used.
Last week: Nebraska... This week: Texas... After that: The world.
Tomorrow: Lab Day! Meet in Helen Newberry's Michigan Lab.


More information

Collinearity of independent variables. Collinearity is a condition in which some of the independent variables are highly correlated.

Collinearity of independent variables. Collinearity is a condition in which some of the independent variables are highly correlated. Collinearity of independent variables Collinearity is a condition in which some of the independent variables are highly correlated. Why is this a problem? Collinearity tends to inflate the variance of

More information

Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline. Correlation & Regression, III. Review. Relationship between r and regression Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

More information

T-test & factor analysis

T-test & factor analysis Parametric tests T-test & factor analysis Better than non parametric tests Stringent assumptions More strings attached Assumes population distribution of sample is normal Major problem Alternatives Continue

More information

Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

More information

Chapter 11: Two Variable Regression Analysis

Chapter 11: Two Variable Regression Analysis Department of Mathematics Izmir University of Economics Week 14-15 2014-2015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions

More information

Econ 371 Problem Set #4 Answer Sheet. P rice = (0.485)BDR + (23.4)Bath + (0.156)Hsize + (0.002)LSize + (0.090)Age (48.

Econ 371 Problem Set #4 Answer Sheet. P rice = (0.485)BDR + (23.4)Bath + (0.156)Hsize + (0.002)LSize + (0.090)Age (48. Econ 371 Problem Set #4 Answer Sheet 6.5 This question focuses on what s called a hedonic regression model; i.e., where the sales price of the home is regressed on the various attributes of the home. The

More information

9 Hedging the Risk of an Energy Futures Portfolio UNCORRECTED PROOFS. Carol Alexander 9.1 MAPPING PORTFOLIOS TO CONSTANT MATURITY FUTURES 12 T 1)

9 Hedging the Risk of an Energy Futures Portfolio UNCORRECTED PROOFS. Carol Alexander 9.1 MAPPING PORTFOLIOS TO CONSTANT MATURITY FUTURES 12 T 1) Helyette Geman c0.tex V - 0//0 :00 P.M. Page Hedging the Risk of an Energy Futures Portfolio Carol Alexander This chapter considers a hedging problem for a trader in futures on crude oil, heating oil and

More information

Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales. Olli-Pekka Kauppila Rilana Riikkinen

Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales. Olli-Pekka Kauppila Rilana Riikkinen Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales Olli-Pekka Kauppila Rilana Riikkinen Learning Objectives 1. Develop the ability to assess a quality of measurement instruments

More information

Curvilinear Regression Analysis

Curvilinear Regression Analysis Analysis Lecture 18 April 7, 2005 Applied Analysis Lecture #18-4/7/2005 Slide 1 of 29 Today s Lecture ANOVA with a continuous independent variable. Today s Lecture regression analysis. Interactions with

More information

SPSS: Descriptive and Inferential Statistics. For Windows

SPSS: Descriptive and Inferential Statistics. For Windows For Windows August 2012 Table of Contents Section 1: Summarizing Data...3 1.1 Descriptive Statistics...3 Section 2: Inferential Statistics... 10 2.1 Chi-Square Test... 10 2.2 T tests... 11 2.3 Correlation...

More information

Regression and Correlation

Regression and Correlation Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Principle Component Analysis: A statistical technique used to examine the interrelations among a set of variables in order to identify the underlying structure of those variables.

More information

SOME NOTES ON STATISTICAL INTERPRETATION. Below I provide some basic notes on statistical interpretation for some selected procedures.

SOME NOTES ON STATISTICAL INTERPRETATION. Below I provide some basic notes on statistical interpretation for some selected procedures. 1 SOME NOTES ON STATISTICAL INTERPRETATION Below I provide some basic notes on statistical interpretation for some selected procedures. The information provided here is not exhaustive. There is more to

More information

Discriminant Function Analysis in SPSS To do DFA in SPSS, start from Classify in the Analyze menu (because we re trying to classify participants into

Discriminant Function Analysis in SPSS To do DFA in SPSS, start from Classify in the Analyze menu (because we re trying to classify participants into Discriminant Function Analysis in SPSS To do DFA in SPSS, start from Classify in the Analyze menu (because we re trying to classify participants into different groups). In this case we re looking at a

More information

Factor Analysis. Advanced Financial Accounting II Åbo Akademi School of Business

Factor Analysis. Advanced Financial Accounting II Åbo Akademi School of Business Factor Analysis Advanced Financial Accounting II Åbo Akademi School of Business Factor analysis A statistical method used to describe variability among observed variables in terms of fewer unobserved variables

More information

Hatice Camgöz Akdağ. findings of previous research in which two independent firm clusters were

Hatice Camgöz Akdağ. findings of previous research in which two independent firm clusters were Innovative Culture and Total Quality Management as a Tool for Sustainable Competitiveness: A Case Study of Turkish Fruit and Vegetable Processing Industry SMEs, Sedef Akgüngör Hatice Camgöz Akdağ Aslı

More information

Orthogonal Diagonalization of Symmetric Matrices

Orthogonal Diagonalization of Symmetric Matrices MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding

More information

Factor Analysis. Factor Analysis

Factor Analysis. Factor Analysis Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we

More information

Chapter 12 : Linear Correlation and Linear Regression

Chapter 12 : Linear Correlation and Linear Regression Number of Faculty Chapter 12 : Linear Correlation and Linear Regression Determining whether a linear relationship exists between two quantitative variables, and modeling the relationship with a line, if

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Principal components analysis

Principal components analysis CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k

More information

Object Recognition and Template Matching

Object Recognition and Template Matching Object Recognition and Template Matching Template Matching A template is a small image (sub-image) The goal is to find occurrences of this template in a larger image That is, you want to find matches of

More information

Functional Data Analysis with R and MATLAB

Functional Data Analysis with R and MATLAB J.O. Ramsay Giles Hooker Spencer Graves Functional Data Analysis with R and MATLAB Springer Contents 2 Introduction to Functional Data Analysis. What Are Functional Data?.. Data on the Growth of Girls..2

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

More information

Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016

Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings

More information

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

More information

Multivariate Analysis of Variance (MANOVA): I. Theory

Multivariate Analysis of Variance (MANOVA): I. Theory Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

More information

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand when to use multiple Understand the multiple equation and what the coefficients represent Understand different methods

More information

Sections 2.11 and 5.8

Sections 2.11 and 5.8 Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

A correlation exists between two variables when one of them is related to the other in some way.

A correlation exists between two variables when one of them is related to the other in some way. Lecture #10 Chapter 10 Correlation and Regression The main focus of this chapter is to form inferences based on sample data that come in pairs. Given such paired sample data, we want to determine whether

More information

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become

More information

Understanding and Using Factor Scores: Considerations for the Applied Researcher

Understanding and Using Factor Scores: Considerations for the Applied Researcher A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

Simple Regression and Correlation

Simple Regression and Correlation Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas

More information

1. Complete the sentence with the correct word or phrase. 2. Fill in blanks in a source table with the correct formuli for df, MS, and F.

1. Complete the sentence with the correct word or phrase. 2. Fill in blanks in a source table with the correct formuli for df, MS, and F. Final Exam 1. Complete the sentence with the correct word or phrase. 2. Fill in blanks in a source table with the correct formuli for df, MS, and F. 3. Identify the graphic form and nature of the source

More information

Estimation with Minimum Mean Square Error

Estimation with Minimum Mean Square Error C H A P T E R 8 Estimation with Minimum Mean Square Error INTRODUCTION A recurring theme in this text and in much of communication, control and signal processing is that of making systematic estimates,

More information

4. Matrix Methods for Analysis of Structure in Data Sets:

4. Matrix Methods for Analysis of Structure in Data Sets: ATM 552 Notes: Matrix Methods: EOF, SVD, ETC. D.L.Hartmann Page 64 4. Matrix Methods for Analysis of Structure in Data Sets: Empirical Orthogonal Functions, Principal Component Analysis, Singular Value

More information

Using least squares Monte Carlo for capital calculation 21 November 2011

Using least squares Monte Carlo for capital calculation 21 November 2011 Life Conference and Exhibition 2011 Adam Koursaris, Peter Murphy Using least squares Monte Carlo for capital calculation 21 November 2011 Agenda SCR calculation Nested stochastic problem Limitations of

More information

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Step 1: Set the equation equal to zero if the function lacks. Step 2: Subtract the constant term from both sides:

Step 1: Set the equation equal to zero if the function lacks. Step 2: Subtract the constant term from both sides: In most situations the quadratic equations such as: x 2 + 8x + 5, can be solved (factored) through the quadratic formula if factoring it out seems too hard. However, some of these problems may be solved

More information

Elasticity Theory Basics

Elasticity Theory Basics G22.3033-002: Topics in Computer Graphics: Lecture #7 Geometric Modeling New York University Elasticity Theory Basics Lecture #7: 20 October 2003 Lecturer: Denis Zorin Scribe: Adrian Secord, Yotam Gingold

More information

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard

More information

UNDERSTANDING MULTIPLE REGRESSION

UNDERSTANDING MULTIPLE REGRESSION UNDERSTANDING Multiple regression analysis (MRA) is any of several related statistical methods for evaluating the effects of more than one independent (or predictor) variable on a dependent (or outcome)

More information

Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur

Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur Lecture - 7 Multiple Linear Regression (Contd.) This is my second lecture on Multiple Linear Regression

More information

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5 Using Your TI-83/84 Calculator: Linear Correlation and Regression Elementary Statistics Dr. Laura Schultz This handout describes how to use your calculator for various linear correlation and regression

More information

How to get more value from your survey data

How to get more value from your survey data IBM SPSS Statistics How to get more value from your survey data Discover four advanced analysis techniques that make survey research more effective Contents: 1 Introduction 2 Descriptive survey research

More information

Math 265 (Butler) Practice Midterm II B (Solutions)

Math 265 (Butler) Practice Midterm II B (Solutions) Math 265 (Butler) Practice Midterm II B (Solutions) 1. Find (x 0, y 0 ) so that the plane tangent to the surface z f(x, y) x 2 + 3xy y 2 at ( x 0, y 0, f(x 0, y 0 ) ) is parallel to the plane 16x 2y 2z

More information