Introduction: Overview of Kernel Methods

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Introduction: Overview of Kernel Methods"

Transcription

1 Introduction: Overview of Kernel Methods Statistical Data Analysis with Positive Definite Kernels Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University for Advanced Studies October 6-10, 2008, Kyushu University

2 Outline Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 2 / 25

3 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 3 / 25

4 Nonlinear Data Analysis I Classical linear methods Data is expressed by a matrix. X1 1 X1 2 X m 1 X2 1 X2 2 X2 m X =. XN 1 XN 2 XN m (m dimensional, N data) Linear operations (matrix operations) are used for data analysis. e.g. - Principal component analysis (PCA) - Canonical correlation analysis (CCA) - Linear regression analysis - Fisher discriminant analysis (FDA) - Logistic regression, etc. 4 / 25

5 Nonlinear Data Analysis II Are linear methods sufficient? Nonlinear transform can help. Example 1: classification linearly inseparable linearly separable x z z z 2 x 1 (x 1, x 2 ) (z 1, z 2, z 3 ) = (x 2 1, x 2 2, 2x 1 x 2 ) (Unclear? watch 5 / 25

6 Example 2: dependence of two data Correlation ρ XY = Cov[X, Y ] E[(X E[X])(Y E[Y ])] = Var[X]Var[Y ] E[(X E[X])2 ]E[(Y E[Y ]) 2 ]. Transforming data to incorporate high-order moments seems attractive. 6 / 25

7 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 7 / 25

8 Feature space for transforming data Kernel methodology = a systematic way of analyzing data by transforming them into a high-dimensional feature space. Apply linear methods on the feature space. Which type of space serves as a feature space? The space should incorporate various nonlinear information of the original data. The inner product of the feature space is essential for data analysis (seen in the next subsection). 8 / 25

9 Computational problem of inner product For example, how about this? (X, Y, Z) (X, Y, Z, X 2, Y 2, Z 2, XY, Y Z, ZX,...). But, for high-dimensional data, the above expansion makes the feature space very huge! e.g. If X is 100 dimensional and the moments up to the third order are used, the dimensionality of feature space is 100C C C 3 = This causes a serious computational problem in working on the inner product of the feature space. We need a cleverer way of computing it. Kernel method. 9 / 25

10 Inner product by positive definite kernel A positive definite kernel gives efficient computation of the inner product: With special choice of the feature space, we have a function k(x, y) such that where Φ(X i ), Φ(X j ) = k(x i, X j ), positive definite kernel X x Φ(x) H (feature space). Many linear methods use only the inner product without necessity of the explicit form of the vector Φ(X). 10 / 25

11 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 11 / 25

12 X 1,..., X N : m-dimensional data. Review of PCA I Principal Component Analysis (PCA) Find d-directions to maximize the variance. Purpose: represent the structure of the data in a low dimensional space. 12 / 25

13 The first principal direction: Review of PCA II { 1 N u 1 = arg max u =1 N i=1 ut (X i 1 N N j=1 X j) } 2 = arg max u =1 ut V u, where V is the variance-covariance matrix: V = 1 N N i=1 (X i 1 N N j=1 X j)(x i 1 N N j=1 X j) T. - Eigenvectors u 1,..., u m of V (in descending order). - The p-th principal axis = u p. - The p-th principal component of X i = u T p X i Observation: PCA can be done if we can compute the inner product covariance matrix V, inner product between the unit eigenvector and the data. 13 / 25

14 Kernel PCA I X 1,..., X N : m-dimensional data. Transform the data by a feature map Φ into a feature space H: X 1,..., X N Φ(X 1 ),..., Φ(X N ) Assume that the feature space has the inner product,. Apply PCA to the transformed data: Maximize the variance of the projection onto the unit vector f. 1 N max Var[ f, Φ(X) ] = max f =1 f =1 N i=1( f, Φ(Xi ) 1 N N j=1 Φ(X j) ) 2 Note: it suffices to use f = n i=1 a i Φ(X i ), where Φ(X i ) = Φ(X i ) 1 N N j=1 Φ(X j). The direction orthogonal to Span{ Φ(X 1 ),..., Φ(X N )} does not contribute. 14 / 25

15 Kernel PCA II The PCA solution: max a T K2 a subject to a T Ka = 1, where K is N N matrix with K ij = Φ(X i ), Φ(X j ). Note: 1 N N i=1 f, Φ(X i ) 2 = 1 N N i=1 N j=1 a Φ(X j j ), Φ(X i ) 2 = 1 N at K2 a, f 2 = n i=1 a i Φ(X i ), n i=1 a i Φ(X i ) = a T Ka. The first principal component of the data X i is Φ(X i ), ˆf = N i=1 λ1 u 1 i, where K = N i=1 λ iu i u it is the eigen decomposition. 15 / 25

16 Observation: Kernel PCA III PCA in the feature space can be done if we can compute Φ(X i ), Φ(X j ) or Φ(X i ), Φ(X j ) = k(x i, X j ). The principal direction is obtained in the form f = i a i Φ(X i ), i.e., in the linear hull of the data. Note: K ij = Φ(X i), Φ(X j) = Φ(X i), Φ(X j) 1 N N b=1 Φ(Xi), Φ(X b) 1 N N a=1 1 Φ(Xa), Φ(Xj) + N N 2 a=1 Φ(Xa), Φ(X b) = k(x i, X j) 1 N N b=1 k(xi, X b) 1 N N 1 a=1k(xa, Xj) + N N 2 a=1 k(xa, X b) 16 / 25

17 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 17 / 25

18 Linear regression Review: Linear Regression I Data: (X 1, Y 1 ),..., (X N, Y N ): data X i: explanatory variable, covariate (m-dimensional) Y i: response variable, (1 dimensional) Regression model: find the best linear relation Y i = a T X i + ε i 18 / 25

19 Review: Linear Regression II Least square method: min a N i=1 Y i a T X i 2. Matrix expression X1 1 X1 2 X1 m Y 1 X2 1 X2 2 X2 m X =., Y = Y 2.. XN 1 X2 N Xm N Y N Solution: â = (X T X) 1 X T Y ŷ = â T x = Y T X(X T X) 1 x. Observation: Linear regressio can be done if we can compute the inner product X T X, â T x and so on. 19 / 25

20 Ridge Regression Ridge regression: Find a linear relation by min a N i=1 Y i a T X i 2 + λ a 2. Solution For a general x, â = (X T X + λi N ) 1 X T Y λ: regularization coefficient. ŷ(x) = â T x = Y T X(X T X + λi N ) 1 x. Ridge regression is useful when (X T X) 1 does not exist, or inversion is numerically unstable. 20 / 25

21 Kernelization of Ridge Regression I (X 1, Y 1 )..., (X N, Y N ) (Y i : 1-dimensional) Transform X i by a feature map Φ into a feature space H: X 1,..., X N Φ(X 1 ),..., Φ(X N ) Assume that the feature space has the inner product,. Apply ridge regression to the transformed data: Find the vector f such that N min i=1 Y i f, Φ(X i ) H 2 + λ f 2 H. f H Similarly to kernel PCA, we can assume f = n j=1 c jφ(x j ). N min c i=1 Y i N j=1 c jφ(x j ), Φ(X i ) H 2 + λ N j=1 c jφ(x j ) 2 H 21 / 25

22 Kernelization of Ridge Regression II Solution: ĉ = (K+λI N ) 1 Y, where K ij = Φ(X i ), Φ(X j ) H = k(x i, X j ). For a general x, ŷ(x) = f, Φ(x) H = jĉjφ(x j ), Φ(x) H = Y T (K + λi N ) 1 k, where k = Φ(X 1 ), Φ(x). Φ(X N ), Φ(x) = k(x 1, x).. k(x N, x) 22 / 25

23 Kernelization of Ridge Regression III Proof. Matrix expression gives N i=1 Y i N j=1 c jφ(x j ), Φ(X i ) H 2 + λ N j=1 c jφ(x j ) 2 H = (Y Kc) T (Y Kc) + λc T Kc = c T (K 2 + λk)c 2Y T Kc + Y T Y. It follows that the optimal c is given by ĉ = (K + λi N ) 1 Y. Inserting this to ŷ(x) = j ĉjφ(x j ), Φ(x) H, we have the claim. 23 / 25

24 Kernelization of Ridge Regression IV Observation: Ridge regression in the feature space can be done if we can compute the inner product Φ(X i ), Φ(X j ) = k(x i, X j ). The resulting coefficient is of the form f = i c iφ(x i ), i.e., in the linear hull of the data. The orthogonal directions do not contribute to the objective function. 24 / 25

25 Kernel methodology A feature space H with inner product,. Mapping of the data into a feature space: X 1,..., X N Φ(X 1 ),..., Φ(X N ) H. If the computation of the inner product Φ(X i ), Φ(X i ) is tractable, various linear methods can be extended to the feature space. Give Methods of nonlinear data analysis. How can we prepare such a feature space? Positive definite kernel! 25 / 25

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

More information

Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Felix Brockherde 12 Kristof Schütt 1 1 Technische Universität Berlin 2 Max Planck Institute of Microstructure Physics IPAM Tutorial 2013 Felix Brockherde, Kristof Schütt

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

Correlation in Random Variables

Correlation in Random Variables Correlation in Random Variables Lecture 11 Spring 2002 Correlation in Random Variables Suppose that an experiment produces two random variables, X and Y. What can we say about the relationship between

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information

CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye

CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Regression With Gaussian Measures

Regression With Gaussian Measures Regression With Gaussian Measures Michael J. Meyer Copyright c April 11, 2004 ii PREFACE We treat the basics of Gaussian processes, Gaussian measures, kernel reproducing Hilbert spaces and related topics.

More information

ECE302 Spring 2006 HW7 Solutions March 11, 2006 1

ECE302 Spring 2006 HW7 Solutions March 11, 2006 1 ECE32 Spring 26 HW7 Solutions March, 26 Solutions to HW7 Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in italics where

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

PCA to Eigenfaces. CS 510 Lecture #16 March 23 th A 9 dimensional PCA example

PCA to Eigenfaces. CS 510 Lecture #16 March 23 th A 9 dimensional PCA example PCA to Eigenfaces CS 510 Lecture #16 March 23 th 2015 A 9 dimensional PCA example is dark around the edges and bright in the middle. is light with dark vertical bars. is light with dark horizontal bars.

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016

Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings

More information

Covariance and Correlation. Consider the joint probability distribution f XY (x, y).

Covariance and Correlation. Consider the joint probability distribution f XY (x, y). Chapter 5: JOINT PROBABILITY DISTRIBUTIONS Part 2: Section 5-2 Covariance and Correlation Consider the joint probability distribution f XY (x, y). Is there a relationship between X and Y? If so, what kind?

More information

Lecture 2. Observables

Lecture 2. Observables Lecture 2 Observables 13 14 LECTURE 2. OBSERVABLES 2.1 Observing observables We have seen at the end of the previous lecture that each dynamical variable is associated to a linear operator Ô, and its expectation

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Cheng Soon Ong & Christfried Webers. Canberra February June 2016

Cheng Soon Ong & Christfried Webers. Canberra February June 2016 c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Canonical Correlation Analysis

Canonical Correlation Analysis Canonical Correlation Analysis Lecture 11 August 4, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #11-8/4/2011 Slide 1 of 39 Today s Lecture Canonical Correlation Analysis

More information

Random Vectors and the Variance Covariance Matrix

Random Vectors and the Variance Covariance Matrix Random Vectors and the Variance Covariance Matrix Definition 1. A random vector X is a vector (X 1, X 2,..., X p ) of jointly distributed random variables. As is customary in linear algebra, we will write

More information

Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association

More information

Orthogonal Diagonalization of Symmetric Matrices

Orthogonal Diagonalization of Symmetric Matrices MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Object Recognition and Template Matching

Object Recognition and Template Matching Object Recognition and Template Matching Template Matching A template is a small image (sub-image) The goal is to find occurrences of this template in a larger image That is, you want to find matches of

More information

An Introduction to Machine Learning

An Introduction to Machine Learning An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,

More information

A Simple Introduction to Support Vector Machines

A Simple Introduction to Support Vector Machines A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance

More information

Some probability and statistics

Some probability and statistics Appendix A Some probability and statistics A Probabilities, random variables and their distribution We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X,Y,

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

More information

High-Dimensional Data Visualization by PCA and LDA

High-Dimensional Data Visualization by PCA and LDA High-Dimensional Data Visualization by PCA and LDA Chaur-Chin Chen Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan Abbie Hsu Institute of Information Systems & Applications,

More information

Neural Networks. CAP5610 Machine Learning Instructor: Guo-Jun Qi

Neural Networks. CAP5610 Machine Learning Instructor: Guo-Jun Qi Neural Networks CAP5610 Machine Learning Instructor: Guo-Jun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Filtered Gaussian Processes for Learning with Large Data-Sets

Filtered Gaussian Processes for Learning with Large Data-Sets Filtered Gaussian Processes for Learning with Large Data-Sets Jian Qing Shi, Roderick Murray-Smith 2,3, D. Mike Titterington 4,and Barak A. Pearlmutter 3 School of Mathematics and Statistics, University

More information

Factor Analysis. Factor Analysis

Factor Analysis. Factor Analysis Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we

More information

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved 4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random

More information

4. Matrix Methods for Analysis of Structure in Data Sets:

4. Matrix Methods for Analysis of Structure in Data Sets: ATM 552 Notes: Matrix Methods: EOF, SVD, ETC. D.L.Hartmann Page 64 4. Matrix Methods for Analysis of Structure in Data Sets: Empirical Orthogonal Functions, Principal Component Analysis, Singular Value

More information

October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix

October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix Linear Algebra & Properties of the Covariance Matrix October 3rd, 2012 Estimation of r and C Let rn 1, rn, t..., rn T be the historical return rates on the n th asset. rn 1 rṇ 2 r n =. r T n n = 1, 2,...,

More information

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

More information

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8 Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e

More information

MATH 425, PRACTICE FINAL EXAM SOLUTIONS.

MATH 425, PRACTICE FINAL EXAM SOLUTIONS. MATH 45, PRACTICE FINAL EXAM SOLUTIONS. Exercise. a Is the operator L defined on smooth functions of x, y by L u := u xx + cosu linear? b Does the answer change if we replace the operator L by the operator

More information

Chapter 4. Multivariate Distributions

Chapter 4. Multivariate Distributions 1 Chapter 4. Multivariate Distributions Joint p.m.f. (p.d.f.) Independent Random Variables Covariance and Correlation Coefficient Expectation and Covariance Matrix Multivariate (Normal) Distributions Matlab

More information

Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

More information

4. Joint Distributions of Two Random Variables

4. Joint Distributions of Two Random Variables 4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint

More information

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA

More information

The Bivariate Normal Distribution

The Bivariate Normal Distribution The Bivariate Normal Distribution This is Section 4.7 of the st edition (2002) of the book Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis. The material in this section was not included

More information

1. True/False: Circle the correct answer. No justifications are needed in this exercise. (1 point each)

1. True/False: Circle the correct answer. No justifications are needed in this exercise. (1 point each) Math 33 AH : Solution to the Final Exam Honors Linear Algebra and Applications 1. True/False: Circle the correct answer. No justifications are needed in this exercise. (1 point each) (1) If A is an invertible

More information

Support Vector Machines Explained

Support Vector Machines Explained March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

More information

A repeated measures concordance correlation coefficient

A repeated measures concordance correlation coefficient A repeated measures concordance correlation coefficient Presented by Yan Ma July 20,2007 1 The CCC measures agreement between two methods or time points by measuring the variation of their linear relationship

More information

Canonical Correlation

Canonical Correlation Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present

More information

Math 141. Lecture 7: Variance, Covariance, and Sums. Albyn Jones 1. 1 Library 304. jones/courses/141

Math 141. Lecture 7: Variance, Covariance, and Sums. Albyn Jones 1. 1 Library 304.  jones/courses/141 Math 141 Lecture 7: Variance, Covariance, and Sums Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 Last Time Variance: expected squared deviation from the mean: Standard

More information

KERNEL LOGISTIC REGRESSION-LINEAR FOR LEUKEMIA CLASSIFICATION USING HIGH DIMENSIONAL DATA

KERNEL LOGISTIC REGRESSION-LINEAR FOR LEUKEMIA CLASSIFICATION USING HIGH DIMENSIONAL DATA Rahayu, Kernel Logistic Regression-Linear for Leukemia Classification using High Dimensional Data KERNEL LOGISTIC REGRESSION-LINEAR FOR LEUKEMIA CLASSIFICATION USING HIGH DIMENSIONAL DATA S.P. Rahayu 1,2

More information

Introduction to Principal Component Analysis: Stock Market Values

Introduction to Principal Component Analysis: Stock Market Values Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from

More information

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission

More information

Multivariate Analysis (Slides 13)

Multivariate Analysis (Slides 13) Multivariate Analysis (Slides 13) The final topic we consider is Factor Analysis. A Factor Analysis is a mathematical approach for attempting to explain the correlation between a large set of variables

More information

Multidimensional data and factorial methods

Multidimensional data and factorial methods Multidimensional data and factorial methods Bidimensional data x 5 4 3 4 X 3 6 X 3 5 4 3 3 3 4 5 6 x Cartesian plane Multidimensional data n X x x x n X x x x n X m x m x m x nm Factorial plane Interpretation

More information

Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis

Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis Joint Use of Factor Analysis () and Data Envelopment Analysis () for Ranking of Data Envelopment Analysis Reza Nadimi, Fariborz Jolai Abstract This article combines two techniques: data envelopment analysis

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

University of Lille I PC first year list of exercises n 7. Review

University of Lille I PC first year list of exercises n 7. Review University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients

More information

Calculate the holding period return for this investment. It is approximately

Calculate the holding period return for this investment. It is approximately 1. An investor purchases 100 shares of XYZ at the beginning of the year for $35. The stock pays a cash dividend of $3 per share. The price of the stock at the time of the dividend is $30. The dividend

More information

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard

More information

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning Lecture 4: SMO-MKL S.V. N. (vishy) Vishwanathan Purdue University vishy@purdue.edu July 11, 2012 S.V. N. Vishwanathan (Purdue University) Optimization for Machine Learning

More information

Visualization by Linear Projections as Information Retrieval

Visualization by Linear Projections as Information Retrieval Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI-0015 TKK, Finland jaakko.peltonen@tkk.fi

More information

Sections 2.11 and 5.8

Sections 2.11 and 5.8 Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

More information

1.5. Factorisation. Introduction. Prerequisites. Learning Outcomes. Learning Style

1.5. Factorisation. Introduction. Prerequisites. Learning Outcomes. Learning Style Factorisation 1.5 Introduction In Block 4 we showed the way in which brackets were removed from algebraic expressions. Factorisation, which can be considered as the reverse of this process, is dealt with

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems

ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed

More information

1 Introduction. 2 Matrices: Definition. Matrix Algebra. Hervé Abdi Lynne J. Williams

1 Introduction. 2 Matrices: Definition. Matrix Algebra. Hervé Abdi Lynne J. Williams In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 00 Matrix Algebra Hervé Abdi Lynne J. Williams Introduction Sylvester developed the modern concept of matrices in the 9th

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i ) Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

More information

The Scalar Algebra of Means, Covariances, and Correlations

The Scalar Algebra of Means, Covariances, and Correlations 3 The Scalar Algebra of Means, Covariances, and Correlations In this chapter, we review the definitions of some key statistical concepts: means, covariances, and correlations. We show how the means, variances,

More information

Dimensionality Reduction A Short Tutorial

Dimensionality Reduction A Short Tutorial Dimensionality Reduction A Short Tutorial Ali Ghodsi Department of Statistics and Actuarial Science University of Waterloo Waterloo, Ontario, Canada, 2006 c Ali Ghodsi, 2006 Contents 1 An Introduction

More information

Data visualization and dimensionality reduction using kernel maps with a reference point

Data visualization and dimensionality reduction using kernel maps with a reference point Data visualization and dimensionality reduction using kernel maps with a reference point Johan Suykens K.U. Leuven, ESAT-SCD/SISTA Kasteelpark Arenberg 1 B-31 Leuven (Heverlee), Belgium Tel: 32/16/32 18

More information

Several Views of Support Vector Machines

Several Views of Support Vector Machines Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min

More information

Tutorial on Exploratory Data Analysis

Tutorial on Exploratory Data Analysis Tutorial on Exploratory Data Analysis Julie Josse, François Husson, Sébastien Lê julie.josse at agrocampus-ouest.fr francois.husson at agrocampus-ouest.fr Applied Mathematics Department, Agrocampus Ouest

More information

Numerical Methods I Eigenvalue Problems

Numerical Methods I Eigenvalue Problems Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420-001, Fall 2010 September 30th, 2010 A. Donev (Courant Institute)

More information

Clustering in the Linear Model

Clustering in the Linear Model Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

More information

Forecasting Financial Time Series

Forecasting Financial Time Series Canberra, February, 2007 Contents Introduction 1 Introduction : Problems and Approaches 2 3 : Problems and Approaches : Problems and Approaches Time series: (Relative) returns r t = p t p t 1 p t 1, t

More information

Data Visualization and Feature Selection: New Algorithms for Nongaussian Data

Data Visualization and Feature Selection: New Algorithms for Nongaussian Data Data Visualization and Feature Selection: New Algorithms for Nongaussian Data Howard Hua Yang and John Moody Oregon Graduate nstitute of Science and Technology NW, Walker Rd., Beaverton, OR976, USA hyang@ece.ogi.edu,

More information

An Interactive Tool for Residual Diagnostics for Fitting Spatial Dependencies (with Implementation in R)

An Interactive Tool for Residual Diagnostics for Fitting Spatial Dependencies (with Implementation in R) DSC 2003 Working Papers (Draft Versions) http://www.ci.tuwien.ac.at/conferences/dsc-2003/ An Interactive Tool for Residual Diagnostics for Fitting Spatial Dependencies (with Implementation in R) Ernst

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Principle Component Analysis: A statistical technique used to examine the interrelations among a set of variables in order to identify the underlying structure of those variables.

More information

Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

More information

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α

More information

Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

More information

Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications

Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications Ching-Wa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci

More information

Similarity and Diagonalization. Similar Matrices

Similarity and Diagonalization. Similar Matrices MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

More information

Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component) Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

More information