# Introduction: Overview of Kernel Methods

## Transcription

1 Introduction: Overview of Kernel Methods
Statistical Data Analysis with Positive Definite Kernels
Kenji Fukumizu
Institute of Statistical Mathematics, ROIS
Department of Statistical Science, Graduate University for Advanced Studies
October 6-10, 2008, Kyushu University

2 Outline
- Basic idea of kernel methods
  - Linear and nonlinear data analysis
  - Essence of kernel methodology
- Kernel PCA: nonlinear extension of PCA
- Ridge regression and its kernelization

3 Section: Basic idea of kernel methods (Linear and nonlinear data analysis)

4 Nonlinear Data Analysis I
Classical linear methods: the data are expressed by a matrix

$$X = \begin{pmatrix} X_1^1 & X_1^2 & \cdots & X_1^m \\ X_2^1 & X_2^2 & \cdots & X_2^m \\ \vdots & \vdots & & \vdots \\ X_N^1 & X_N^2 & \cdots & X_N^m \end{pmatrix}$$

(m-dimensional, N data), and linear operations (matrix operations) are used for data analysis, e.g.
- Principal component analysis (PCA)
- Canonical correlation analysis (CCA)
- Linear regression analysis
- Fisher discriminant analysis (FDA)
- Logistic regression, etc.

5 Nonlinear Data Analysis II
Are linear methods sufficient? A nonlinear transform can help.

Example 1: classification. Data that are linearly inseparable in the original space can become linearly separable after a nonlinear transform:

$$(x_1, x_2) \mapsto (z_1, z_2, z_3) = (x_1^2,\ x_2^2,\ \sqrt{2}\, x_1 x_2).$$

[Figure: linearly inseparable data in the $(x_1, x_2)$ plane; the same data, linearly separable, in $(z_1, z_2, z_3)$ space.]

(Unclear? watch ...)
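As a concrete illustration (a minimal sketch with invented data, not part of the slides): two classes defined by the unit circle cannot be separated by a line in $(x_1, x_2)$, but after the quadratic map above they are separated by the hyperplane $z_1 + z_2 = 1$, since $z_1 + z_2 = x_1^2 + x_2^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes: inside vs. outside the unit circle; not linearly separable in (x1, x2).
x = rng.uniform(-2.0, 2.0, size=(200, 2))
y = (x[:, 0] ** 2 + x[:, 1] ** 2 > 1.0).astype(int)

# The quadratic map (z1, z2, z3) = (x1^2, x2^2, sqrt(2) x1 x2) from the slide.
z = np.column_stack([x[:, 0] ** 2, x[:, 1] ** 2, np.sqrt(2.0) * x[:, 0] * x[:, 1]])

# In feature space the classes are split by the hyperplane z1 + z2 = 1,
# because z1 + z2 = x1^2 + x2^2.
pred = (z[:, 0] + z[:, 1] > 1.0).astype(int)
print(np.array_equal(pred, y))  # True: linearly separable after the map
```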

6 Example 2: dependence of two data
Correlation:

$$\rho_{XY} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{E[(X - E[X])(Y - E[Y])]}{\sqrt{E[(X - E[X])^2]\, E[(Y - E[Y])^2]}}.$$

Correlation captures only linear dependence, so transforming the data to incorporate high-order moments seems attractive.
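A small synthetic check (an invented example, not from the slides) of why plain correlation can miss dependence and how a moment transform exposes it: with $Y = X^2$ and symmetric $X$, the linear correlation is near zero while the correlation after squaring is exactly one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = x ** 2  # fully determined by x, yet linearly uncorrelated with it

print(np.corrcoef(x, y)[0, 1])       # approx. 0: plain correlation misses the dependence
print(np.corrcoef(x ** 2, y)[0, 1])  # 1.0: the dependence appears after a nonlinear transform
```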

7 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 7 / 25

8 Feature space for transforming data
Kernel methodology: a systematic way of analyzing data by transforming them into a high-dimensional feature space and applying linear methods in that space.

Which type of space serves as a feature space?
- The space should incorporate various nonlinear information of the original data.
- The inner product of the feature space is essential for data analysis (seen in the next subsection).

9 Computational problem of the inner product
For example, how about

$$(X, Y, Z) \mapsto (X, Y, Z, X^2, Y^2, Z^2, XY, YZ, ZX, \ldots)?$$

But for high-dimensional data, this expansion makes the feature space very large. E.g., if X is 100-dimensional and the moments up to the third order are used, the dimensionality of the feature space is

$${}_{100}C_1 + {}_{100}C_2 + {}_{100}C_3 = 166750.$$

This causes a serious computational problem in working with the inner product of the feature space. We need a cleverer way of computing it: the kernel method.
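A one-line verification of the count, assuming the slide's convention of counting products of distinct coordinates:

```python
from math import comb

# Monomials X_i, X_i X_j, X_i X_j X_k over distinct coordinates, as counted on the slide:
print(comb(100, 1) + comb(100, 2) + comb(100, 3))  # 166750
```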

10 Inner product by positive definite kernel
A positive definite kernel gives efficient computation of the inner product. With a special choice of the feature space $H$ and feature map

$$\Phi: \mathcal{X} \to H, \quad x \mapsto \Phi(x),$$

we have a function $k(x, y)$, the positive definite kernel, such that

$$\langle \Phi(X_i), \Phi(X_j) \rangle = k(X_i, X_j).$$

Many linear methods use only the inner product, without needing the explicit form of the vector $\Phi(X)$.
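A numeric sanity check (my own sketch): for the quadratic feature map of Example 1, the explicit inner product agrees with the homogeneous polynomial kernel $k(x, y) = (x^T y)^2$, so the 3-dimensional feature vector never needs to be formed.

```python
import numpy as np

def phi(v):
    # Explicit quadratic feature map R^2 -> R^3 from Example 1.
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2.0) * v[0] * v[1]])

def k(u, v):
    # Homogeneous polynomial kernel: the same inner product without the explicit map.
    return float(u @ v) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(phi(x) @ phi(y), k(x, y))  # both print 1.0
```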

11 Section: Kernel PCA, a nonlinear extension of PCA

12 Review of PCA I
Principal Component Analysis (PCA): given m-dimensional data $X_1, \ldots, X_N$, find d directions that maximize the variance.
Purpose: represent the structure of the data in a low-dimensional space.

13 Review of PCA II
The first principal direction:

$$u_1 = \arg\max_{\|u\|=1} \frac{1}{N} \sum_{i=1}^N \Big\{ u^T \Big( X_i - \frac{1}{N} \sum_{j=1}^N X_j \Big) \Big\}^2 = \arg\max_{\|u\|=1} u^T V u,$$

where $V$ is the variance-covariance matrix:

$$V = \frac{1}{N} \sum_{i=1}^N \Big( X_i - \frac{1}{N} \sum_{j=1}^N X_j \Big) \Big( X_i - \frac{1}{N} \sum_{j=1}^N X_j \Big)^T.$$

- Take the unit eigenvectors $u_1, \ldots, u_m$ of $V$ (eigenvalues in descending order).
- The p-th principal axis is $u_p$.
- The p-th principal component of $X_i$ is $u_p^T X_i$.

Observation: PCA can be done if we can compute inner products:
- the covariance matrix $V$,
- the inner product between the unit eigenvectors and the data.
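A minimal PCA sketch along the lines of this slide, on invented random data (the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))       # N = 100 data points, m = 5 dimensions

Xc = X - X.mean(axis=0)                 # subtract the sample mean
V = Xc.T @ Xc / len(X)                  # variance-covariance matrix (m x m)
eigvals, eigvecs = np.linalg.eigh(V)    # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # reorder to descending
U = eigvecs[:, order]                   # columns u_1, ..., u_m: principal axes
scores = Xc @ U[:, :2]                  # first two principal components of each X_i
print(scores.shape)                     # (100, 2)
```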

14 Kernel PCA I
$X_1, \ldots, X_N$: m-dimensional data. Transform the data by a feature map $\Phi$ into a feature space $H$:

$$X_1, \ldots, X_N \mapsto \Phi(X_1), \ldots, \Phi(X_N).$$

Assume that the feature space has the inner product $\langle \cdot, \cdot \rangle$. Apply PCA to the transformed data: maximize the variance of the projection onto a unit vector $f$,

$$\max_{\|f\|=1} \mathrm{Var}[\langle f, \Phi(X) \rangle] = \max_{\|f\|=1} \frac{1}{N} \sum_{i=1}^N \Big( \Big\langle f,\ \Phi(X_i) - \frac{1}{N} \sum_{j=1}^N \Phi(X_j) \Big\rangle \Big)^2.$$

Note: it suffices to use

$$f = \sum_{i=1}^N a_i \tilde{\Phi}(X_i), \quad \text{where } \tilde{\Phi}(X_i) = \Phi(X_i) - \frac{1}{N} \sum_{j=1}^N \Phi(X_j),$$

because the direction orthogonal to $\mathrm{Span}\{\tilde{\Phi}(X_1), \ldots, \tilde{\Phi}(X_N)\}$ does not contribute.

15 Kernel PCA II
The PCA solution:

$$\max\ a^T \tilde{K}^2 a \quad \text{subject to} \quad a^T \tilde{K} a = 1,$$

where $\tilde{K}$ is the $N \times N$ matrix with $\tilde{K}_{ij} = \langle \tilde{\Phi}(X_i), \tilde{\Phi}(X_j) \rangle$.

Note:

$$\frac{1}{N} \sum_{i=1}^N \langle f, \tilde{\Phi}(X_i) \rangle^2 = \frac{1}{N} \sum_{i=1}^N \Big( \sum_{j=1}^N a_j \langle \tilde{\Phi}(X_j), \tilde{\Phi}(X_i) \rangle \Big)^2 = \frac{1}{N} a^T \tilde{K}^2 a,$$

$$\|f\|^2 = \Big\langle \sum_{i=1}^N a_i \tilde{\Phi}(X_i),\ \sum_{i=1}^N a_i \tilde{\Phi}(X_i) \Big\rangle = a^T \tilde{K} a.$$

The first principal component of the data $X_i$ is

$$\langle \tilde{\Phi}(X_i), \hat{f} \rangle = \sqrt{\lambda_1}\, (u_1)_i, \quad \text{where } \tilde{K} = \sum_{p=1}^N \lambda_p u_p u_p^T \quad (\lambda_1 \ge \cdots \ge \lambda_N)$$

is the eigendecomposition.

16 Kernel PCA III
Observation:
- PCA in the feature space can be done if we can compute $\langle \tilde{\Phi}(X_i), \tilde{\Phi}(X_j) \rangle$, or equivalently $\langle \Phi(X_i), \Phi(X_j) \rangle = k(X_i, X_j)$.
- The principal direction is obtained in the form $f = \sum_i a_i \tilde{\Phi}(X_i)$, i.e., in the linear hull of the data.

Note:

$$\tilde{K}_{ij} = \langle \tilde{\Phi}(X_i), \tilde{\Phi}(X_j) \rangle = \langle \Phi(X_i), \Phi(X_j) \rangle - \frac{1}{N} \sum_{b=1}^N \langle \Phi(X_i), \Phi(X_b) \rangle - \frac{1}{N} \sum_{a=1}^N \langle \Phi(X_a), \Phi(X_j) \rangle + \frac{1}{N^2} \sum_{a,b=1}^N \langle \Phi(X_a), \Phi(X_b) \rangle$$

$$= k(X_i, X_j) - \frac{1}{N} \sum_{b=1}^N k(X_i, X_b) - \frac{1}{N} \sum_{a=1}^N k(X_a, X_j) + \frac{1}{N^2} \sum_{a,b=1}^N k(X_a, X_b).$$
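A short kernel PCA sketch assembling slides 14-16; the Gaussian kernel, its bandwidth, and the synthetic data are assumptions for the demo, not prescribed by the slides:

```python
import numpy as np

def kernel_pca(X, n_components=2, sigma=1.0):
    # Gram matrix K_ij = k(X_i, X_j); the Gaussian kernel is one possible choice.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))

    N = len(X)
    J = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    K_tilde = J @ K @ J                   # centered Gram matrix from slide 16

    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    order = np.argsort(eigvals)[::-1][:n_components]
    # p-th principal component of X_i is sqrt(lambda_p) * (u_p)_i (slide 15).
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

rng = np.random.default_rng(0)
scores = kernel_pca(rng.standard_normal((50, 3)))
print(scores.shape)  # (50, 2)
```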

17 Section: Ridge regression and its kernelization

18 Review: Linear Regression I
Linear regression. Data: $(X_1, Y_1), \ldots, (X_N, Y_N)$
- $X_i$: explanatory variable (covariate), m-dimensional
- $Y_i$: response variable, 1-dimensional

Regression model: find the best linear relation

$$Y_i = a^T X_i + \varepsilon_i.$$

19 Review: Linear Regression II
Least squares method:

$$\min_a \sum_{i=1}^N \big( Y_i - a^T X_i \big)^2.$$

Matrix expression:

$$X = \begin{pmatrix} X_1^1 & X_1^2 & \cdots & X_1^m \\ X_2^1 & X_2^2 & \cdots & X_2^m \\ \vdots & \vdots & & \vdots \\ X_N^1 & X_N^2 & \cdots & X_N^m \end{pmatrix}, \qquad Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{pmatrix}.$$

Solution:

$$\hat{a} = (X^T X)^{-1} X^T Y, \qquad \hat{y} = \hat{a}^T x = Y^T X (X^T X)^{-1} x.$$

Observation: linear regression can be done if we can compute inner products such as $X^T X$ and $\hat{a}^T x$.

20 Ridge Regression
Ridge regression: find a linear relation by

$$\min_a \sum_{i=1}^N \big( Y_i - a^T X_i \big)^2 + \lambda \|a\|^2,$$

where $\lambda$ is the regularization coefficient.

Solution:

$$\hat{a} = (X^T X + \lambda I_m)^{-1} X^T Y,$$

and for a general $x$,

$$\hat{y}(x) = \hat{a}^T x = Y^T X (X^T X + \lambda I_m)^{-1} x.$$

Ridge regression is useful when $(X^T X)^{-1}$ does not exist, or when the inversion is numerically unstable.
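A sketch contrasting the two closed-form solutions on an invented, nearly collinear design, where $(X^T X)^{-1}$ is numerically unstable and the ridge term helps:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 50, 3
X = rng.standard_normal((N, m))
X[:, 2] = X[:, 0] + 1e-8 * rng.standard_normal(N)   # nearly collinear columns
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(N)

lam = 1e-2
a_ols = np.linalg.solve(X.T @ X, X.T @ Y)                      # ill-conditioned system
a_ridge = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ Y)  # regularized system

print(np.linalg.cond(X.T @ X))  # huge condition number
print(a_ols)                    # numerically unreliable coefficients
print(a_ridge)                  # stable; the shared effect is split across the collinear columns
```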

21 Kernelization of Ridge Regression I
Data: $(X_1, Y_1), \ldots, (X_N, Y_N)$ ($Y_i$: 1-dimensional). Transform $X_i$ by a feature map $\Phi$ into a feature space $H$:

$$X_1, \ldots, X_N \mapsto \Phi(X_1), \ldots, \Phi(X_N).$$

Assume that the feature space has the inner product $\langle \cdot, \cdot \rangle$. Apply ridge regression to the transformed data: find the vector $f \in H$ such that

$$\min_{f \in H} \sum_{i=1}^N \big( Y_i - \langle f, \Phi(X_i) \rangle_H \big)^2 + \lambda \|f\|_H^2.$$

Similarly to kernel PCA, we can assume $f = \sum_{j=1}^N c_j \Phi(X_j)$:

$$\min_c \sum_{i=1}^N \Big( Y_i - \sum_{j=1}^N c_j \langle \Phi(X_j), \Phi(X_i) \rangle_H \Big)^2 + \lambda \Big\| \sum_{j=1}^N c_j \Phi(X_j) \Big\|_H^2.$$

22 Kernelization of Ridge Regression II
Solution:

$$\hat{c} = (K + \lambda I_N)^{-1} Y, \quad \text{where } K_{ij} = \langle \Phi(X_i), \Phi(X_j) \rangle_H = k(X_i, X_j).$$

For a general $x$,

$$\hat{y}(x) = \langle \hat{f}, \Phi(x) \rangle_H = \Big\langle \sum_j \hat{c}_j \Phi(X_j), \Phi(x) \Big\rangle_H = Y^T (K + \lambda I_N)^{-1} \mathbf{k},$$

where

$$\mathbf{k} = \begin{pmatrix} \langle \Phi(X_1), \Phi(x) \rangle \\ \vdots \\ \langle \Phi(X_N), \Phi(x) \rangle \end{pmatrix} = \begin{pmatrix} k(X_1, x) \\ \vdots \\ k(X_N, x) \end{pmatrix}.$$
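A minimal kernel ridge regression sketch; the formulas are those of this slide, while the Gaussian kernel, its bandwidth, and the toy data are my assumptions for the demo:

```python
import numpy as np

def gauss(u, v, sigma=0.5):
    # Gaussian kernel; the slides leave the choice of k open, this is one option.
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(100, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)

lam = 0.1
K = np.array([[gauss(xi, xj) for xj in X] for xi in X])  # K_ij = k(X_i, X_j)
c_hat = np.linalg.solve(K + lam * np.eye(len(X)), Y)     # c^hat = (K + lam I_N)^{-1} Y

def predict(x):
    k_vec = np.array([gauss(xi, x) for xi in X])         # (k(X_1, x), ..., k(X_N, x))
    return k_vec @ c_hat                                 # y^hat(x) = k^T c^hat

print(predict(np.array([1.0])), np.sin(1.0))  # the nonlinear fit tracks sin(x)
```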

23 Kernelization of Ridge Regression III
Proof. The matrix expression gives

$$\sum_{i=1}^N \Big( Y_i - \sum_{j=1}^N c_j \langle \Phi(X_j), \Phi(X_i) \rangle_H \Big)^2 + \lambda \Big\| \sum_{j=1}^N c_j \Phi(X_j) \Big\|_H^2 = (Y - Kc)^T (Y - Kc) + \lambda c^T K c = c^T (K^2 + \lambda K) c - 2 Y^T K c + Y^T Y.$$

Setting the derivative with respect to $c$ to zero gives $(K^2 + \lambda K)c = KY$, i.e., $K\big((K + \lambda I_N)c - Y\big) = 0$, which is satisfied by $\hat{c} = (K + \lambda I_N)^{-1} Y$. Inserting this into $\hat{y}(x) = \sum_j \hat{c}_j \langle \Phi(X_j), \Phi(x) \rangle_H$, we have the claim.

24 Kernelization of Ridge Regression IV
Observation:
- Ridge regression in the feature space can be done if we can compute the inner product $\langle \Phi(X_i), \Phi(X_j) \rangle = k(X_i, X_j)$.
- The resulting coefficient is of the form $f = \sum_i c_i \Phi(X_i)$, i.e., it lies in the linear hull of the data; the orthogonal directions do not contribute to the objective function.

25 Kernel methodology
- A feature space $H$ with inner product $\langle \cdot, \cdot \rangle$.
- Mapping of the data into the feature space: $X_1, \ldots, X_N \mapsto \Phi(X_1), \ldots, \Phi(X_N) \in H$.
- If the computation of the inner product $\langle \Phi(X_i), \Phi(X_j) \rangle$ is tractable, various linear methods can be extended to the feature space, giving methods of nonlinear data analysis.

How can we prepare such a feature space? Positive definite kernel!
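As a forward pointer (my own sketch, not on the slide): positive definiteness of $k$ means every Gram matrix built from it is positive semidefinite, which can be checked empirically for the Gaussian kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq)  # Gram matrix of the Gaussian kernel

# Positive definiteness of k means every such Gram matrix is positive semidefinite.
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True: all eigenvalues nonnegative
```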
