Introduction: Overview of Kernel Methods


 Derek Boone
 3 years ago
 Views:
Transcription
1 Introduction: Overview of Kernel Methods Statistical Data Analysis with Positive Definite Kernels Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University for Advanced Studies October 610, 2008, Kyushu University
2 Outline Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 2 / 25
3 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 3 / 25
4 Nonlinear Data Analysis I Classical linear methods Data is expressed by a matrix. X1 1 X1 2 X m 1 X2 1 X2 2 X2 m X =. XN 1 XN 2 XN m (m dimensional, N data) Linear operations (matrix operations) are used for data analysis. e.g.  Principal component analysis (PCA)  Canonical correlation analysis (CCA)  Linear regression analysis  Fisher discriminant analysis (FDA)  Logistic regression, etc. 4 / 25
5 Nonlinear Data Analysis II Are linear methods sufficient? Nonlinear transform can help. Example 1: classification linearly inseparable linearly separable x z z z 2 x 1 (x 1, x 2 ) (z 1, z 2, z 3 ) = (x 2 1, x 2 2, 2x 1 x 2 ) (Unclear? watch 5 / 25
6 Example 2: dependence of two data Correlation ρ XY = Cov[X, Y ] E[(X E[X])(Y E[Y ])] = Var[X]Var[Y ] E[(X E[X])2 ]E[(Y E[Y ]) 2 ]. Transforming data to incorporate highorder moments seems attractive. 6 / 25
7 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 7 / 25
8 Feature space for transforming data Kernel methodology = a systematic way of analyzing data by transforming them into a highdimensional feature space. Apply linear methods on the feature space. Which type of space serves as a feature space? The space should incorporate various nonlinear information of the original data. The inner product of the feature space is essential for data analysis (seen in the next subsection). 8 / 25
9 Computational problem of inner product For example, how about this? (X, Y, Z) (X, Y, Z, X 2, Y 2, Z 2, XY, Y Z, ZX,...). But, for highdimensional data, the above expansion makes the feature space very huge! e.g. If X is 100 dimensional and the moments up to the third order are used, the dimensionality of feature space is 100C C C 3 = This causes a serious computational problem in working on the inner product of the feature space. We need a cleverer way of computing it. Kernel method. 9 / 25
10 Inner product by positive definite kernel A positive definite kernel gives efficient computation of the inner product: With special choice of the feature space, we have a function k(x, y) such that where Φ(X i ), Φ(X j ) = k(x i, X j ), positive definite kernel X x Φ(x) H (feature space). Many linear methods use only the inner product without necessity of the explicit form of the vector Φ(X). 10 / 25
11 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 11 / 25
12 X 1,..., X N : mdimensional data. Review of PCA I Principal Component Analysis (PCA) Find ddirections to maximize the variance. Purpose: represent the structure of the data in a low dimensional space. 12 / 25
13 The first principal direction: Review of PCA II { 1 N u 1 = arg max u =1 N i=1 ut (X i 1 N N j=1 X j) } 2 = arg max u =1 ut V u, where V is the variancecovariance matrix: V = 1 N N i=1 (X i 1 N N j=1 X j)(x i 1 N N j=1 X j) T.  Eigenvectors u 1,..., u m of V (in descending order).  The pth principal axis = u p.  The pth principal component of X i = u T p X i Observation: PCA can be done if we can compute the inner product covariance matrix V, inner product between the unit eigenvector and the data. 13 / 25
14 Kernel PCA I X 1,..., X N : mdimensional data. Transform the data by a feature map Φ into a feature space H: X 1,..., X N Φ(X 1 ),..., Φ(X N ) Assume that the feature space has the inner product,. Apply PCA to the transformed data: Maximize the variance of the projection onto the unit vector f. 1 N max Var[ f, Φ(X) ] = max f =1 f =1 N i=1( f, Φ(Xi ) 1 N N j=1 Φ(X j) ) 2 Note: it suffices to use f = n i=1 a i Φ(X i ), where Φ(X i ) = Φ(X i ) 1 N N j=1 Φ(X j). The direction orthogonal to Span{ Φ(X 1 ),..., Φ(X N )} does not contribute. 14 / 25
15 Kernel PCA II The PCA solution: max a T K2 a subject to a T Ka = 1, where K is N N matrix with K ij = Φ(X i ), Φ(X j ). Note: 1 N N i=1 f, Φ(X i ) 2 = 1 N N i=1 N j=1 a Φ(X j j ), Φ(X i ) 2 = 1 N at K2 a, f 2 = n i=1 a i Φ(X i ), n i=1 a i Φ(X i ) = a T Ka. The first principal component of the data X i is Φ(X i ), ˆf = N i=1 λ1 u 1 i, where K = N i=1 λ iu i u it is the eigen decomposition. 15 / 25
16 Observation: Kernel PCA III PCA in the feature space can be done if we can compute Φ(X i ), Φ(X j ) or Φ(X i ), Φ(X j ) = k(x i, X j ). The principal direction is obtained in the form f = i a i Φ(X i ), i.e., in the linear hull of the data. Note: K ij = Φ(X i), Φ(X j) = Φ(X i), Φ(X j) 1 N N b=1 Φ(Xi), Φ(X b) 1 N N a=1 1 Φ(Xa), Φ(Xj) + N N 2 a=1 Φ(Xa), Φ(X b) = k(x i, X j) 1 N N b=1 k(xi, X b) 1 N N 1 a=1k(xa, Xj) + N N 2 a=1 k(xa, X b) 16 / 25
17 Basic idea of kernel methods Linear and nonlinear Data Analysis Essence of kernel methodology Kernel PCA: Nonlinear extension of PCA Ridge regression and its kernelization 17 / 25
18 Linear regression Review: Linear Regression I Data: (X 1, Y 1 ),..., (X N, Y N ): data X i: explanatory variable, covariate (mdimensional) Y i: response variable, (1 dimensional) Regression model: find the best linear relation Y i = a T X i + ε i 18 / 25
19 Review: Linear Regression II Least square method: min a N i=1 Y i a T X i 2. Matrix expression X1 1 X1 2 X1 m Y 1 X2 1 X2 2 X2 m X =., Y = Y 2.. XN 1 X2 N Xm N Y N Solution: â = (X T X) 1 X T Y ŷ = â T x = Y T X(X T X) 1 x. Observation: Linear regressio can be done if we can compute the inner product X T X, â T x and so on. 19 / 25
20 Ridge Regression Ridge regression: Find a linear relation by min a N i=1 Y i a T X i 2 + λ a 2. Solution For a general x, â = (X T X + λi N ) 1 X T Y λ: regularization coefficient. ŷ(x) = â T x = Y T X(X T X + λi N ) 1 x. Ridge regression is useful when (X T X) 1 does not exist, or inversion is numerically unstable. 20 / 25
21 Kernelization of Ridge Regression I (X 1, Y 1 )..., (X N, Y N ) (Y i : 1dimensional) Transform X i by a feature map Φ into a feature space H: X 1,..., X N Φ(X 1 ),..., Φ(X N ) Assume that the feature space has the inner product,. Apply ridge regression to the transformed data: Find the vector f such that N min i=1 Y i f, Φ(X i ) H 2 + λ f 2 H. f H Similarly to kernel PCA, we can assume f = n j=1 c jφ(x j ). N min c i=1 Y i N j=1 c jφ(x j ), Φ(X i ) H 2 + λ N j=1 c jφ(x j ) 2 H 21 / 25
22 Kernelization of Ridge Regression II Solution: ĉ = (K+λI N ) 1 Y, where K ij = Φ(X i ), Φ(X j ) H = k(x i, X j ). For a general x, ŷ(x) = f, Φ(x) H = jĉjφ(x j ), Φ(x) H = Y T (K + λi N ) 1 k, where k = Φ(X 1 ), Φ(x). Φ(X N ), Φ(x) = k(x 1, x).. k(x N, x) 22 / 25
23 Kernelization of Ridge Regression III Proof. Matrix expression gives N i=1 Y i N j=1 c jφ(x j ), Φ(X i ) H 2 + λ N j=1 c jφ(x j ) 2 H = (Y Kc) T (Y Kc) + λc T Kc = c T (K 2 + λk)c 2Y T Kc + Y T Y. It follows that the optimal c is given by ĉ = (K + λi N ) 1 Y. Inserting this to ŷ(x) = j ĉjφ(x j ), Φ(x) H, we have the claim. 23 / 25
24 Kernelization of Ridge Regression IV Observation: Ridge regression in the feature space can be done if we can compute the inner product Φ(X i ), Φ(X j ) = k(x i, X j ). The resulting coefficient is of the form f = i c iφ(x i ), i.e., in the linear hull of the data. The orthogonal directions do not contribute to the objective function. 24 / 25
25 Kernel methodology A feature space H with inner product,. Mapping of the data into a feature space: X 1,..., X N Φ(X 1 ),..., Φ(X N ) H. If the computation of the inner product Φ(X i ), Φ(X i ) is tractable, various linear methods can be extended to the feature space. Give Methods of nonlinear data analysis. How can we prepare such a feature space? Positive definite kernel! 25 / 25
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationReview Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 03 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
More informationDimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
More informationNotes for STA 437/1005 Methods for Multivariate Data
Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.
More informationIntroduction to Machine Learning
Introduction to Machine Learning Felix Brockherde 12 Kristof Schütt 1 1 Technische Universität Berlin 2 Max Planck Institute of Microstructure Physics IPAM Tutorial 2013 Felix Brockherde, Kristof Schütt
More informationIntroduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra  1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
More informationCorrelation in Random Variables
Correlation in Random Variables Lecture 11 Spring 2002 Correlation in Random Variables Suppose that an experiment produces two random variables, X and Y. What can we say about the relationship between
More informationSF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,
More informationDepartment of Mathematics, Indian Institute of Technology, Kharagpur Assignment 23, Probability and Statistics, March 2015. Due:March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 3, Probability and Statistics, March 05. Due:March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
More informationCSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye
CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationRegression With Gaussian Measures
Regression With Gaussian Measures Michael J. Meyer Copyright c April 11, 2004 ii PREFACE We treat the basics of Gaussian processes, Gaussian measures, kernel reproducing Hilbert spaces and related topics.
More informationECE302 Spring 2006 HW7 Solutions March 11, 2006 1
ECE32 Spring 26 HW7 Solutions March, 26 Solutions to HW7 Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in italics where
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationPCA to Eigenfaces. CS 510 Lecture #16 March 23 th A 9 dimensional PCA example
PCA to Eigenfaces CS 510 Lecture #16 March 23 th 2015 A 9 dimensional PCA example is dark around the edges and bright in the middle. is light with dark vertical bars. is light with dark horizontal bars.
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multiclass classification.
More informationExploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016
and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings
More informationCovariance and Correlation. Consider the joint probability distribution f XY (x, y).
Chapter 5: JOINT PROBABILITY DISTRIBUTIONS Part 2: Section 52 Covariance and Correlation Consider the joint probability distribution f XY (x, y). Is there a relationship between X and Y? If so, what kind?
More informationLecture 2. Observables
Lecture 2 Observables 13 14 LECTURE 2. OBSERVABLES 2.1 Observing observables We have seen at the end of the previous lecture that each dynamical variable is associated to a linear operator Ô, and its expectation
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationCheng Soon Ong & Christfried Webers. Canberra February June 2016
c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationCanonical Correlation Analysis
Canonical Correlation Analysis Lecture 11 August 4, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #118/4/2011 Slide 1 of 39 Today s Lecture Canonical Correlation Analysis
More informationRandom Vectors and the Variance Covariance Matrix
Random Vectors and the Variance Covariance Matrix Definition 1. A random vector X is a vector (X 1, X 2,..., X p ) of jointly distributed random variables. As is customary in linear algebra, we will write
More informationPrincipal Components Analysis (PCA)
Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association
More informationOrthogonal Diagonalization of Symmetric Matrices
MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding
More informationPrinciple Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
More informationObject Recognition and Template Matching
Object Recognition and Template Matching Template Matching A template is a small image (subimage) The goal is to find occurrences of this template in a larger image That is, you want to find matches of
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationA Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Largemargin linear
More informationSF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance
More informationSome probability and statistics
Appendix A Some probability and statistics A Probabilities, random variables and their distribution We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X,Y,
More informationApplied Multivariate Analysis
Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models
More informationJava Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Nonnormal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
More informationClass #6: Nonlinear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Nonlinear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Nonlinear classification Linear Support Vector Machines
More informationHighDimensional Data Visualization by PCA and LDA
HighDimensional Data Visualization by PCA and LDA ChaurChin Chen Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan Abbie Hsu Institute of Information Systems & Applications,
More informationNeural Networks. CAP5610 Machine Learning Instructor: GuoJun Qi
Neural Networks CAP5610 Machine Learning Instructor: GuoJun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationFiltered Gaussian Processes for Learning with Large DataSets
Filtered Gaussian Processes for Learning with Large DataSets Jian Qing Shi, Roderick MurraySmith 2,3, D. Mike Titterington 4,and Barak A. Pearlmutter 3 School of Mathematics and Statistics, University
More informationFactor Analysis. Factor Analysis
Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we
More informationproblem arises when only a nonrandom sample is available differs from censored regression model in that x i is also unobserved
4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a nonrandom
More information4. Matrix Methods for Analysis of Structure in Data Sets:
ATM 552 Notes: Matrix Methods: EOF, SVD, ETC. D.L.Hartmann Page 64 4. Matrix Methods for Analysis of Structure in Data Sets: Empirical Orthogonal Functions, Principal Component Analysis, Singular Value
More informationOctober 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix
Linear Algebra & Properties of the Covariance Matrix October 3rd, 2012 Estimation of r and C Let rn 1, rn, t..., rn T be the historical return rates on the n th asset. rn 1 rṇ 2 r n =. r T n n = 1, 2,...,
More informationMehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics
INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree
More informationBindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8
Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e
More informationMATH 425, PRACTICE FINAL EXAM SOLUTIONS.
MATH 45, PRACTICE FINAL EXAM SOLUTIONS. Exercise. a Is the operator L defined on smooth functions of x, y by L u := u xx + cosu linear? b Does the answer change if we replace the operator L by the operator
More informationChapter 4. Multivariate Distributions
1 Chapter 4. Multivariate Distributions Joint p.m.f. (p.d.f.) Independent Random Variables Covariance and Correlation Coefficient Expectation and Covariance Matrix Multivariate (Normal) Distributions Matlab
More informationIntroduction to Principal Components and FactorAnalysis
Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a
More information4. Joint Distributions of Two Random Variables
4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the socalled high dimensional, low sample size (HDLSS) settings, LDA
More informationThe Bivariate Normal Distribution
The Bivariate Normal Distribution This is Section 4.7 of the st edition (2002) of the book Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis. The material in this section was not included
More information1. True/False: Circle the correct answer. No justifications are needed in this exercise. (1 point each)
Math 33 AH : Solution to the Final Exam Honors Linear Algebra and Applications 1. True/False: Circle the correct answer. No justifications are needed in this exercise. (1 point each) (1) If A is an invertible
More informationSupport Vector Machines Explained
March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
More informationA repeated measures concordance correlation coefficient
A repeated measures concordance correlation coefficient Presented by Yan Ma July 20,2007 1 The CCC measures agreement between two methods or time points by measuring the variation of their linear relationship
More informationCanonical Correlation
Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present
More informationMath 141. Lecture 7: Variance, Covariance, and Sums. Albyn Jones 1. 1 Library 304. jones/courses/141
Math 141 Lecture 7: Variance, Covariance, and Sums Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 Last Time Variance: expected squared deviation from the mean: Standard
More informationKERNEL LOGISTIC REGRESSIONLINEAR FOR LEUKEMIA CLASSIFICATION USING HIGH DIMENSIONAL DATA
Rahayu, Kernel Logistic RegressionLinear for Leukemia Classification using High Dimensional Data KERNEL LOGISTIC REGRESSIONLINEAR FOR LEUKEMIA CLASSIFICATION USING HIGH DIMENSIONAL DATA S.P. Rahayu 1,2
More informationIntroduction to Principal Component Analysis: Stock Market Values
Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from
More informationExact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
More informationMultivariate Analysis (Slides 13)
Multivariate Analysis (Slides 13) The final topic we consider is Factor Analysis. A Factor Analysis is a mathematical approach for attempting to explain the correlation between a large set of variables
More informationMultidimensional data and factorial methods
Multidimensional data and factorial methods Bidimensional data x 5 4 3 4 X 3 6 X 3 5 4 3 3 3 4 5 6 x Cartesian plane Multidimensional data n X x x x n X x x x n X m x m x m x nm Factorial plane Interpretation
More informationJoint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis
Joint Use of Factor Analysis () and Data Envelopment Analysis () for Ranking of Data Envelopment Analysis Reza Nadimi, Fariborz Jolai Abstract This article combines two techniques: data envelopment analysis
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationUniversity of Lille I PC first year list of exercises n 7. Review
University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients
More informationCalculate the holding period return for this investment. It is approximately
1. An investor purchases 100 shares of XYZ at the beginning of the year for $35. The stock pays a cash dividend of $3 per share. The price of the stock at the time of the dividend is $30. The dividend
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard
More informationAverage Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation
Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers
More informationOptimization for Machine Learning
Optimization for Machine Learning Lecture 4: SMOMKL S.V. N. (vishy) Vishwanathan Purdue University vishy@purdue.edu July 11, 2012 S.V. N. Vishwanathan (Purdue University) Optimization for Machine Learning
More informationVisualization by Linear Projections as Information Retrieval
Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI0015 TKK, Finland jaakko.peltonen@tkk.fi
More informationSections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
More information1.5. Factorisation. Introduction. Prerequisites. Learning Outcomes. Learning Style
Factorisation 1.5 Introduction In Block 4 we showed the way in which brackets were removed from algebraic expressions. Factorisation, which can be considered as the reverse of this process, is dealt with
More informationFitting Subjectspecific Curves to Grouped Longitudinal Data
Fitting Subjectspecific Curves to Grouped Longitudinal Data Djeundje, Viani HeriotWatt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK Email: vad5@hw.ac.uk Currie,
More informationELECE8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems
Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed
More information1 Introduction. 2 Matrices: Definition. Matrix Algebra. Hervé Abdi Lynne J. Williams
In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 00 Matrix Algebra Hervé Abdi Lynne J. Williams Introduction Sylvester developed the modern concept of matrices in the 9th
More informationNOTES ON LINEAR TRANSFORMATIONS
NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all
More informationP (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationThe Scalar Algebra of Means, Covariances, and Correlations
3 The Scalar Algebra of Means, Covariances, and Correlations In this chapter, we review the definitions of some key statistical concepts: means, covariances, and correlations. We show how the means, variances,
More informationDimensionality Reduction A Short Tutorial
Dimensionality Reduction A Short Tutorial Ali Ghodsi Department of Statistics and Actuarial Science University of Waterloo Waterloo, Ontario, Canada, 2006 c Ali Ghodsi, 2006 Contents 1 An Introduction
More informationData visualization and dimensionality reduction using kernel maps with a reference point
Data visualization and dimensionality reduction using kernel maps with a reference point Johan Suykens K.U. Leuven, ESATSCD/SISTA Kasteelpark Arenberg 1 B31 Leuven (Heverlee), Belgium Tel: 32/16/32 18
More informationSeveral Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
More informationTutorial on Exploratory Data Analysis
Tutorial on Exploratory Data Analysis Julie Josse, François Husson, Sébastien Lê julie.josse at agrocampusouest.fr francois.husson at agrocampusouest.fr Applied Mathematics Department, Agrocampus Ouest
More informationNumerical Methods I Eigenvalue Problems
Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420001, Fall 2010 September 30th, 2010 A. Donev (Courant Institute)
More informationClustering in the Linear Model
Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple
More informationForecasting Financial Time Series
Canberra, February, 2007 Contents Introduction 1 Introduction : Problems and Approaches 2 3 : Problems and Approaches : Problems and Approaches Time series: (Relative) returns r t = p t p t 1 p t 1, t
More informationData Visualization and Feature Selection: New Algorithms for Nongaussian Data
Data Visualization and Feature Selection: New Algorithms for Nongaussian Data Howard Hua Yang and John Moody Oregon Graduate nstitute of Science and Technology NW, Walker Rd., Beaverton, OR976, USA hyang@ece.ogi.edu,
More informationAn Interactive Tool for Residual Diagnostics for Fitting Spatial Dependencies (with Implementation in R)
DSC 2003 Working Papers (Draft Versions) http://www.ci.tuwien.ac.at/conferences/dsc2003/ An Interactive Tool for Residual Diagnostics for Fitting Spatial Dependencies (with Implementation in R) Ernst
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationPrincipal Component Analysis
Principal Component Analysis Principle Component Analysis: A statistical technique used to examine the interrelations among a set of variables in order to identify the underlying structure of those variables.
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationMATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.
MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α
More informationManifold Learning Examples PCA, LLE and ISOMAP
Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition
More informationConcepts in Machine Learning, Unsupervised Learning & Astronomy Applications
Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications ChingWa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers
More informationPrincipal Component Analysis
Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded
More informationMULTIVARIATE PROBABILITY DISTRIBUTIONS
MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationGeneralized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)
Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through
More information