Data visualization and dimensionality reduction using kernel maps with a reference point


1 Data visualization and dimensionality reduction using kernel maps with a reference point
Johan Suykens
K.U. Leuven, ESAT-SCD/SISTA
Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
[email protected]
International Conference on Computational Harmonic Analysis (ICCHA), Shanghai, June 2007

2 Contents
- Context: support vector machines and kernel based learning
- Core problems: least squares support vector machines
- Classification and kernel principal component analysis
- Data visualization
- Kernel eigenmap methods
- Kernel maps with a reference point: linear system solution
- Examples

3 Living in a data world: biomedical, bio-informatics, energy, process industry, multimedia, traffic

4 Support vector machines and kernel methods: context
- With new technologies (e.g. in microarrays, proteomics) massive data sets become available that are high dimensional.
- Tasks and objectives: predictive modelling, knowledge discovery and integration, data fusion (classification, feature selection, prior knowledge incorporation, correlation analysis, ranking, robustness).
- Supervised, unsupervised or semi-supervised learning, depending on the given data and problem.
- Need for modelling techniques that are able to operate on different data types (sequences, graphs, numerical, categorical, ...).
- Linear as well as nonlinear models.
- Reliable methods: numerically, computationally, statistically.

5 Kernel based learning: interdisciplinary challenges
SVM and kernel methods sit at the crossroads of neural networks, data mining, pattern recognition, machine learning, linear algebra, mathematics, statistics, optimization, signal processing, and systems and control theory.

6 Estimation in Reproducing Kernel Hilbert Spaces (RKHS)
Variational problem [Wahba, 1990; Poggio & Girosi, 1990; Evgeniou et al., 2000]: find a function f such that

    min_{f ∈ H}  (1/N) Σ_{i=1}^N L(y_i, f(x_i)) + λ ||f||_K^2

with L(·,·) the loss function and ||f||_K the norm in the RKHS H defined by K.
Representer theorem: for a convex loss function, the solution is of the form

    f(x) = Σ_{i=1}^N α_i K(x, x_i)

Reproducing property: f(x) = ⟨f, K_x⟩_K with K_x(·) = K(x, ·).
Some special cases: L(y, f(x)) = (y − f(x))^2 gives the regularization network; L(y, f(x)) = |y − f(x)|_ε gives SVM regression with the ε-insensitive loss function.
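
For the squared loss, substituting the representer-theorem form into the variational problem reduces training to a linear system in α: (K + Nλ I) α = y. A minimal numpy sketch, assuming an RBF kernel (function names and parameter values are illustrative, not from the talk):

    import numpy as np

    def rbf_kernel(X1, X2, sigma2=1.0):
        # K[i, j] = exp(-||x1_i - x2_j||^2 / sigma2)
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / sigma2)

    def fit_regularization_network(X, y, lam=1e-3, sigma2=1.0):
        # representer theorem: f(x) = sum_i alpha_i K(x, x_i);
        # squared loss: alpha solves (K + N*lam*I) alpha = y
        N = X.shape[0]
        K = rbf_kernel(X, X, sigma2)
        return np.linalg.solve(K + N * lam * np.eye(N), y)

    def predict_regularization_network(X_train, alpha, X_new, sigma2=1.0):
        return rbf_kernel(X_new, X_train, sigma2) @ alpha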

7 Different views on kernel based models
Kernels are used in different methodologies: SVM, LS-SVM, kriging, RKHS, Gaussian processes.
Some early history on RKHS: 1910-1920: Moore; 1940: Aronszajn; 1951: Krige; 1970: Parzen; 1971: Kimeldorf & Wahba.
Obtaining complementary insights from different perspectives:
- Support vector machines (SVM): optimization approach (primal/dual)
- Reproducing kernel Hilbert spaces (RKHS): variational problem, functional analysis
- Gaussian processes (GP): probabilistic/Bayesian approach

8 SVMs: living in two worlds...
Primal space (feature map ϕ from the input space to the feature space):

    y(x) = sign[w^T ϕ(x) + b]

Dual space, with K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j) (the "kernel trick"):

    y(x) = sign[ Σ_{i=1}^{#sv} α_i y_i K(x, x_i) + b ]

9 Least Squares Support Vector Machines: core problems
Regression (RR):

    min_{w,b,e} w^T w + γ Σ_i e_i^2   s.t.  y_i = w^T ϕ(x_i) + b + e_i, ∀i

Classification (FDA):

    min_{w,b,e} w^T w + γ Σ_i e_i^2   s.t.  y_i (w^T ϕ(x_i) + b) = 1 − e_i, ∀i

Principal component analysis (PCA):

    min_{w,b,e} −w^T w + γ Σ_i e_i^2  s.t.  e_i = w^T ϕ(x_i) + b, ∀i

Canonical correlation analysis / partial least squares (CCA/PLS):

    min_{w,v,b,d,e,r} w^T w + v^T v + ν_1 Σ_i e_i^2 + ν_2 Σ_i r_i^2 − γ Σ_i e_i r_i
    s.t.  e_i = w^T ϕ_1(x_i) + b,  r_i = v^T ϕ_2(y_i) + d, ∀i

Also: partially linear models, spectral clustering, subspace algorithms, ...

10 LS-SVM classifier
Preserve the support vector machine methodology [Vapnik, 1995], but simplify via least squares and equality constraints [Suykens, 1999].
Primal problem:

    min_{w,b,e} (1/2) w^T w + γ (1/2) Σ_{i=1}^N e_i^2   such that  y_i [w^T ϕ(x_i) + b] = 1 − e_i,  i = 1, ..., N

Dual problem:

    [ 0    y^T       ] [ b ]   [ 0   ]
    [ y    Ω + I/γ   ] [ α ] = [ 1_N ]

where Ω_ij = y_i y_j ϕ(x_i)^T ϕ(x_j) = y_i y_j K(x_i, x_j) and y = [y_1; ...; y_N].
LS-SVM classifiers perform very well on 20 UCI data sets [Van Gestel et al., ML 2004].
Winning results in the WCCI 2006 competition by [Cawley, 2006].
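
A sketch of solving this dual system in numpy, reusing rbf_kernel from the earlier block (the gamma and sigma2 values are illustrative):

    def lssvm_classifier_fit(X, y, gamma=10.0, sigma2=1.0):
        # dual system: [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1_N]
        N = X.shape[0]
        Omega = np.outer(y, y) * rbf_kernel(X, X, sigma2)
        A = np.zeros((N + 1, N + 1))
        A[0, 1:] = y
        A[1:, 0] = y
        A[1:, 1:] = Omega + np.eye(N) / gamma
        sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(N))))
        return sol[1:], sol[0]                      # alpha, b

    def lssvm_classifier_predict(X_tr, y_tr, alpha, b, X_new, sigma2=1.0):
        # y(x) = sign( sum_i alpha_i y_i K(x, x_i) + b )
        return np.sign(rbf_kernel(X_new, X_tr, sigma2) @ (alpha * y_tr) + b)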

11 Kernel PCA: primal and dual problem
[Figure: linear PCA versus kernel PCA (RBF kernel).]
Primal problem [Suykens et al., 2003]:

    min_{w,b,e} (1/2) w^T w − γ (1/2) Σ_{i=1}^N e_i^2   such that  e_i = w^T ϕ(x_i) + b,  i = 1, ..., N.

Dual problem = kernel PCA [Schölkopf et al., 1998]:

    Ω_c α = λ α   with  λ = 1/γ

with Ω_c,ij = (ϕ(x_i) − μ̂_ϕ)^T (ϕ(x_j) − μ̂_ϕ) the centered kernel matrix.
The underlying LS-SVM model allows making out-of-sample extensions.
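
A sketch of the dual computation, again reusing rbf_kernel: center the kernel matrix and take the leading eigenvectors:

    def kernel_pca(X, n_components=2, sigma2=1.0):
        # dual problem: Omega_c alpha = lambda alpha, Omega_c the centered kernel matrix
        N = X.shape[0]
        C = np.eye(N) - np.ones((N, N)) / N           # centering matrix
        Omega_c = C @ rbf_kernel(X, X, sigma2) @ C
        lam, A = np.linalg.eigh(Omega_c)              # eigenvalues in ascending order
        idx = np.argsort(lam)[::-1][:n_components]    # keep the largest
        scores = Omega_c @ A[:, idx]                  # projections e_i of the training points
        return lam[idx], A[:, idx], scores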

12 Core models + additional constraints
Monotonicity constraints [Pelckmans et al., 2005]:

    min_{w,b,e} w^T w + γ Σ_{i=1}^N e_i^2
    s.t.  y_i = w^T ϕ(x_i) + b + e_i  (i = 1, ..., N)
          w^T ϕ(x_i) ≤ w^T ϕ(x_{i+1})  (i = 1, ..., N−1)

Structure detection [Pelckmans et al., 2005; Tibshirani, 1996]:

    min_{w,e,t} ρ Σ_{p=1}^P t_p + Σ_{p=1}^P w^{(p)T} w^{(p)} + γ Σ_{i=1}^N e_i^2
    s.t.  y_i = Σ_{p=1}^P w^{(p)T} ϕ^{(p)}(x_i^{(p)}) + e_i  (∀i)
          −t_p ≤ w^{(p)T} ϕ^{(p)}(x_i^{(p)}) ≤ t_p  (∀i, p)

Autocorrelated errors [Espinoza et al., 2006]:

    min_{w,b,r,e} w^T w + γ Σ_{i=1}^N r_i^2
    s.t.  y_i = w^T ϕ(x_i) + b + e_i  (i = 1, ..., N)
          e_i = ρ e_{i−1} + r_i  (i = 2, ..., N)

Spectral clustering [Alzate & Suykens, 2006; Chung, 1997; Shi & Malik, 2000]:

    min_{w,b,e} −w^T w + γ e^T D^{−1} e   s.t.  e_i = w^T ϕ(x_i) + b  (i = 1, ..., N)

13 Dimensionality reduction and data visualization
Traditionally: commonly used techniques include principal component analysis, multidimensional scaling, and self-organizing maps.
More recently: isomap, locally linear embedding, Hessian locally linear embedding, diffusion maps, Laplacian eigenmaps ("kernel eigenmap methods" and manifold learning) [Roweis & Saul, 2000; Coifman et al., 2005; Belkin et al., 2006].
Relevant issues:
- learning and generalization [Cucker & Smale, 2002; Poggio et al., 2004]
- model representations and out-of-sample extensions
- convex/non-convex problems, computational complexity [Smale, 1997]
Kernel maps with a reference point (KMref) [Suykens, 2007]: data visualization and dimensionality reduction by solving a linear system.

14 [Figure: given 3D data (x_1, x_2, x_3) and the resulting 2D KMref embedding (z_1, z_2).]

15 A criterion related to locally linear embedding
Given a training data set {x_i}_{i=1}^N with x_i ∈ R^p. Dimensionality reduction to {z_i}_{i=1}^N with z_i ∈ R^d (d = 2 or d = 3).
Objective:

    min_{z_i ∈ R^d}  −(γ/2) Σ_{i=1}^N ||z_i||_2^2 + (1/2) Σ_{i=1}^N || z_i − Σ_{j=1}^N s_ij z_j ||_2^2

where e.g. s_ij = exp(−||x_i − x_j||_2^2 / σ^2).
The solution follows from the eigenvalue problem R z = γ z with z = [z_1; z_2; ...; z_N] and R = (I − P)^T (I − P), where P = [s_ij I_d].
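
Since P = [s_ij I_d] is the Kronecker product S ⊗ I_d, the eigenvalue problem can be formed directly; a small numpy sketch returning all candidate solutions (the selection issue raised on slide 17 applies here as well):

    import numpy as np

    def lle_like_candidates(X, d=2, sigma2=1.0):
        # R = (I - P)^T (I - P) with P = kron(S, I_d); candidates solve R z = gamma z
        N = X.shape[0]
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        S = np.exp(-d2 / sigma2)                      # similarities s_ij
        IP = np.eye(N * d) - np.kron(S, np.eye(d))
        gammas, Z = np.linalg.eigh(IP.T @ IP)
        # column Z[:, k] stacks z = [z_1; ...; z_N];
        # Z[:, k].reshape(N, d) gives the k-th candidate embedding, one z_i per row
        return gammas, Z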

16 Introducing a core model
Realize the nonlinear mapping x → z through a least squares support vector machine regression:

    min_{z, w_j, e_{i,j}}  −(γ/2) z^T z + (1/2)(z − Pz)^T (z − Pz) + (ν/2) Σ_{j=1}^d w_j^T w_j + (η/2) Σ_{i=1}^N Σ_{j=1}^d e_{i,j}^2
    such that  c_{i,j}^T z = w_j^T ϕ_j(x_i) + e_{i,j},  i = 1, ..., N;  j = 1, ..., d

Primal model representation, with evaluation at a point x_* ∈ R^p:

    ẑ_{*,j} = w_j^T ϕ_j(x_*)

with w_j ∈ R^{n_h_j} and feature maps ϕ_j(·): R^p → R^{n_h_j} (j = 1, ..., d).

17 Kernel maps and eigenvalue problem
The solution follows from an eigenvalue problem, e.g. for d = 2:

    ( R + V_1 ((1/ν) Ω_1 + (1/η) I)^{−1} V_1^T + V_2 ((1/ν) Ω_2 + (1/η) I)^{−1} V_2^T ) z = γ z

with kernel matrices Ω_1, Ω_2:

    Ω_{1,ij} = K_1(x_i, x_j) = ϕ_1(x_i)^T ϕ_1(x_j)
    Ω_{2,ij} = K_2(x_i, x_j) = ϕ_2(x_i)^T ϕ_2(x_j)

and matrices V_1 = [c_{1,1} c_{2,1} ... c_{N,1}], V_2 = [c_{1,2} c_{2,2} ... c_{N,2}].
However, selection of the best solution from this pool of 2N candidates is not straightforward (the best solution is not necessarily given by the largest or smallest eigenvalue here).

18 Kernel maps with reference point: problem statement
Kernel maps with a reference point:
- LS-SVM core part: realize the dimensionality reduction x → z
- reference point q (e.g. the first point; sacrificed in the visualization)
Example for d = 2:

    min_{z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}}  −(γ/2) z^T z + (1/2)(z − P_D z)^T (z − P_D z) + (ν/2)(w_1^T w_1 + w_2^T w_2) + (η/2) Σ_{i=1}^N (e_{i,1}^2 + e_{i,2}^2)
    such that  c_{1,1}^T z = q_1 + e_{1,1}
               c_{1,2}^T z = q_2 + e_{1,2}
               c_{i,1}^T z = w_1^T ϕ_1(x_i) + b_1 + e_{i,1},  i = 2, ..., N
               c_{i,2}^T z = w_2^T ϕ_2(x_i) + b_2 + e_{i,2},  i = 2, ..., N

Coordinates in the low dimensional space: z = [z_1; z_2; ...; z_N] ∈ R^{dN}.
Regularization term: (z − P_D z)^T (z − P_D z) = Σ_{i=1}^N || z_i − Σ_{j=1}^N s_ij D z_j ||_2^2 with D a diagonal matrix and s_ij = exp(−||x_i − x_j||_2^2 / σ^2).

19 Kernel maps with reference point: solution
The unique solution to the problem is given by the linear system

    [ U                          −V_1 M_1^{−1} 1_{N−1}          −V_2 M_2^{−1} 1_{N−1}       ] [ z   ]   [ η(q_1 c_{1,1} + q_2 c_{1,2}) ]
    [ 1_{N−1}^T M_1^{−1} V_1^T   −1_{N−1}^T M_1^{−1} 1_{N−1}     0                          ] [ b_1 ] = [ 0                            ]
    [ 1_{N−1}^T M_2^{−1} V_2^T    0                              −1_{N−1}^T M_2^{−1} 1_{N−1}] [ b_2 ]   [ 0                            ]

with matrices

    U = (I − P_D)^T (I − P_D) − γI + V_1 M_1^{−1} V_1^T + V_2 M_2^{−1} V_2^T + η c_{1,1} c_{1,1}^T + η c_{1,2} c_{1,2}^T
    M_1 = (1/ν) Ω_1 + (1/η) I,   M_2 = (1/ν) Ω_2 + (1/η) I
    V_1 = [c_{2,1} ... c_{N,1}],  V_2 = [c_{2,2} ... c_{N,2}]

kernel matrices Ω_1, Ω_2 ∈ R^{(N−1)×(N−1)} with Ω_{1,ij} = K_1(x_i, x_j) = ϕ_1(x_i)^T ϕ_1(x_j), Ω_{2,ij} = K_2(x_i, x_j) = ϕ_2(x_i)^T ϕ_2(x_j), and positive definite kernel functions K_1(·,·), K_2(·,·).
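
A numpy sketch that assembles and solves this system for d = 2, assuming RBF forms for s_ij, K_1, K_2; the parameter defaults are illustrative and the sign conventions follow the elimination step on the proof slides below:

    import numpy as np

    def kmref_fit(X, q=(1.0, -1.0), sigma2=1.0, sigma2_1=1.0, sigma2_2=0.5,
                  nu=1.0, eta=1.0, gamma=0.0, D=np.eye(2)):
        N, d = X.shape[0], 2
        dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        S = np.exp(-dist2 / sigma2)                 # similarities s_ij
        IPD = np.eye(N * d) - np.kron(S, D)         # I - P_D, with P_D = [s_ij D]

        def c(i, j):                                # selector: c(i, j)^T z = z_{i,j} (1-based)
            v = np.zeros(N * d); v[(i - 1) * d + (j - 1)] = 1.0; return v

        V1 = np.stack([c(i, 1) for i in range(2, N + 1)], axis=1)   # (Nd, N-1)
        V2 = np.stack([c(i, 2) for i in range(2, N + 1)], axis=1)

        d2 = dist2[1:, 1:]                          # kernels over x_2, ..., x_N
        M1 = np.exp(-d2 / sigma2_1) / nu + np.eye(N - 1) / eta
        M2 = np.exp(-d2 / sigma2_2) / nu + np.eye(N - 1) / eta
        M1i, M2i = np.linalg.inv(M1), np.linalg.inv(M2)
        one = np.ones(N - 1)

        c11, c12 = c(1, 1), c(1, 2)
        U = (IPD.T @ IPD - gamma * np.eye(N * d)
             + V1 @ M1i @ V1.T + V2 @ M2i @ V2.T
             + eta * np.outer(c11, c11) + eta * np.outer(c12, c12))

        # block system in (z, b1, b2)
        A = np.zeros((N * d + 2, N * d + 2))
        A[:N * d, :N * d] = U
        A[:N * d, -2] = -V1 @ M1i @ one
        A[:N * d, -1] = -V2 @ M2i @ one
        A[-2, :N * d] = one @ M1i @ V1.T
        A[-2, -2] = -(one @ M1i @ one)
        A[-1, :N * d] = one @ M2i @ V2.T
        A[-1, -1] = -(one @ M2i @ one)
        rhs = np.concatenate([eta * (q[0] * c11 + q[1] * c12), [0.0, 0.0]])

        sol = np.linalg.solve(A, rhs)
        return sol[:N * d].reshape(N, d), sol[-2], sol[-1]   # z (one z_i per row), b1, b2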

20 Kernel maps with reference point: model representations
The primal and dual model representations allow making out-of-sample extensions. Evaluation at a point x_* ∈ R^p:

    ẑ_{*,1} = w_1^T ϕ_1(x_*) + b_1 = (1/ν) Σ_{i=2}^N α_{i,1} K_1(x_i, x_*) + b_1
    ẑ_{*,2} = w_2^T ϕ_2(x_*) + b_2 = (1/ν) Σ_{i=2}^N α_{i,2} K_2(x_i, x_*) + b_2

Estimated coordinates for visualization: ẑ_* = [ẑ_{*,1}; ẑ_{*,2}].
α_1, α_2 ∈ R^{N−1} are the unique solutions to the linear systems

    M_1 α_1 = V_1^T z − b_1 1_{N−1}   and   M_2 α_2 = V_2^T z − b_2 1_{N−1}

with α_1 = [α_{2,1}; ...; α_{N,1}], α_2 = [α_{2,2}; ...; α_{N,2}], 1_{N−1} = [1; 1; ...; 1].
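
Continuing the sketch above, the dual coefficients and an out-of-sample evaluation (same assumed RBF kernels and illustrative parameters as in kmref_fit):

    def kmref_out_of_sample(X, z, b1, b2, x_new, sigma2_1=1.0, sigma2_2=0.5,
                            nu=1.0, eta=1.0):
        # solve M_j alpha_j = V_j^T z - b_j 1_{N-1}, then
        # z_hat_{*,j} = (1/nu) sum_{i=2}^N alpha_{i,j} K_j(x_i, x_*) + b_j
        N = X.shape[0]
        Xs = X[1:]                                   # x_2, ..., x_N
        d2 = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(-1)
        M1 = np.exp(-d2 / sigma2_1) / nu + np.eye(N - 1) / eta
        M2 = np.exp(-d2 / sigma2_2) / nu + np.eye(N - 1) / eta
        zvec = z.reshape(-1)                         # z = [z_1; ...; z_N]
        alpha1 = np.linalg.solve(M1, zvec[2::2] - b1)   # V_1^T z picks z_{i,1}, i >= 2
        alpha2 = np.linalg.solve(M2, zvec[3::2] - b2)   # V_2^T z picks z_{i,2}, i >= 2
        k1 = np.exp(-((Xs - x_new) ** 2).sum(-1) / sigma2_1)
        k2 = np.exp(-((Xs - x_new) ** 2).sum(-1) / sigma2_2)
        return np.array([k1 @ alpha1 / nu + b1, k2 @ alpha2 / nu + b2])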

21 Proof - Lagrangian
Only equality constraints: the optimal model representation and solution are obtained in a systematic and straightforward way.
Lagrangian:

    L(z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}; β_{1,1}, β_{1,2}, α_{i,1}, α_{i,2}) =
        −(γ/2) z^T z + (1/2)(z − P_D z)^T (z − P_D z) + (ν/2)(w_1^T w_1 + w_2^T w_2) + (η/2) Σ_{i=1}^N (e_{i,1}^2 + e_{i,2}^2)
        + β_{1,1}(c_{1,1}^T z − q_1 − e_{1,1}) + β_{1,2}(c_{1,2}^T z − q_2 − e_{1,2})
        + Σ_{i=2}^N α_{i,1}(c_{i,1}^T z − w_1^T ϕ_1(x_i) − b_1 − e_{i,1}) + Σ_{i=2}^N α_{i,2}(c_{i,2}^T z − w_2^T ϕ_2(x_i) − b_2 − e_{i,2})

Conditions for optimality [Fletcher, 1987]: set the partial derivatives to zero,

    ∂L/∂z = 0, ∂L/∂w_1 = 0, ∂L/∂w_2 = 0, ∂L/∂b_1 = 0, ∂L/∂b_2 = 0,
    ∂L/∂e_{1,1} = 0, ∂L/∂e_{1,2} = 0, ∂L/∂e_{i,1} = 0, ∂L/∂e_{i,2} = 0,
    ∂L/∂β_{1,1} = 0, ∂L/∂β_{1,2} = 0, ∂L/∂α_{i,1} = 0, ∂L/∂α_{i,2} = 0.

22 Proof - conditions for optimality

    ∂L/∂z = −γ z + (I − P_D)^T (I − P_D) z + β_{1,1} c_{1,1} + β_{1,2} c_{1,2} + Σ_{i=2}^N α_{i,1} c_{i,1} + Σ_{i=2}^N α_{i,2} c_{i,2} = 0
    ∂L/∂w_1 = ν w_1 − Σ_{i=2}^N α_{i,1} ϕ_1(x_i) = 0
    ∂L/∂w_2 = ν w_2 − Σ_{i=2}^N α_{i,2} ϕ_2(x_i) = 0
    ∂L/∂b_1 = −Σ_{i=2}^N α_{i,1} = 0  ⇒  1_{N−1}^T α_1 = 0
    ∂L/∂b_2 = −Σ_{i=2}^N α_{i,2} = 0  ⇒  1_{N−1}^T α_2 = 0
    ∂L/∂e_{1,1} = η e_{1,1} − β_{1,1} = 0
    ∂L/∂e_{1,2} = η e_{1,2} − β_{1,2} = 0
    ∂L/∂e_{i,1} = η e_{i,1} − α_{i,1} = 0,  i = 2, ..., N
    ∂L/∂e_{i,2} = η e_{i,2} − α_{i,2} = 0,  i = 2, ..., N
    ∂L/∂β_{1,1} = c_{1,1}^T z − q_1 − e_{1,1} = 0
    ∂L/∂β_{1,2} = c_{1,2}^T z − q_2 − e_{1,2} = 0
    ∂L/∂α_{i,1} = c_{i,1}^T z − w_1^T ϕ_1(x_i) − b_1 − e_{i,1} = 0,  i = 2, ..., N
    ∂L/∂α_{i,2} = c_{i,2}^T z − w_2^T ϕ_2(x_i) − b_2 − e_{i,2} = 0,  i = 2, ..., N.

23 Proof - elimination step
- Eliminate w_1, w_2, e_{i,1}, e_{i,2}.
- Express in terms of kernel functions.
- Express the set of equations in terms of z, b_1, b_2, α_1, α_2.
One obtains

    −γ z + (I − P_D)^T (I − P_D) z + V_1 α_1 + V_2 α_2 + η c_{1,1} c_{1,1}^T z + η c_{1,2} c_{1,2}^T z = η(q_1 c_{1,1} + q_2 c_{1,2})

and

    V_1^T z − (1/ν) Ω_1 α_1 − (1/η) α_1 − b_1 1_{N−1} = 0
    V_2^T z − (1/ν) Ω_2 α_2 − (1/η) α_2 − b_2 1_{N−1} = 0
    β_{1,1} = η(c_{1,1}^T z − q_1)
    β_{1,2} = η(c_{1,2}^T z − q_2).

The dual model representation follows from the conditions for optimality.

24 Model selection by validation
Model selection criterion:

    min_Θ  Σ_{i,j} ( (ẑ_i^T ẑ_j) / (||ẑ_i||_2 ||ẑ_j||_2) − (x_i^T x_j) / (||x_i||_2 ||x_j||_2) )^2

Tuning parameters Θ:
- kernel tuning parameters in s_ij, K_1, K_2, (K_3)
- regularization constants ν, η (take γ = 0)
- choice of the diagonal matrix D
- choice of the reference point q, e.g. q ∈ {[+1; +1], [+1; −1], [−1; +1], [−1; −1]}
Stable results; finding a good range is satisfactory.
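
The criterion compares pairwise cosine similarities in the embedding with those in the input space; a short numpy sketch for scoring one candidate Θ on validation data (the function name is illustrative):

    import numpy as np

    def kmref_selection_criterion(Z_hat, X):
        # sum_{i,j} ( cos(z_i, z_j) - cos(x_i, x_j) )^2
        def cosine_matrix(A):
            G = A @ A.T
            n = np.sqrt(np.maximum(np.diag(G), 1e-12))
            return G / np.outer(n, n)
        return ((cosine_matrix(Z_hat) - cosine_matrix(X)) ** 2).sum()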

25 KMref: spiral example
[Figure: given 3D spiral data (x_1, x_2, x_3) and the 2D KMref embedding (z_1, z_2); training data (blue *), validation data (magenta o), test data (red +).]
Model selection: min_Θ Σ_{i,j} ( (ẑ_i^T ẑ_j)/(||ẑ_i||_2 ||ẑ_j||_2) − (x_i^T x_j)/(||x_i||_2 ||x_j||_2) )^2.

26 KMref: swiss roll example
[Figure: given 3D swiss roll data (left); KMref result, 2D projection (right).]
600 training data, 100 validation data.

27 KMref: visualizing gene distributions
[Figure: KMref 3D projection (z_1, z_2, z_3) of the Alon colon cancer microarray data set.]
Dimension of the input space: 62. Number of genes: 1500 (training: 500, validation: 500, test: 500).
Model selection: σ^2 = 10^4, σ_1^2 = 10^3, σ_2^2 = 0.5 σ_1^2, σ_3^2 = 0.1 σ_1^2, η = 1, ν = 1, D = diag{1, 5, 1}, q = [+1; −1; −1].

28 KMref: Santa Fe laser data
[Figure: original time series y_t versus discrete time k (left); 3D KMref projection (right).]
- Construct y_t^{t−m} = [y_t; y_{t−1}; y_{t−2}; ...; y_{t−m}] with m = 9.
- Given data {y_t^{t−m}}_{t=m+1}^{m+N_tot} in a p = 10 dimensional space.
- 200 validation data (first part), 700 training data points.

29 Conclusions
- Trend: kernelizing classical methods (FDA, PCA, CCA, ICA, ...)
- Kernel methods: complementary views from (LS-)SVM, RKHS, GP
- Least squares support vector machines as core problems in supervised and unsupervised learning, and beyond
- LS-SVM provides a methodology for optimization modelling
- Kernel maps with a reference point: LS-SVM core part
- Computational complexity: similar to regression/classification
- Reference point: converts the eigenvalue problem into a linear system
Read more:
Matlab demo file:
