Data visualization and dimensionality reduction using kernel maps with a reference point
Johan Suykens
K.U. Leuven, ESAT-SCD/SISTA
Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
Tel: 32/16/32 18 02 - Fax: 32/16/32 19 70
Email: johan.suykens@esat.kuleuven.be
http://www.esat.kuleuven.be/scd/
International Conference on Computational Harmonic Analysis (ICCHA 2007), Shanghai, June 2007
Contents
- Context: support vector machines and kernel based learning
- Core problems: least squares support vector machines
- Classification and kernel principal component analysis
- Data visualization
- Kernel eigenmap methods
- Kernel maps with a reference point: linear system solution
- Examples
Living in a data world
biomedical, energy, process industry, bio-informatics, multimedia, traffic
Support vector machines and kernel methods: context
- With new technologies (e.g. microarrays, proteomics), massive high-dimensional data sets become available.
- Tasks and objectives: predictive modelling, knowledge discovery and integration, data fusion (classification, feature selection, prior knowledge incorporation, correlation analysis, ranking, robustness).
- Supervised, unsupervised or semi-supervised learning, depending on the given data and problem.
- Need for modelling techniques able to operate on different data types (sequences, graphs, numerical, categorical, ...).
- Linear as well as nonlinear models.
- Reliable methods: numerically, computationally, statistically.
Kernel based learning: interdisciplinary challenges
SVM & kernel methods at the crossroads of: neural networks, data mining, linear algebra, pattern recognition, mathematics, machine learning, statistics, optimization, signal processing, systems and control theory
Estimation in Reproducing Kernel Hilbert Spaces (RKHS)
Variational problem [Wahba, 1990; Poggio & Girosi, 1990; Evgeniou et al., 2000]: find a function f such that
  min_{f in H} (1/N) Σ_{i=1}^{N} L(y_i, f(x_i)) + λ ||f||_K^2
with L(·,·) the loss function and ||f||_K the norm in the RKHS H defined by K.
Representer theorem: for a convex loss function, the solution is of the form
  f(x) = Σ_{i=1}^{N} α_i K(x, x_i)
Reproducing property: f(x) = <f, K_x>_K with K_x(·) = K(x, ·)
Some special cases:
- L(y, f(x)) = (y - f(x))^2: regularization network
- L(y, f(x)) = |y - f(x)|_ε: SVM regression with ε-insensitive loss function
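The regularization network case (squared loss) can be made concrete with a short numerical sketch. This is not part of the original slides: the RBF kernel, the function names and the parameter values are illustrative assumptions; the linear system below is the standard one obtained for the squared loss.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2=1.0):
    # K(x, x') = exp(-||x - x'||^2 / sigma2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def fit_regularization_network(X, y, lam=1e-2, sigma2=1.0):
    # Squared loss: (1/N) sum_i (y_i - f(x_i))^2 + lam ||f||_K^2
    # with f(x) = sum_i alpha_i K(x, x_i) leads to (K + lam*N*I) alpha = y.
    N = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    return np.linalg.solve(K + lam * N * np.eye(N), y)

def predict(X_train, alpha, X_new, sigma2=1.0):
    # Representer theorem: f(x) = sum_i alpha_i K(x, x_i)
    return rbf_kernel(X_new, X_train, sigma2) @ alpha

# toy usage on a noisy sine
X = np.linspace(0, 1, 50)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(50)
alpha = fit_regularization_network(X, y, lam=1e-3, sigma2=0.05)
y_hat = predict(X, alpha, X, sigma2=0.05)
```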
Different views on kernel based models
(Diagram: SVM, LS-SVM, Kriging, RKHS, Gaussian Processes and their connections)
Some early history on RKHS:
- 1910-1920: Moore
- 1940: Aronszajn
- 1951: Krige
- 1970: Parzen
- 1971: Kimeldorf & Wahba
Obtaining complementary insights from different perspectives: kernels are used in different methodologies
- Support vector machines (SVM): optimization approach (primal/dual)
- Reproducing kernel Hilbert spaces (RKHS): variational problem, functional analysis
- Gaussian processes (GP): probabilistic/Bayesian approach
SVMs: living in two worlds ...
Primal space (input space -> feature space ϕ(x)):
  y(x) = sign[w^T ϕ(x) + b]
Dual space, with the kernel trick K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j):
  y(x) = sign[Σ_{i=1}^{#sv} α_i y_i K(x, x_i) + b]
(Figure: parametrization with weights w_1, ..., w_{n_h} on hidden units ϕ_1(x), ..., ϕ_{n_h}(x) in the primal, and α_1, ..., α_{#sv} on kernels K(x, x_1), ..., K(x, x_{#sv}) in the dual.)
Least Squares Support Vector Machines: core problems
Regression (RR):
  min_{w,b,e} w^T w + γ Σ_i e_i^2   s.t.  y_i = w^T ϕ(x_i) + b + e_i, ∀i
Classification (FDA):
  min_{w,b,e} w^T w + γ Σ_i e_i^2   s.t.  y_i (w^T ϕ(x_i) + b) = 1 - e_i, ∀i
Principal component analysis (PCA):
  min_{w,b,e} -w^T w + γ Σ_i e_i^2   s.t.  e_i = w^T ϕ(x_i) + b, ∀i
Canonical correlation analysis / partial least squares (CCA/PLS):
  min_{w,v,b,d,e,r} w^T w + v^T v + ν_1 Σ_i e_i^2 + ν_2 Σ_i r_i^2 - γ Σ_i e_i r_i   s.t.  e_i = w^T ϕ_1(x_i) + b,  r_i = v^T ϕ_2(y_i) + d, ∀i
Also: partially linear models, spectral clustering, subspace algorithms, ...
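To make the regression core problem concrete, here is a minimal sketch (not from the slides) of LS-SVM regression solved in the dual; the RBF kernel, function names and parameter values are illustrative assumptions, and the linear system is the standard LS-SVM regression dual.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def lssvm_regression(X, y, gamma=10.0, sigma2=1.0):
    # Dual of the LS-SVM regression problem:
    # [ 0    1_N^T           ] [ b     ]   [ 0 ]
    # [ 1_N  Omega + I/gamma ] [ alpha ] = [ y ]
    # with Omega_ij = K(x_i, x_j); predictions: y(x) = sum_i alpha_i K(x, x_i) + b.
    N = X.shape[0]
    Omega = rbf_kernel(X, X, sigma2)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]   # bias b and dual coefficients alpha
```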
LS-SVM classifier
Preserve the support vector machine [Vapnik, 1995] methodology, but simplify via least squares and equality constraints [Suykens, 1999].
Primal problem:
  min_{w,b,e} (1/2) w^T w + γ (1/2) Σ_{i=1}^{N} e_i^2   such that  y_i [w^T ϕ(x_i) + b] = 1 - e_i,  i = 1, ..., N
Dual problem:
  [ 0   y^T         ] [ b ]   [ 0   ]
  [ y   Ω + I/γ     ] [ α ] = [ 1_N ]
where Ω_ij = y_i y_j ϕ(x_i)^T ϕ(x_j) = y_i y_j K(x_i, x_j) and y = [y_1; ...; y_N].
- LS-SVM classifiers perform very well on 20 UCI data sets [Van Gestel et al., ML 2004]
- Winning results in the WCCI 2006 competition [Cawley, 2006]
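The classifier dual above can be solved in the same way as the regression sketch; again an illustrative sketch with an assumed RBF kernel and made-up parameter values.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def lssvm_classifier(X, y, gamma=10.0, sigma2=1.0):
    # Dual system of the LS-SVM classifier, with Omega_ij = y_i y_j K(x_i, x_j):
    # [ 0   y^T             ] [ b     ]   [ 0   ]
    # [ y   Omega + I/gamma ] [ alpha ] = [ 1_N ]
    N = X.shape[0]
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma2)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(N))))
    b, alpha = sol[0], sol[1:]
    return alpha, b

# decision function: y(x) = sign[ sum_i alpha_i y_i K(x, x_i) + b ], e.g.
# y_hat = np.sign(rbf_kernel(X_new, X, sigma2) @ (alpha * y) + b)
```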
Kernel PCA: primal and dual problem
(Figure: toy data, linear PCA versus kernel PCA with RBF kernel)
Primal problem [Suykens et al., 2003]:
  min_{w,b,e} -(1/2) w^T w + γ (1/2) Σ_{i=1}^{N} e_i^2   such that  e_i = w^T ϕ(x_i) + b,  i = 1, ..., N
Dual problem = kernel PCA [Schölkopf et al., 1998]:
  Ω_c α = λ α   with  λ = 1/γ
with Ω_{c,ij} = (ϕ(x_i) - μ̂_ϕ)^T (ϕ(x_j) - μ̂_ϕ) the centered kernel matrix.
The underlying LS-SVM model allows making out-of-sample extensions.
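A minimal kernel PCA sketch via the centered kernel matrix, not part of the slides; the RBF kernel and the simplified (uncentered, unnormalized) out-of-sample projection are assumptions made for brevity.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def kernel_pca(X, n_components=2, sigma2=1.0):
    # Dual problem: Omega_c alpha = lambda alpha with the centered kernel matrix Omega_c.
    N = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    C = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    Kc = C @ K @ C
    lam, alpha = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:n_components]
    return lam[idx], alpha[:, idx]

def project(X_train, alpha, X_new, sigma2=1.0):
    # Out-of-sample score: sum_i alpha_i K(x, x_i)
    # (kernel centering and component normalization omitted in this sketch)
    return rbf_kernel(X_new, X_train, sigma2) @ alpha
```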
Core models + additional constraints
Monotonicity constraints [Pelckmans et al., 2005]:
  min_{w,b,e} w^T w + γ Σ_{i=1}^{N} e_i^2
  s.t.  y_i = w^T ϕ(x_i) + b + e_i (i = 1, ..., N),  w^T ϕ(x_i) <= w^T ϕ(x_{i+1}) (i = 1, ..., N-1)
Structure detection [Pelckmans et al., 2005; Tibshirani, 1996]:
  min_{w,e,t} ρ Σ_{p=1}^{P} t_p + Σ_{p=1}^{P} w^{(p)T} w^{(p)} + γ Σ_{i=1}^{N} e_i^2
  s.t.  y_i = Σ_{p=1}^{P} w^{(p)T} ϕ^{(p)}(x_i^{(p)}) + e_i (∀i),  -t_p <= w^{(p)T} ϕ^{(p)}(x_i^{(p)}) <= t_p (∀i, p)
Autocorrelated errors [Espinoza et al., 2006]:
  min_{w,b,r,e} w^T w + γ Σ_{i=1}^{N} r_i^2
  s.t.  y_i = w^T ϕ(x_i) + b + e_i (i = 1, ..., N),  e_i = ρ e_{i-1} + r_i (i = 2, ..., N)
Spectral clustering [Alzate & Suykens, 2006; Chung, 1997; Shi & Malik, 2000]:
  min_{w,b,e} -w^T w + γ e^T D^{-1} e   s.t.  e_i = w^T ϕ(x_i) + b (i = 1, ..., N)
Dimensionality reduction and data visualization
- Traditionally: commonly used techniques are e.g. principal component analysis, multidimensional scaling, self-organizing maps
- More recently: isomap, locally linear embedding, Hessian locally linear embedding, diffusion maps, Laplacian eigenmaps ("kernel eigenmap methods" and "manifold learning") [Roweis & Saul, 2000; Coifman et al., 2005; Belkin et al., 2006]
- Relevant issues:
  - learning and generalization [Cucker & Smale, 2002; Poggio et al., 2004]
  - model representations and out-of-sample extensions
  - convex/non-convex problems, computational complexity [Smale, 1997]
- Kernel maps with reference point (KMref) [Suykens, 2007]: data visualization and dimensionality reduction by solving a linear system
(Figure: 3D given data and 2D KMref result)
A criterion related to locally linear embedding
Given a training data set {x_i}_{i=1}^{N} with x_i ∈ R^p. Dimensionality reduction to {z_i}_{i=1}^{N} with z_i ∈ R^d (d = 2 or d = 3).
Objective:
  min_{z_i ∈ R^d} -(γ/2) Σ_{i=1}^{N} ||z_i||_2^2 + (1/2) Σ_{i=1}^{N} ||z_i - Σ_{j=1}^{N} s_ij z_j||_2^2
where e.g. s_ij = exp(-||x_i - x_j||_2^2 / σ^2).
The solution follows from the eigenvalue problem R z = γ z with z = [z_1; z_2; ...; z_N] and R = (I - P)^T (I - P), where P = [s_ij I_d].
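A minimal sketch (not from the slides) of this eigenvalue problem; the row normalization of the similarities and all parameter values are extra assumptions made only for illustration.

```python
import numpy as np

def lle_like_eigensolution(X, d=2, sigma2=1.0):
    # Similarities s_ij = exp(-||x_i - x_j||^2 / sigma2); row normalization is an
    # extra assumption, not stated on the slide.
    N = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-D2 / sigma2)
    np.fill_diagonal(S, 0.0)
    S /= S.sum(axis=1, keepdims=True)
    # P = [s_ij I_d] (Kronecker structure), R = (I - P)^T (I - P); solve R z = gamma z.
    P = np.kron(S, np.eye(d))
    I = np.eye(N * d)
    R = (I - P).T @ (I - P)
    gammas, Z = np.linalg.eigh(R)
    # Each eigenvector is a stacked z = [z_1; ...; z_N]; selecting the one to
    # visualize is a separate issue (see the eigenvalue-problem slide further on).
    return gammas, [Z[:, k].reshape(N, d) for k in range(Z.shape[1])]
```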
Introducing a core model
Realize the nonlinear mapping x -> z through a least squares support vector machine regression:
  min_{z, w_j, e_{i,j}} -(γ/2) z^T z + (1/2) (z - Pz)^T (z - Pz) + (ν/2) Σ_{j=1}^{d} w_j^T w_j + (η/2) Σ_{i=1}^{N} Σ_{j=1}^{d} e_{i,j}^2
  such that  c_{i,j}^T z = w_j^T ϕ_j(x_i) + e_{i,j},  i = 1, ..., N;  j = 1, ..., d
Primal model representation with evaluation at a point x_* ∈ R^p:
  ẑ_{*,j} = w_j^T ϕ_j(x_*)
with w_j ∈ R^{n_{h_j}} and feature maps ϕ_j(·): R^p -> R^{n_{h_j}} (j = 1, ..., d).
Kernel maps and eigenvalue problem
The solution follows from an eigenvalue problem, e.g. for d = 2:
  ( R + V_1 ((1/ν) Ω_1 + (1/η) I)^{-1} V_1^T + V_2 ((1/ν) Ω_2 + (1/η) I)^{-1} V_2^T ) z = γ z
with kernel matrices Ω_1, Ω_2:
  Ω_{1,ij} = K_1(x_i, x_j) = ϕ_1(x_i)^T ϕ_1(x_j),  Ω_{2,ij} = K_2(x_i, x_j) = ϕ_2(x_i)^T ϕ_2(x_j)
and matrices V_1, V_2:
  V_1 = [c_{1,1} c_{2,1} ... c_{N,1}],  V_2 = [c_{1,2} c_{2,2} ... c_{N,2}]
However, selection of the best solution from this pool of 2N candidates is not straightforward (the best solution is not necessarily given by the largest or smallest eigenvalue here).
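A sketch of assembling this eigenvalue problem for d = 2, continuing the assumptions of the previous sketch (RBF kernels, row-normalized similarities, illustrative parameter values); it mainly illustrates that one obtains a pool of 2N candidate eigenvectors.

```python
import numpy as np

def kernel_map_eigenproblem(X, sigma2=1.0, sig1=1.0, sig2=1.0, nu=1.0, eta=1.0):
    N, d = X.shape[0], 2
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-D2 / sigma2)
    np.fill_diagonal(S, 0.0)
    S /= S.sum(axis=1, keepdims=True)                 # extra assumption, as before
    A = np.eye(N * d) - np.kron(S, np.eye(d))
    R = A.T @ A
    # selector vectors c_{i,j} picking coordinate j of z_i in the stacked vector z
    def c(i, j):
        v = np.zeros(N * d)
        v[(i - 1) * d + (j - 1)] = 1.0
        return v
    V1 = np.stack([c(i, 1) for i in range(1, N + 1)], axis=1)
    V2 = np.stack([c(i, 2) for i in range(1, N + 1)], axis=1)
    M1 = np.exp(-D2 / sig1) / nu + np.eye(N) / eta    # (1/nu) Omega_1 + (1/eta) I
    M2 = np.exp(-D2 / sig2) / nu + np.eye(N) / eta
    B = R + V1 @ np.linalg.solve(M1, V1.T) + V2 @ np.linalg.solve(M2, V2.T)
    gammas, Z = np.linalg.eigh(B)                     # 2N candidate solutions
    return gammas, Z
```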
Kernel maps with reference point: problem statement
Kernel maps with reference point:
- LS-SVM core part: realize the dimensionality reduction x -> z
- reference point q (e.g. the first point; sacrificed in the visualization)
Example for d = 2:
  min_{z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}} (1/2) (z - P_D z)^T (z - P_D z) + (ν/2) (w_1^T w_1 + w_2^T w_2) + (η/2) Σ_{i=1}^{N} (e_{i,1}^2 + e_{i,2}^2)
  such that
    c_{1,1}^T z = q_1 + e_{1,1}
    c_{1,2}^T z = q_2 + e_{1,2}
    c_{i,1}^T z = w_1^T ϕ_1(x_i) + b_1 + e_{i,1},  i = 2, ..., N
    c_{i,2}^T z = w_2^T ϕ_2(x_i) + b_2 + e_{i,2},  i = 2, ..., N
Coordinates in the low dimensional space: z = [z_1; z_2; ...; z_N] ∈ R^{dN}
Regularization term: (z - P_D z)^T (z - P_D z) = Σ_{i=1}^{N} ||z_i - Σ_{j=1}^{N} s_ij D z_j||_2^2 with D a diagonal matrix and s_ij = exp(-||x_i - x_j||_2^2 / σ^2).
Kernel maps with reference point: solution
The unique solution to the problem is given by the linear system
  [ U                          -V_1 M_1^{-1} 1_{N-1}        -V_2 M_2^{-1} 1_{N-1}       ] [ z   ]   [ η (q_1 c_{1,1} + q_2 c_{1,2}) ]
  [ 1_{N-1}^T M_1^{-1} V_1^T   -1_{N-1}^T M_1^{-1} 1_{N-1}   0                          ] [ b_1 ] = [ 0                             ]
  [ 1_{N-1}^T M_2^{-1} V_2^T    0                           -1_{N-1}^T M_2^{-1} 1_{N-1} ] [ b_2 ]   [ 0                             ]
with matrices
  U = (I - P_D)^T (I - P_D) - γ I + V_1 M_1^{-1} V_1^T + V_2 M_2^{-1} V_2^T + η c_{1,1} c_{1,1}^T + η c_{1,2} c_{1,2}^T
  M_1 = (1/ν) Ω_1 + (1/η) I,   M_2 = (1/ν) Ω_2 + (1/η) I
  V_1 = [c_{2,1} ... c_{N,1}],  V_2 = [c_{2,2} ... c_{N,2}]
kernel matrices Ω_1, Ω_2 ∈ R^{(N-1) x (N-1)}:
  Ω_{1,ij} = K_1(x_i, x_j) = ϕ_1(x_i)^T ϕ_1(x_j),  Ω_{2,ij} = K_2(x_i, x_j) = ϕ_2(x_i)^T ϕ_2(x_j)
and positive definite kernel functions K_1(·,·), K_2(·,·).
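A minimal sketch of assembling and solving this linear system for d = 2. It is not the author's reference implementation (a Matlab demo is linked on the conclusions slide): the RBF kernels, the default parameter values and the choice γ = 0 are illustrative assumptions.

```python
import numpy as np

def rbf(X1, X2, sigma2):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def kmref_train(X, q=(1.0, -1.0), sig_s=1.0, sig1=1.0, sig2=0.5,
                nu=1.0, eta=1.0, Ddiag=(1.0, 1.0)):
    # d = 2; the reference point q is assigned to the first training point.
    N, d = X.shape[0], 2
    S = rbf(X, X, sig_s)                           # similarities s_ij
    P_D = np.kron(S, np.diag(Ddiag))               # P_D = [s_ij D]
    A = np.eye(N * d) - P_D
    def c(i, j):                                   # selector for coordinate j of z_i
        v = np.zeros(N * d)
        v[(i - 1) * d + (j - 1)] = 1.0
        return v
    V1 = np.stack([c(i, 1) for i in range(2, N + 1)], axis=1)
    V2 = np.stack([c(i, 2) for i in range(2, N + 1)], axis=1)
    Om1 = rbf(X[1:], X[1:], sig1)                  # kernel matrices on points 2..N
    Om2 = rbf(X[1:], X[1:], sig2)
    M1 = Om1 / nu + np.eye(N - 1) / eta
    M2 = Om2 / nu + np.eye(N - 1) / eta
    M1iV1 = np.linalg.solve(M1, V1.T)
    M2iV2 = np.linalg.solve(M2, V2.T)
    one = np.ones(N - 1)
    # gamma = 0 (as on the model selection slide), so the -gamma*I term is dropped
    U = A.T @ A + V1 @ M1iV1 + V2 @ M2iV2 \
        + eta * np.outer(c(1, 1), c(1, 1)) + eta * np.outer(c(1, 2), c(1, 2))
    top = np.hstack([U, -(V1 @ np.linalg.solve(M1, one))[:, None],
                        -(V2 @ np.linalg.solve(M2, one))[:, None]])
    row1 = np.hstack([one @ M1iV1, [-(one @ np.linalg.solve(M1, one)), 0.0]])
    row2 = np.hstack([one @ M2iV2, [0.0, -(one @ np.linalg.solve(M2, one))]])
    lhs = np.vstack([top, row1, row2])
    rhs = np.concatenate([eta * (q[0] * c(1, 1) + q[1] * c(1, 2)), [0.0, 0.0]])
    sol = np.linalg.solve(lhs, rhs)
    z = sol[:N * d].reshape(N, d)                  # low dimensional coordinates
    b1, b2 = sol[-2], sol[-1]
    return z, b1, b2
```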
Kernel maps with reference point: model representations
The primal and dual model representations allow making out-of-sample extensions. Evaluation at a point x_* ∈ R^p:
  ẑ_{*,1} = w_1^T ϕ_1(x_*) + b_1 = (1/ν) Σ_{i=2}^{N} α_{i,1} K_1(x_i, x_*) + b_1
  ẑ_{*,2} = w_2^T ϕ_2(x_*) + b_2 = (1/ν) Σ_{i=2}^{N} α_{i,2} K_2(x_i, x_*) + b_2
Estimated coordinates for visualization: ẑ_* = [ẑ_{*,1}; ẑ_{*,2}].
Here α_1, α_2 ∈ R^{N-1} are the unique solutions to the linear systems
  M_1 α_1 = V_1^T z - b_1 1_{N-1}   and   M_2 α_2 = V_2^T z - b_2 1_{N-1}
with α_1 = [α_{2,1}; ...; α_{N,1}], α_2 = [α_{2,2}; ...; α_{N,2}], 1_{N-1} = [1; 1; ...; 1].
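Continuing the sketch above (same assumed kernels and parameter values), the out-of-sample extension then only needs the dual coefficients α_1, α_2:

```python
import numpy as np

def rbf(X1, X2, sigma2):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def kmref_project(X, z, b1, b2, X_new, sig1=1.0, sig2=0.5, nu=1.0, eta=1.0):
    # Dual coefficients: M_j alpha_j = V_j^T z - b_j 1_{N-1}; with the selector
    # vectors c_{i,j}, V_j^T z are simply the j-th coordinates of z_2, ..., z_N.
    N = X.shape[0]
    M1 = rbf(X[1:], X[1:], sig1) / nu + np.eye(N - 1) / eta
    M2 = rbf(X[1:], X[1:], sig2) / nu + np.eye(N - 1) / eta
    alpha1 = np.linalg.solve(M1, z[1:, 0] - b1)
    alpha2 = np.linalg.solve(M2, z[1:, 1] - b2)
    # Out-of-sample extension: z_hat_j(x) = (1/nu) sum_{i=2}^N alpha_{i,j} K_j(x_i, x) + b_j
    z1 = rbf(X_new, X[1:], sig1) @ alpha1 / nu + b1
    z2 = rbf(X_new, X[1:], sig2) @ alpha2 / nu + b2
    return np.stack([z1, z2], axis=1)
```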
Proof - Lagrangian
Only equality constraints: the optimal model representation and solution are obtained in a systematic and straightforward way.
Lagrangian:
  L(z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}; β_{1,1}, β_{1,2}, α_{i,1}, α_{i,2}) =
    -(γ/2) z^T z + (1/2) (z - P_D z)^T (z - P_D z) + (ν/2) (w_1^T w_1 + w_2^T w_2) + (η/2) Σ_{i=1}^{N} (e_{i,1}^2 + e_{i,2}^2)
    + β_{1,1} (c_{1,1}^T z - q_1 - e_{1,1}) + β_{1,2} (c_{1,2}^T z - q_2 - e_{1,2})
    + Σ_{i=2}^{N} α_{i,1} (c_{i,1}^T z - w_1^T ϕ_1(x_i) - b_1 - e_{i,1}) + Σ_{i=2}^{N} α_{i,2} (c_{i,2}^T z - w_2^T ϕ_2(x_i) - b_2 - e_{i,2})
Conditions for optimality [Fletcher, 1987]: set the partial derivatives of L with respect to z, w_1, w_2, b_1, b_2, e_{1,1}, e_{1,2}, e_{i,1}, e_{i,2}, β_{1,1}, β_{1,2}, α_{i,1}, α_{i,2} equal to zero.
Proof - conditions for optimality
  ∂L/∂z = 0:  -γ z + (I - P_D)^T (I - P_D) z + β_{1,1} c_{1,1} + β_{1,2} c_{1,2} + Σ_{i=2}^{N} α_{i,1} c_{i,1} + Σ_{i=2}^{N} α_{i,2} c_{i,2} = 0
  ∂L/∂w_1 = 0:  ν w_1 - Σ_{i=2}^{N} α_{i,1} ϕ_1(x_i) = 0
  ∂L/∂w_2 = 0:  ν w_2 - Σ_{i=2}^{N} α_{i,2} ϕ_2(x_i) = 0
  ∂L/∂b_1 = 0:  Σ_{i=2}^{N} α_{i,1} = 1_{N-1}^T α_1 = 0
  ∂L/∂b_2 = 0:  Σ_{i=2}^{N} α_{i,2} = 1_{N-1}^T α_2 = 0
  ∂L/∂e_{1,1} = 0:  η e_{1,1} - β_{1,1} = 0
  ∂L/∂e_{1,2} = 0:  η e_{1,2} - β_{1,2} = 0
  ∂L/∂e_{i,1} = 0:  η e_{i,1} - α_{i,1} = 0,  i = 2, ..., N
  ∂L/∂e_{i,2} = 0:  η e_{i,2} - α_{i,2} = 0,  i = 2, ..., N
  ∂L/∂β_{1,1} = 0:  c_{1,1}^T z - q_1 - e_{1,1} = 0
  ∂L/∂β_{1,2} = 0:  c_{1,2}^T z - q_2 - e_{1,2} = 0
  ∂L/∂α_{i,1} = 0:  c_{i,1}^T z - w_1^T ϕ_1(x_i) - b_1 - e_{i,1} = 0,  i = 2, ..., N
  ∂L/∂α_{i,2} = 0:  c_{i,2}^T z - w_2^T ϕ_2(x_i) - b_2 - e_{i,2} = 0,  i = 2, ..., N
Proof - elimination step
- Eliminate w_1, w_2, e_{i,1}, e_{i,2}
- Express in terms of kernel functions
- Express the set of equations in terms of z, b_1, b_2, α_1, α_2
One obtains
  -γ z + (I - P_D)^T (I - P_D) z + V_1 α_1 + V_2 α_2 + η c_{1,1} c_{1,1}^T z + η c_{1,2} c_{1,2}^T z = η (q_1 c_{1,1} + q_2 c_{1,2})
and
  V_1^T z - (1/ν) Ω_1 α_1 - (1/η) α_1 - b_1 1_{N-1} = 0
  V_2^T z - (1/ν) Ω_2 α_2 - (1/η) α_2 - b_2 1_{N-1} = 0
  β_{1,1} = η (c_{1,1}^T z - q_1)
  β_{1,2} = η (c_{1,2}^T z - q_2)
The dual model representation follows from the conditions for optimality.
Model selection by validation
Model selection criterion:
  min_Θ Σ_{i,j} ( ẑ_i^T ẑ_j / (||ẑ_i||_2 ||ẑ_j||_2) - x_i^T x_j / (||x_i||_2 ||x_j||_2) )^2
Tuning parameters Θ:
- kernel tuning parameters in s_ij, K_1, K_2, (K_3)
- regularization constants ν, η (take γ = 0)
- choice of the diagonal matrix D
- choice of the reference point q, e.g. q ∈ {[+1;+1], [+1;-1], [-1;+1], [-1;-1]}
Stable results; finding a good range is satisfactory.
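The criterion compares pairwise cosine similarities in the embedding and in the input space over validation data. A minimal sketch (function names illustrative):

```python
import numpy as np

def kmref_selection_criterion(X_val, Z_val):
    # sum_{i,j} ( cos(z_i, z_j) - cos(x_i, x_j) )^2 over validation points
    def cosine_matrix(A):
        An = A / np.clip(np.linalg.norm(A, axis=1, keepdims=True), 1e-12, None)
        return An @ An.T
    return ((cosine_matrix(Z_val) - cosine_matrix(X_val)) ** 2).sum()
```

In practice one would evaluate this criterion on the validation set over a grid of tuning parameters Θ and keep the setting with the smallest value.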
KMref: spiral example
(Figure: 3D spiral data and 2D KMref projection; training data (blue *), validation data (magenta o), test data (red +))
Model selection: min Σ_{i,j} ( ẑ_i^T ẑ_j / (||ẑ_i||_2 ||ẑ_j||_2) - x_i^T x_j / (||x_i||_2 ||x_j||_2) )^2
KMref: swiss roll example
(Figure: given 3D swiss roll data; KMref result - 2D projection)
600 training data, 100 validation data
KMref: visualizing gene distributions
(Figure: KMref 3D projection of the Alon colon cancer microarray data set)
Dimension of the input space: 62
Number of genes: 1500 (training: 500, validation: 500, test: 500)
Model selection: σ^2 = 10^4, σ_1^2 = 10^3, σ_2^2 = 0.5 σ_1^2, σ_3^2 = 0.1 σ_1^2, η = 1, ν = 1, D = diag{1, 5, 1}, q = [+1; -1; -1].
KMref: Santa Fe laser data
(Figure: original time series {y_t} versus discrete time k; 3D KMref projection)
- Construct y_t^{t-m} = [y_t; y_{t-1}; y_{t-2}; ...; y_{t-m}] with m = 9
- Given data {y_t^{t-m}}_{t=m+1}^{m+N_tot} in a p = 10 dimensional space
- 200 validation data (first part), 700 training data points
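The delay-embedding construction used here is easy to sketch (illustrative function name; the series y and its length are whatever data one has):

```python
import numpy as np

def delay_embedding(y, m=9):
    # Build y_t^{t-m} = [y_t; y_{t-1}; ...; y_{t-m}] for each admissible t,
    # giving points in a p = m+1 dimensional space.
    return np.stack([y[t - m:t + 1][::-1] for t in range(m, len(y))], axis=0)

# e.g. a scalar series of length 1000 with m = 9 gives 991 points in R^10,
# which can then be used as input data for KMref.
```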
Conclusions
- Trend: kernelizing classical methods (FDA, PCA, CCA, ICA, ...)
- Kernel methods: complementary views (LS-)SVM, RKHS, GP
- Least squares support vector machines as core problems in supervised and unsupervised learning, and beyond
- LS-SVM provides a methodology for optimization modelling
- Kernel maps with a reference point: LS-SVM core part
- Computational complexity: similar to regression/classification
- Reference point: converts the eigenvalue problem into a linear system
Read more: http://www.esat.kuleuven.be/sista/lssvmlab/kmref/kmref722.pdf
Matlab demo file: http://www.esat.kuleuven.be/sista/lssvmlab/kmref/demoswisskmref.m