Data visualization and dimensionality reduction using kernel maps with a reference point
Johan Suykens
K.U. Leuven, ESAT-SCD/SISTA
Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
Email: [email protected]
International Conference on Computational Harmonic Analysis (ICCHA 2007), Shanghai, June 2007
Contents
- Context: support vector machines and kernel based learning
- Core problems: least squares support vector machines
- Classification and kernel principal component analysis
- Data visualization
- Kernel eigenmap methods
- Kernel maps with a reference point: linear system solution
- Examples
Living in a data world
(Figure: application domains - biomedical, energy, process industry, bio-informatics, multimedia, traffic.)
Support vector machines and kernel methods: context
- With new technologies (e.g. microarrays, proteomics), massive high-dimensional data sets become available.
- Tasks and objectives: predictive modelling, knowledge discovery and integration, data fusion (classification, feature selection, prior knowledge incorporation, correlation analysis, ranking, robustness).
- Supervised, unsupervised or semi-supervised learning, depending on the given data and problem.
- Need for modelling techniques that can operate on different data types (sequences, graphs, numerical, categorical, ...).
- Linear as well as nonlinear models.
- Reliable methods: numerically, computationally, statistically.
Kernel based learning: interdisciplinary challenges
(Figure: SVM & kernel methods at the intersection of neural networks, data mining, linear algebra, pattern recognition, mathematics, machine learning, statistics, optimization, signal processing, and systems and control theory.)
Estimation in Reproducing Kernel Hilbert Spaces (RKHS)
Variational problem [Wahba, 1990; Poggio & Girosi, 1990; Evgeniou et al., 2000]: find a function $f$ such that
$\min_{f \in \mathcal{H}} \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda \|f\|_K^2$
with $L(\cdot,\cdot)$ the loss function and $\|f\|_K$ the norm in the RKHS $\mathcal{H}$ defined by $K$.
Representer theorem: for a convex loss function the solution is of the form
$f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i)$.
Reproducing property: $f(x) = \langle f, K_x \rangle_K$ with $K_x(\cdot) = K(x, \cdot)$.
Some special cases:
- $L(y, f(x)) = (y - f(x))^2$: regularization network
- $L(y, f(x)) = |y - f(x)|_{\epsilon}$: SVM regression with the $\epsilon$-insensitive loss function (figure: $\epsilon$-insensitive tube)
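For the squared loss, the representer theorem reduces training to one linear system: plugging $f(x) = \sum_j \alpha_j K(x, x_j)$ into the variational problem gives $(K + \lambda N I)\alpha = y$. The sketch below illustrates this regularization-network fit in NumPy; the Gaussian RBF kernel and the parameter values are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2=1.0):
    # Gaussian RBF kernel matrix: K_ij = exp(-||x_i - x_j||^2 / sigma2)
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-d2 / sigma2)

def fit_regularization_network(X, y, lam=1e-2, sigma2=1.0):
    # Representer theorem: f(x) = sum_i alpha_i K(x, x_i);
    # the squared loss leads to the linear system (K + lam*N*I) alpha = y.
    N = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    return np.linalg.solve(K + lam * N * np.eye(N), y)

def predict(Xtest, Xtrain, alpha, sigma2=1.0):
    # Evaluate f at new points using the fitted expansion coefficients.
    return rbf_kernel(Xtest, Xtrain, sigma2) @ alpha
```

For example, alpha = fit_regularization_network(X, y) followed by predict(Xnew, X, alpha) evaluates the fitted function at new inputs.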
Different views on kernel based models
(Figure: SVM / LS-SVM, kriging, RKHS and Gaussian processes as related model classes.)
Some early history on RKHS: Moore; 1940: Aronszajn; 1951: Krige; 1970: Parzen; 1971: Kimeldorf & Wahba.
Obtaining complementary insights from different perspectives: kernels are used in different methodologies:
- Support vector machines (SVM): optimization approach (primal/dual)
- Reproducing kernel Hilbert spaces (RKHS): variational problem, functional analysis
- Gaussian processes (GP): probabilistic/Bayesian approach
SVMs: living in two worlds...
Primal space (feature map $\varphi$ from input space to feature space): $y(x) = \mathrm{sign}[w^T \varphi(x) + b]$.
Dual space: $y(x) = \mathrm{sign}[\sum_{i=1}^{\#sv} \alpha_i y_i K(x, x_i) + b]$ with $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$ (the "kernel trick").
(Figure: network interpretations, with hidden units $\varphi_1(x), \ldots, \varphi_{n_h}(x)$ and weights $w_1, \ldots, w_{n_h}$ in the primal, and kernel units $K(x, x_1), \ldots, K(x, x_{\#sv})$ with weights $\alpha_1, \ldots, \alpha_{\#sv}$ in the dual.)
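The dual representation only needs the stored support vectors, their labels and their multipliers. A minimal sketch of evaluating such a decision function is given below; the RBF kernel is again just an illustrative choice.

```python
import numpy as np

def svm_decision(x, sv_X, sv_y, alpha, b, sigma2=1.0):
    # Dual-form classifier: y(x) = sign( sum_i alpha_i y_i K(x, x_i) + b )
    k = np.exp(-np.sum((sv_X - x) ** 2, axis=1) / sigma2)  # K(x, x_i) for all support vectors
    return np.sign(np.dot(alpha * sv_y, k) + b)
```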
Least Squares Support Vector Machines: core problems
Regression (RR):
$\min_{w,b,e} w^T w + \gamma \sum_i e_i^2$ s.t. $y_i = w^T \varphi(x_i) + b + e_i, \ \forall i$
Classification (FDA):
$\min_{w,b,e} w^T w + \gamma \sum_i e_i^2$ s.t. $y_i (w^T \varphi(x_i) + b) = 1 - e_i, \ \forall i$
Principal component analysis (PCA):
$\min_{w,b,e} w^T w - \gamma \sum_i e_i^2$ s.t. $e_i = w^T \varphi(x_i) + b, \ \forall i$
Canonical correlation analysis / partial least squares (CCA/PLS):
$\min_{w,v,b,d,e,r} w^T w + v^T v + \nu_1 \sum_i e_i^2 + \nu_2 \sum_i r_i^2 - \gamma \sum_i e_i r_i$ s.t. $e_i = w^T \varphi_1(x_i) + b$ and $r_i = v^T \varphi_2(y_i) + d, \ \forall i$
Also: partially linear models, spectral clustering, subspace algorithms, ...
LS-SVM classifier
Preserve the support vector machine methodology [Vapnik, 1995], but simplify via least squares and equality constraints [Suykens, 1999].
Primal problem:
$\min_{w,b,e} \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{i=1}^{N} e_i^2$ such that $y_i [w^T \varphi(x_i) + b] = 1 - e_i, \ i = 1, \ldots, N$
Dual problem:
$\begin{bmatrix} 0 & y^T \\ y & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_N \end{bmatrix}$
where $\Omega_{ij} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j)$ and $y = [y_1; \ldots; y_N]$.
LS-SVM classifiers perform very well on 20 UCI data sets [Van Gestel et al., Machine Learning 2004].
Winning results in a WCCI 2006 competition [Cawley, 2006].
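Training thus amounts to solving one symmetric linear system of size N+1. A small sketch, assuming an RBF kernel and placeholder parameter values, could look as follows.

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0, sigma2=1.0):
    # Solve the LS-SVM classifier dual system:
    #   [ 0   y^T             ] [ b     ]   [ 0   ]
    #   [ y   Omega + I/gamma ] [ alpha ] = [ 1_N ]
    # with Omega_ij = y_i y_j K(x_i, x_j).
    N = X.shape[0]
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    Omega = np.outer(y, y) * np.exp(-d2 / sigma2)
    A = np.block([[np.zeros((1, 1)), y[None, :].astype(float)],
                  [y[:, None].astype(float), Omega + np.eye(N) / gamma]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(N))))
    return sol[1:], sol[0]                      # alpha, b

def lssvm_predict(xstar, X, y, alpha, b, sigma2=1.0):
    # Classify a new point with the dual expansion sign(sum_i alpha_i y_i K(x, x_i) + b).
    k = np.exp(-np.sum((X - xstar)**2, axis=1) / sigma2)
    return np.sign(np.dot(alpha * y, k) + b)
```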
Kernel PCA: primal and dual problem
(Figure: linear PCA versus kernel PCA with an RBF kernel.)
Primal problem [Suykens et al., 2003]:
$\min_{w,b,e} \frac{1}{2} w^T w - \gamma \frac{1}{2} \sum_{i=1}^{N} e_i^2$ such that $e_i = w^T \varphi(x_i) + b, \ i = 1, \ldots, N$.
Dual problem = kernel PCA [Schölkopf et al., 1998]:
$\Omega_c \alpha = \lambda \alpha$ with $\lambda = 1/\gamma$,
where $\Omega_{c,ij} = (\varphi(x_i) - \hat{\mu}_{\varphi})^T (\varphi(x_j) - \hat{\mu}_{\varphi})$ is the centered kernel matrix.
The underlying LS-SVM model allows making out-of-sample extensions.
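In the dual, kernel PCA is an eigen-decomposition of the centered kernel matrix. A brief sketch, assuming a Gaussian kernel and no additional eigenvector normalisation beyond what numpy returns:

```python
import numpy as np

def kernel_pca(X, n_components=2, sigma2=1.0):
    # Dual kernel PCA: eigenvectors of the centered kernel matrix Omega_c.
    N = X.shape[0]
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    K = np.exp(-d2 / sigma2)
    C = np.eye(N) - np.ones((N, N)) / N       # centering matrix
    Kc = C @ K @ C                            # Omega_c
    lam, alpha = np.linalg.eigh(Kc)           # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:n_components]
    scores = Kc @ alpha[:, idx]               # projections of the training points
    return lam[idx], alpha[:, idx], scores
```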
Core models + additional constraints
Monotonicity constraints [Pelckmans et al., 2005]:
$\min_{w,b,e} w^T w + \gamma \sum_{i=1}^{N} e_i^2$ s.t. $y_i = w^T \varphi(x_i) + b + e_i$ $(i = 1, \ldots, N)$ and $w^T \varphi(x_i) \le w^T \varphi(x_{i+1})$ $(i = 1, \ldots, N-1)$
Structure detection [Pelckmans et al., 2005; Tibshirani, 1996]:
$\min_{w,e,t} \rho \sum_{p=1}^{P} t_p + \sum_{p=1}^{P} w^{(p)T} w^{(p)} + \gamma \sum_{i=1}^{N} e_i^2$ s.t. $y_i = \sum_{p=1}^{P} w^{(p)T} \varphi^{(p)}(x_i^{(p)}) + e_i$ $(\forall i)$ and $-t_p \le w^{(p)T} \varphi^{(p)}(x_i^{(p)}) \le t_p$ $(\forall i, p)$
Autocorrelated errors [Espinoza et al., 2006]:
$\min_{w,b,r,e} w^T w + \gamma \sum_{i=1}^{N} r_i^2$ s.t. $y_i = w^T \varphi(x_i) + b + e_i$ $(i = 1, \ldots, N)$ and $e_i = \rho e_{i-1} + r_i$ $(i = 2, \ldots, N)$
Spectral clustering [Alzate & Suykens, 2006; Chung, 1997; Shi & Malik, 2000]:
$\min_{w,b,e} w^T w - \gamma e^T D^{-1} e$ s.t. $e_i = w^T \varphi(x_i) + b$ $(i = 1, \ldots, N)$
Dimensionality reduction and data visualization
- Traditionally: commonly used techniques include principal component analysis, multidimensional scaling, and self-organizing maps.
- More recently: isomap, locally linear embedding, Hessian locally linear embedding, diffusion maps, Laplacian eigenmaps ("kernel eigenmap methods" and manifold learning) [Roweis & Saul, 2000; Coifman et al., 2005; Belkin et al., 2006].
- Relevant issues: learning and generalization [Cucker & Smale, 2002; Poggio et al., 2004]; model representations and out-of-sample extensions; convex versus non-convex problems and computational complexity [Smale, 1997].
- Kernel maps with a reference point (KMref) [Suykens, 2007]: data visualization and dimensionality reduction by solving a linear system.
(Figure: a given 3D data set and the corresponding 2D KMref result.)
A criterion related to locally linear embedding
Given a training data set $\{x_i\}_{i=1}^{N}$ with $x_i \in \mathbb{R}^p$. Dimensionality reduction to $\{z_i\}_{i=1}^{N}$ with $z_i \in \mathbb{R}^d$ ($d = 2$ or $d = 3$).
Objective:
$\min_{z_i \in \mathbb{R}^d} -\frac{\gamma}{2} \sum_{i=1}^{N} \|z_i\|_2^2 + \frac{1}{2} \sum_{i=1}^{N} \|z_i - \sum_{j=1}^{N} s_{ij} z_j\|_2^2$
where e.g. $s_{ij} = \exp(-\|x_i - x_j\|_2^2 / \sigma^2)$.
The solution follows from the eigenvalue problem $R z = \gamma z$ with $z = [z_1; z_2; \ldots; z_N]$ and $R = (I - P)^T (I - P)$, where $P = [s_{ij} I_d]$.
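Because $P = [s_{ij} I_d]$ has Kronecker structure, the $Nd \times Nd$ eigenvalue problem decouples into an $N \times N$ problem per embedding coordinate. The sketch below forms $R$ and returns its eigen-decomposition; row-normalising the similarities so that they act as reconstruction weights is an added assumption, not something stated on the slide.

```python
import numpy as np

def lle_like_eigenproblem(X, sigma2=1.0):
    # R = (I - S)^T (I - S) with s_ij = exp(-||x_i - x_j||^2 / sigma2);
    # candidate embedding coordinates are eigenvectors of R (R z = gamma z).
    N = X.shape[0]
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    S = np.exp(-d2 / sigma2)
    np.fill_diagonal(S, 0.0)
    S = S / S.sum(axis=1, keepdims=True)      # assumption: rows normalised as weights
    R = (np.eye(N) - S).T @ (np.eye(N) - S)
    return np.linalg.eigh(R)                  # eigenvalues gamma, eigenvectors z
```

Which eigenvector to pick as a coordinate is exactly the selection issue raised on a later slide.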
Introducing a core model
Realize the nonlinear mapping $x \mapsto z$ through a least squares support vector machine regression:
$\min_{z, w_j, e_{i,j}} -\frac{\gamma}{2} z^T z + \frac{1}{2}(z - Pz)^T (z - Pz) + \frac{\nu}{2} \sum_{j=1}^{d} w_j^T w_j + \frac{\eta}{2} \sum_{i=1}^{N} \sum_{j=1}^{d} e_{i,j}^2$
such that $c_{i,j}^T z = w_j^T \varphi_j(x_i) + e_{i,j}$, $i = 1, \ldots, N$; $j = 1, \ldots, d$.
Primal model representation with evaluation at a point $x_* \in \mathbb{R}^p$: $\hat{z}_{*,j} = w_j^T \varphi_j(x_*)$, with $w_j \in \mathbb{R}^{n_{h_j}}$ and feature maps $\varphi_j(\cdot): \mathbb{R}^p \to \mathbb{R}^{n_{h_j}}$ $(j = 1, \ldots, d)$.
Kernel maps and eigenvalue problem
The solution follows from an eigenvalue problem; e.g. for $d = 2$:
$\left( R + V_1 (\tfrac{1}{\nu}\Omega_1 + \tfrac{1}{\eta} I)^{-1} V_1^T + V_2 (\tfrac{1}{\nu}\Omega_2 + \tfrac{1}{\eta} I)^{-1} V_2^T \right) z = \gamma z$
with kernel matrices $\Omega_1, \Omega_2$: $\Omega_{1,ij} = K_1(x_i, x_j) = \varphi_1(x_i)^T \varphi_1(x_j)$, $\Omega_{2,ij} = K_2(x_i, x_j) = \varphi_2(x_i)^T \varphi_2(x_j)$, and matrices $V_1 = [c_{1,1}\, c_{2,1} \ldots c_{N,1}]$, $V_2 = [c_{1,2}\, c_{2,2} \ldots c_{N,2}]$.
However, selecting the best solution from this pool of $2N$ candidates is not straightforward (the best solution is not necessarily given by the largest or smallest eigenvalue).
Kernel maps with reference point: problem statement
Kernel maps with a reference point:
- LS-SVM core part: realize the dimensionality reduction $x \mapsto z$
- reference point $q$ (e.g. the first point; sacrificed in the visualization)
Example, $d = 2$:
$\min_{z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}} -\frac{\gamma}{2} z^T z + \frac{1}{2}(z - P_D z)^T (z - P_D z) + \frac{\nu}{2}(w_1^T w_1 + w_2^T w_2) + \frac{\eta}{2} \sum_{i=1}^{N} (e_{i,1}^2 + e_{i,2}^2)$
such that
$c_{1,1}^T z = q_1 + e_{1,1}$
$c_{1,2}^T z = q_2 + e_{1,2}$
$c_{i,1}^T z = w_1^T \varphi_1(x_i) + b_1 + e_{i,1}$, $i = 2, \ldots, N$
$c_{i,2}^T z = w_2^T \varphi_2(x_i) + b_2 + e_{i,2}$, $i = 2, \ldots, N$
Coordinates in the low dimensional space: $z = [z_1; z_2; \ldots; z_N] \in \mathbb{R}^{dN}$.
Regularization term: $(z - P_D z)^T (z - P_D z) = \sum_{i=1}^{N} \|z_i - \sum_{j=1}^{N} s_{ij} D z_j\|_2^2$ with $D$ a diagonal matrix and $s_{ij} = \exp(-\|x_i - x_j\|_2^2 / \sigma^2)$.
Kernel maps with reference point: solution
The unique solution to the problem is given by the linear system
$\begin{bmatrix} U & -V_1 M_1^{-1} 1_{N-1} & -V_2 M_2^{-1} 1_{N-1} \\ 1_{N-1}^T M_1^{-1} V_1^T & -1_{N-1}^T M_1^{-1} 1_{N-1} & 0 \\ 1_{N-1}^T M_2^{-1} V_2^T & 0 & -1_{N-1}^T M_2^{-1} 1_{N-1} \end{bmatrix} \begin{bmatrix} z \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} \eta (q_1 c_{1,1} + q_2 c_{1,2}) \\ 0 \\ 0 \end{bmatrix}$
with matrices
$U = (I - P_D)^T (I - P_D) - \gamma I + V_1 M_1^{-1} V_1^T + V_2 M_2^{-1} V_2^T + \eta c_{1,1} c_{1,1}^T + \eta c_{1,2} c_{1,2}^T$
$M_1 = \frac{1}{\nu}\Omega_1 + \frac{1}{\eta} I$, $M_2 = \frac{1}{\nu}\Omega_2 + \frac{1}{\eta} I$
$V_1 = [c_{2,1} \ldots c_{N,1}]$, $V_2 = [c_{2,2} \ldots c_{N,2}]$
and kernel matrices $\Omega_1, \Omega_2 \in \mathbb{R}^{(N-1)\times(N-1)}$: $\Omega_{1,ij} = K_1(x_i, x_j) = \varphi_1(x_i)^T \varphi_1(x_j)$, $\Omega_{2,ij} = K_2(x_i, x_j) = \varphi_2(x_i)^T \varphi_2(x_j)$, for positive definite kernel functions $K_1(\cdot,\cdot)$, $K_2(\cdot,\cdot)$.
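The following sketch assembles and solves this $d = 2$ system with NumPy. Beyond what the slide states, it assumes that $c_{i,j}$ is the unit vector selecting coordinate $j$ of $z_i$ in the stacked vector $z$, that $P_D$ is the Kronecker product of the similarity matrix with $D$, Gaussian kernels for $K_1$ and $K_2$, and $D = I$ by default; all parameter values are placeholders.

```python
import numpy as np

def kmref_embed(X, q=(1.0, -1.0), sigma2=1.0, sig1=1.0, sig2=0.5,
                nu=1.0, eta=1.0, gamma=0.0, D=None):
    # Assumptions: c_{i,j} selects coordinate j of z_i in the stacked vector
    # z = [z_1; ...; z_N]; P_D = kron(S, D); Gaussian kernels for K_1, K_2.
    N, d = X.shape[0], 2
    D = np.eye(d) if D is None else D
    # similarities s_ij = exp(-||x_i - x_j||^2 / sigma2)
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    S = np.exp(-d2 / sigma2)
    PD = np.kron(S, D)                                   # blocks of P_D z: sum_j s_ij D z_j
    E = np.eye(N * d)
    c = lambda i, j: E[:, (i - 1) * d + (j - 1)]         # unit vector c_{i,j} (1-based indices)
    V1 = np.stack([c(i, 1) for i in range(2, N + 1)], axis=1)
    V2 = np.stack([c(i, 2) for i in range(2, N + 1)], axis=1)
    Xr = X[1:]                                           # x_2, ..., x_N (x_1 is the reference point)
    d2r = np.sum(Xr**2, 1)[:, None] + np.sum(Xr**2, 1)[None, :] - 2 * Xr @ Xr.T
    M1 = np.exp(-d2r / sig1) / nu + np.eye(N - 1) / eta  # M_1 = (1/nu) Omega_1 + (1/eta) I
    M2 = np.exp(-d2r / sig2) / nu + np.eye(N - 1) / eta
    one = np.ones(N - 1)
    I = np.eye(N * d)
    U = ((I - PD).T @ (I - PD) - gamma * I
         + V1 @ np.linalg.solve(M1, V1.T) + V2 @ np.linalg.solve(M2, V2.T)
         + eta * np.outer(c(1, 1), c(1, 1)) + eta * np.outer(c(1, 2), c(1, 2)))
    # block system in (z, b1, b2), obtained by eliminating alpha_1 and alpha_2
    A = np.zeros((N * d + 2, N * d + 2))
    A[:N*d, :N*d] = U
    A[:N*d, N*d] = -V1 @ np.linalg.solve(M1, one)
    A[:N*d, N*d + 1] = -V2 @ np.linalg.solve(M2, one)
    A[N*d, :N*d] = np.linalg.solve(M1, one) @ V1.T       # 1^T M1^{-1} V1^T (M1 symmetric)
    A[N*d, N*d] = -one @ np.linalg.solve(M1, one)
    A[N*d + 1, :N*d] = np.linalg.solve(M2, one) @ V2.T
    A[N*d + 1, N*d + 1] = -one @ np.linalg.solve(M2, one)
    rhs = np.zeros(N * d + 2)
    rhs[:N*d] = eta * (q[0] * c(1, 1) + q[1] * c(1, 2))
    sol = np.linalg.solve(A, rhs)
    return sol[:N*d].reshape(N, d), sol[N*d], sol[N*d + 1]   # z (N x 2), b1, b2
```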
Kernel maps with reference point: model representations
The primal and dual model representations allow making out-of-sample extensions. Evaluation at a point $x_* \in \mathbb{R}^p$:
$\hat{z}_{*,1} = w_1^T \varphi_1(x_*) + b_1 = \frac{1}{\nu} \sum_{i=2}^{N} \alpha_{i,1} K_1(x_i, x_*) + b_1$
$\hat{z}_{*,2} = w_2^T \varphi_2(x_*) + b_2 = \frac{1}{\nu} \sum_{i=2}^{N} \alpha_{i,2} K_2(x_i, x_*) + b_2$
Estimated coordinates for visualization: $\hat{z}_* = [\hat{z}_{*,1}; \hat{z}_{*,2}]$.
$\alpha_1, \alpha_2 \in \mathbb{R}^{N-1}$ are the unique solutions of the linear systems $M_1 \alpha_1 = V_1^T z - b_1 1_{N-1}$ and $M_2 \alpha_2 = V_2^T z - b_2 1_{N-1}$, with $\alpha_1 = [\alpha_{2,1}; \ldots; \alpha_{N,1}]$, $\alpha_2 = [\alpha_{2,2}; \ldots; \alpha_{N,2}]$ and $1_{N-1} = [1; 1; \ldots; 1]$.
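Continuing the sketch above, embedding a new point only requires recovering $\alpha_1, \alpha_2$ from the two $(N-1)$-dimensional systems and evaluating the kernel expansions; the Gaussian kernels and parameter values are again assumptions.

```python
import numpy as np

def kmref_out_of_sample(xstar, X, z, b1, b2, sig1=1.0, sig2=0.5, nu=1.0, eta=1.0):
    # z is the (N, 2) training embedding; x_1 is the reference point.
    N = X.shape[0]
    Xr = X[1:]                                               # x_2, ..., x_N
    d2 = np.sum(Xr**2, 1)[:, None] + np.sum(Xr**2, 1)[None, :] - 2 * Xr @ Xr.T
    M1 = np.exp(-d2 / sig1) / nu + np.eye(N - 1) / eta
    M2 = np.exp(-d2 / sig2) / nu + np.eye(N - 1) / eta
    alpha1 = np.linalg.solve(M1, z[1:, 0] - b1)              # M1 alpha1 = V1^T z - b1 1
    alpha2 = np.linalg.solve(M2, z[1:, 1] - b2)
    k1 = np.exp(-np.sum((Xr - xstar)**2, 1) / sig1)           # K_1(x_i, x_*)
    k2 = np.exp(-np.sum((Xr - xstar)**2, 1) / sig2)
    return np.array([alpha1 @ k1 / nu + b1, alpha2 @ k2 / nu + b2])
```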
Proof - Lagrangian
Only equality constraints are present: the optimal model representation and solution are obtained in a systematic and straightforward way.
Lagrangian:
$\mathcal{L}(z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}; \beta_{1,1}, \beta_{1,2}, \alpha_{i,1}, \alpha_{i,2}) = -\frac{\gamma}{2} z^T z + \frac{1}{2}(z - P_D z)^T (z - P_D z) + \frac{\nu}{2}(w_1^T w_1 + w_2^T w_2) + \frac{\eta}{2} \sum_{i=1}^{N}(e_{i,1}^2 + e_{i,2}^2) + \beta_{1,1}(c_{1,1}^T z - q_1 - e_{1,1}) + \beta_{1,2}(c_{1,2}^T z - q_2 - e_{1,2}) + \sum_{i=2}^{N} \alpha_{i,1}(c_{i,1}^T z - w_1^T \varphi_1(x_i) - b_1 - e_{i,1}) + \sum_{i=2}^{N} \alpha_{i,2}(c_{i,2}^T z - w_2^T \varphi_2(x_i) - b_2 - e_{i,2})$
Conditions for optimality [Fletcher, 1987]: $\partial \mathcal{L}/\partial z = 0$, $\partial \mathcal{L}/\partial w_1 = 0$, $\partial \mathcal{L}/\partial w_2 = 0$, $\partial \mathcal{L}/\partial b_1 = 0$, $\partial \mathcal{L}/\partial b_2 = 0$, $\partial \mathcal{L}/\partial e_{1,1} = 0$, $\partial \mathcal{L}/\partial e_{1,2} = 0$, $\partial \mathcal{L}/\partial e_{i,1} = 0$, $\partial \mathcal{L}/\partial e_{i,2} = 0$, $\partial \mathcal{L}/\partial \beta_{1,1} = 0$, $\partial \mathcal{L}/\partial \beta_{1,2} = 0$, $\partial \mathcal{L}/\partial \alpha_{i,1} = 0$, $\partial \mathcal{L}/\partial \alpha_{i,2} = 0$.
Proof - conditions for optimality
$\partial \mathcal{L}/\partial z = -\gamma z + (I - P_D)^T (I - P_D) z + \beta_{1,1} c_{1,1} + \beta_{1,2} c_{1,2} + \sum_{i=2}^{N} \alpha_{i,1} c_{i,1} + \sum_{i=2}^{N} \alpha_{i,2} c_{i,2} = 0$
$\partial \mathcal{L}/\partial w_1 = \nu w_1 - \sum_{i=2}^{N} \alpha_{i,1} \varphi_1(x_i) = 0$
$\partial \mathcal{L}/\partial w_2 = \nu w_2 - \sum_{i=2}^{N} \alpha_{i,2} \varphi_2(x_i) = 0$
$\partial \mathcal{L}/\partial b_1 = -\sum_{i=2}^{N} \alpha_{i,1} = 0$, i.e. $1_{N-1}^T \alpha_1 = 0$
$\partial \mathcal{L}/\partial b_2 = -\sum_{i=2}^{N} \alpha_{i,2} = 0$, i.e. $1_{N-1}^T \alpha_2 = 0$
$\partial \mathcal{L}/\partial e_{1,1} = \eta e_{1,1} - \beta_{1,1} = 0$
$\partial \mathcal{L}/\partial e_{1,2} = \eta e_{1,2} - \beta_{1,2} = 0$
$\partial \mathcal{L}/\partial e_{i,1} = \eta e_{i,1} - \alpha_{i,1} = 0, \ i = 2, \ldots, N$
$\partial \mathcal{L}/\partial e_{i,2} = \eta e_{i,2} - \alpha_{i,2} = 0, \ i = 2, \ldots, N$
$\partial \mathcal{L}/\partial \beta_{1,1} = c_{1,1}^T z - q_1 - e_{1,1} = 0$
$\partial \mathcal{L}/\partial \beta_{1,2} = c_{1,2}^T z - q_2 - e_{1,2} = 0$
$\partial \mathcal{L}/\partial \alpha_{i,1} = c_{i,1}^T z - w_1^T \varphi_1(x_i) - b_1 - e_{i,1} = 0, \ i = 2, \ldots, N$
$\partial \mathcal{L}/\partial \alpha_{i,2} = c_{i,2}^T z - w_2^T \varphi_2(x_i) - b_2 - e_{i,2} = 0, \ i = 2, \ldots, N$.
Proof - elimination step
Eliminate $w_1, w_2, e_{i,1}, e_{i,2}$; express everything in terms of the kernel functions; express the resulting set of equations in terms of $z, b_1, b_2, \alpha_1, \alpha_2$. One obtains
$-\gamma z + (I - P_D)^T (I - P_D) z + V_1 \alpha_1 + V_2 \alpha_2 + \eta c_{1,1} c_{1,1}^T z + \eta c_{1,2} c_{1,2}^T z = \eta (q_1 c_{1,1} + q_2 c_{1,2})$
and
$V_1^T z - \frac{1}{\nu} \Omega_1 \alpha_1 - \frac{1}{\eta} \alpha_1 - b_1 1_{N-1} = 0$
$V_2^T z - \frac{1}{\nu} \Omega_2 \alpha_2 - \frac{1}{\eta} \alpha_2 - b_2 1_{N-1} = 0$
$\beta_{1,1} = \eta (c_{1,1}^T z - q_1)$
$\beta_{1,2} = \eta (c_{1,2}^T z - q_2)$.
The dual model representation follows from the conditions for optimality.
Model selection by validation
Model selection criterion:
$\min_{\Theta} \sum_{i,j} \left( \frac{\hat{z}_i^T \hat{z}_j}{\|\hat{z}_i\|_2 \|\hat{z}_j\|_2} - \frac{x_i^T x_j}{\|x_i\|_2 \|x_j\|_2} \right)^2$
Tuning parameters $\Theta$:
- kernel tuning parameters in $s_{ij}$, $K_1$, $K_2$, ($K_3$)
- regularization constants $\nu$, $\eta$ (take $\gamma = 0$)
- choice of the diagonal matrix $D$
- choice of the reference point $q$, e.g. $q \in \{[+1;+1], [+1;-1], [-1;+1], [-1;-1]\}$
Stable results; finding a good range is satisfactory.
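The criterion compares cosine similarities between pairs of embedded validation points with those of the corresponding inputs. A compact sketch of evaluating it for one candidate parameter setting:

```python
import numpy as np

def kmref_validation_score(Z, X):
    # Sum over all pairs of squared differences between cosine similarities
    # in the embedding (Z) and in the input space (X).
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.sum((Zn @ Zn.T - Xn @ Xn.T) ** 2)
```

Grid-searching the tuning parameters listed above and keeping the setting with the smallest score reproduces the validation procedure described on the slide.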
KMref: spiral example
(Figure: 3D spiral data and the 2D KMref result; training data (blue *), validation data (magenta o), test data (red +).)
Model selection: $\min \sum_{i,j} \left( \frac{\hat{z}_i^T \hat{z}_j}{\|\hat{z}_i\|_2 \|\hat{z}_j\|_2} - \frac{x_i^T x_j}{\|x_i\|_2 \|x_j\|_2} \right)^2$
KMref: swiss roll example
(Figure: given 3D swiss roll data and the KMref result, a 2D projection.)
6 training data, 1 validation data
KMref: visualizing gene distributions
(Figure: KMref 3D projection of the Alon colon cancer microarray data set.)
Dimension of the input space: 62. Number of genes: 15 (training: 5, validation: 5, test: 5).
Model selection: $\sigma^2 = 10^4$, $\sigma_1^2 = 10^3$, $\sigma_2^2 = 0.5\,\sigma_1^2$, $\sigma_3^2 = 0.1\,\sigma_1^2$, $\eta = 1$, $\nu = 1$, $D = \mathrm{diag}\{1, 5, 1\}$, $q = [+1; -1; -1]$.
KMref: Santa Fe laser data
(Figure: the original time series $\{y_t\}_{t=1}^{T}$ over discrete time $k$, and the 3D KMref projection.)
Construct $y_t^{t-m} = [y_t; y_{t-1}; y_{t-2}; \ldots; y_{t-m}]$ with $m = 9$; the given data $\{y_t^{t-m}\}_{t=m+1}^{m+N_{tot}}$ lie in a $p = 10$ dimensional space.
2 validation data (first part), 7 training data points
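Building the delay-embedded vectors used in this example is a one-liner; the sketch below constructs $[y_t; y_{t-1}; \ldots; y_{t-m}]$ for every admissible $t$, with the newest-to-oldest ordering following the slide's definition.

```python
import numpy as np

def delay_embedding(y, m=9):
    # Rows are [y_t, y_{t-1}, ..., y_{t-m}]; with m = 9 each row lives in R^10.
    y = np.asarray(y)
    return np.stack([y[t - m:t + 1][::-1] for t in range(m, len(y))])
```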
Conclusions
- Trend: kernelizing classical methods (FDA, PCA, CCA, ICA, ...)
- Kernel methods: complementary views from (LS-)SVM, RKHS, GP
- Least squares support vector machines as core problems in supervised and unsupervised learning, and beyond
- LS-SVM provides a methodology for optimization modelling
- Kernel maps with a reference point: LS-SVM core part
- Computational complexity: similar to regression/classification
- Reference point: converts the eigenvalue problem into a linear system
Read more: Matlab demo file:
