Kernel methods for exploratory data analysis and community detection


1 Kernel methods for exploratory data analysis and community detection
Johan Suykens
KU Leuven, ESAT-SCD/SISTA
Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
VUB Leerstoel, Oct.

2 Overview
- Principal component analysis
- Kernel principal component analysis and LS-SVM; sparse and robust extensions
- Kernel spectral clustering
- Community detection in complex networks
- Data visualization using kernel maps with reference point

3 Principal component analysis (1)
Given a data cloud, potentially in a high dimensional input space:
- assume an ellipsoidal data cloud
- search for direction(s) in the data of maximal variance

4 Principal component analysis (2)
Given data $\{x_i\}_{i=1}^N$ with $x_i \in \mathbb{R}^n$ (assumed zero mean).
Find projected variables $w^T x_i$ with maximal variance:
$\max_w E\{(w^T x)^2\} = w^T E\{x x^T\} w = w^T C w$
with covariance matrix $C = E\{x x^T\}$ and $E\{\cdot\}$ the expected value. For $N$ given data points one has $C \approx \frac{1}{N} \sum_{i=1}^N x_i x_i^T$.
Problem: the optimal solution for $w$ in the above problem is unbounded. Therefore an additional constraint should be imposed: a common choice is $w^T w = 1$.


6 Principal component analysis (3)
The problem formulation then becomes:
$\max_w w^T C w$ subject to $w^T w = 1$
This constrained optimization problem is solved by taking the Lagrangian
$\mathcal{L}(w; \lambda) = \frac{1}{2} w^T C w - \lambda (w^T w - 1)$
with Lagrange multiplier $\lambda$. The solution is given by the eigenvalue problem
$C w = \lambda w$
with $C = C^T$, obtained from setting $\partial \mathcal{L} / \partial w = 0$, $\partial \mathcal{L} / \partial \lambda = 0$.
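As a concrete illustration (not from the slides; data and variable names are illustrative), a minimal NumPy sketch of this eigenvalue route: center the data, estimate $C$, and take the eigenvector of the largest eigenvalue as the first principal direction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([3.0, 1.0], [[3.0, 1.2], [1.2, 0.8]], size=500)  # N x n data

Xc = X - X.mean(axis=0)            # zero-mean the data first (mu = 0)
C = (Xc.T @ Xc) / Xc.shape[0]      # covariance estimate C = (1/N) sum_i x_i x_i^T
lam, U = np.linalg.eigh(C)         # eigh: real eigenvalues of symmetric C, ascending
w = U[:, -1]                       # direction of maximal variance (lambda_max)
print("eigenvalues:", lam, " first principal direction:", w)
```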

7 Principal component analysis (4)
[Figure: ellipsoidal data cloud in the $(x_1, x_2)$ plane with principal axes $u_1, u_2$ of lengths $\lambda_1^{1/2}, \lambda_2^{1/2}$, centered at the mean $\mu$.]
Illustration of an eigenvalue decomposition $C u = \lambda u$:
- the eigenvalues $\lambda_i$ are real and positive
- the eigenvectors $u_1$ and $u_2$ are orthogonal with respect to each other
- the maximal variance solution is the direction $u_1$ corresponding to $\lambda_{\max} = \lambda_1$
- note that $\mu = 0$ (the data should be made zero-mean beforehand)

8 Principal component analysis: dimensionality reduction (1)
Aim: decreasing the dimensionality of the given input space by mapping vectors $x \in \mathbb{R}^n$ to $z \in \mathbb{R}^m$ with $m < n$.
A point $x$ is mapped to $z$ in the lower dimensional space by $z^{(j)} = u_j^T x$, where the $u_j$ are the eigenvectors corresponding to the $m$ largest eigenvalues and $z = [z^{(1)}\, z^{(2)} \ldots z^{(m)}]^T$.
The error resulting from the dimensionality reduction is characterized by the neglected eigenvalues, i.e. $\sum_{i=m+1}^n \lambda_i$.

9 Principal component analysis: dimensionality reduction (2)
[Figure: scree plot of the eigenvalues $\lambda_i$ versus index $i$.]
In this example, reducing the original 6-dimensional space to a 2-dimensional space is a good choice because the two largest eigenvalues $\lambda_1$ and $\lambda_2$ are much larger than the other ones.

10 Principal component analysis: reconstruction problem (1)
Consider $x \in \mathbb{R}^n$ and $z \in \mathbb{R}^m$ with $m < n$ (dimensionality reduction).
Encoder mapping: $z = G(x)$. Decoder mapping: $\hat{x} = F(z)$.
Objective: squared distortion (reconstruction) error
$\min E = \frac{1}{N} \sum_{i=1}^N \|x_i - \hat{x}_i\|_2^2 = \frac{1}{N} \sum_{i=1}^N \|x_i - F(G(x_i))\|_2^2$
Taking the mappings $F, G$ linear corresponds to linear PCA analysis.
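A short sketch of the linear encoder/decoder view (illustrative data), showing that the distortion indeed equals the sum of the neglected eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 6)) @ np.diag([3.0, 2.0, 0.5, 0.3, 0.2, 0.1])
Xc = X - X.mean(axis=0)

lam, U = np.linalg.eigh(Xc.T @ Xc / len(Xc))
Um = U[:, ::-1][:, :2]             # eigenvectors of the m = 2 largest eigenvalues

Z = Xc @ Um                        # encoder  z = G(x) = Um^T x
Xhat = Z @ Um.T                    # decoder  x_hat = F(z) = Um z
err = np.mean(np.sum((Xc - Xhat) ** 2, axis=1))
print(err, lam[::-1][2:].sum())    # distortion equals sum of neglected eigenvalues
```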

11 Principal component analysis: reconstruction problem (2)
[Figure: information bottleneck diagram $x \rightarrow G(x) \rightarrow z \rightarrow F(z) \rightarrow \hat{x}$.]

12 Principal component analysis: denoising example
Images: $16 \times 15$ pixels ($n = 240$). Training on clean digits (not containing the digit 9).
Test data: (bottom left) denoised digit 9 after reconstruction using principal components; (bottom right) reconstruction using 8 principal components.

13 Overview
- Principal component analysis
- Kernel principal component analysis and LS-SVM; sparse and robust extensions
- Kernel spectral clustering
- Community detection in complex networks
- Data visualization using kernel maps with reference point

14 Kernel principal component analysis
[Figure: toy data in the $(x^{(1)}, x^{(2)})$ plane: linear PCA versus kernel PCA (RBF kernel).]
Kernel PCA [Schölkopf et al., 1998]: by eigenvalue decomposition of the kernel matrix
$\begin{bmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{bmatrix}$
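A minimal kernel PCA sketch with an RBF kernel (illustrative data and names; the double centering corresponds to the centered kernel matrix $\Omega_c$ of the next slide):

```python
import numpy as np

def rbf_kernel(X, Y, sigma2):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

rng = np.random.default_rng(2)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.standard_normal((200, 2))  # ring data

K = rbf_kernel(X, X, sigma2=0.5)
N = len(X)
M = np.eye(N) - np.ones((N, N)) / N
Kc = M @ K @ M                     # centered kernel matrix (Omega_c)
lam, A = np.linalg.eigh(Kc)
alpha = A[:, ::-1][:, :2]          # dual vectors of the 2 largest eigenvalues
scores = Kc @ alpha                # nonlinear component scores per data point
print(lam[::-1][:3])
```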

15 Kernel PCA: primal and dual problem
Underlying primal problem [Suykens et al., IEEE-TNN 2003]:
$\min_{w,b,e} \frac{1}{2} w^T w - \frac{\gamma}{2} \sum_{i=1}^N e_i^2$ s.t. $e_i = w^T \varphi(x_i) + b$, $i = 1, \ldots, N$.
(Lagrange) dual problem = kernel PCA:
$\Omega_c \alpha = \lambda \alpha$ with $\lambda = 1/\gamma$
with $\Omega_{c,ij} = (\varphi(x_i) - \hat{\mu}_\varphi)^T (\varphi(x_j) - \hat{\mu}_\varphi)$ the centered kernel matrix.
Interpretation: 1. pool of candidate components (objective function equals zero); 2. select relevant components (components with high variance).

16 Kernel PCA: model representations
Primal and dual model representations for the model $\mathcal{M}$:
(P): $\hat e(x^*) = w^T \varphi(x^*) + b$
(D): $\hat e(x^*) = \sum_i \alpha_i K(x^*, x_i) + b$
which can be evaluated at any point $x^* \in \mathbb{R}^d$, where $K(x^*, x_i) = \varphi(x^*)^T \varphi(x_i)$ with $K(\cdot, \cdot)$ a positive definite kernel and feature map $\varphi(\cdot): \mathbb{R}^d \to \mathbb{R}^{n_h}$.

17 Generalizations to kernel PCA: other loss functions
Consider a general loss function $L$:
$\min_{w,b,e} \frac{1}{2} w^T w - \frac{\gamma}{2} \sum_{i=1}^N L(e_i)$ s.t. $e_i = w^T \varphi(x_i) + b$, $i = 1, \ldots, N$.
Generalizations of KPCA that lead to robustness and sparseness, e.g. the Vapnik $\epsilon$-insensitive loss or the Huber loss function [Alzate & Suykens, 2006].
Weighted least squares versions and incorporation of constraints:
$\min_{w,b,e} \frac{1}{2} w^T w - \frac{\gamma}{2} \sum_{i=1}^N v_i e_i^2$ s.t. $e_i = w^T \varphi(x_i) + b$, $i = 1, \ldots, N$, and $\sum_{i=1}^N e_i e_i^{(1)} = 0, \ldots, \sum_{i=1}^N e_i e_i^{(l-1)} = 0$
Find the $l$-th PC w.r.t. $l-1$ orthogonality constraints (previous PCs $e_i^{(j)}$). The solution is given by a generalized eigenvalue problem.

18 Robustness: kernel component analysis
[Figure: original image, corrupted image, KPCA reconstruction.]
Weighted LS-SVM: robustness [Alzate & Suykens, IEEE-TNN 2008]

19 Robustness: kernel component analysis
[Figure: original image, corrupted image, KPCA reconstruction, KCA reconstruction.]
Weighted LS-SVM: robustness and sparsity [Alzate & Suykens, IEEE-TNN 2008]

20 Generalizations to kernel PCA: sparseness
[Figure: top, denoising in the $(x^{(1)}, x^{(2)})$ plane; bottom, different support vectors (in black) per principal component vector PC1, PC2, PC3.]
Sparse kernel PCA using the $\epsilon$-insensitive loss [Alzate & Suykens, 2006]

21 Overview
- Principal component analysis
- Kernel principal component analysis and LS-SVM; sparse and robust extensions
- Kernel spectral clustering
- Community detection in complex networks
- Data visualization using kernel maps with reference point

22 Spectral graph clustering
Minimal cut: given the graph $G = (V, E)$, find clusters $A_1, A_2$:
$\min_{q_i \in \{-1, +1\}} \frac{1}{2} \sum_{i,j} w_{ij} (q_i - q_j)^2$
with cluster membership indicator $q_i$ ($q_i = 1$ if $i \in A_1$, $q_i = -1$ if $i \in A_2$) and $W = [w_{ij}]$ the weighted adjacency matrix.
[Figure: example graph with a cut of size 1 (minimal cut) and a cut of size 2.]

23 Spectral graph clustering
Relaxation to the min-cut spectral clustering problem:
$\min_{q^T q = 1} q^T L q$
with $L = D - W$ the unnormalized graph Laplacian, degree matrix $D = \mathrm{diag}(d_1, \ldots, d_N)$, $d_i = \sum_j w_{ij}$, giving $L q = \lambda q$.
Cluster membership indicators: $\hat q_i = \mathrm{sign}(q_i - \theta)$ with threshold $\theta$.
Normalized cut: $L q = \lambda D q$ [Fiedler, 1973; Shi & Malik, 2000; Ng et al., 2002; Chung, 1997; von Luxburg, 2007]
From the discrete version to the continuous problem (Laplace operator) [Belkin & Niyogi, 2003; von Luxburg et al., 2008; Smale & Zhou, 2007]
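A minimal sketch of this relaxation (illustrative data; SciPy's generalized symmetric eigensolver handles the normalized-cut problem $L q = \lambda D q$ directly):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 2)) * 0.3 + [0.0, 0.0]
B = rng.standard_normal((30, 2)) * 0.3 + [3.0, 0.0]
X = np.vstack([A, B])

d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
W = np.exp(-d2 / 0.5); np.fill_diagonal(W, 0.0)   # weighted adjacency
D = np.diag(W.sum(1))
L = D - W                                         # unnormalized graph Laplacian

lam, Q = eigh(L, D)                               # normalized cut: L q = lambda D q
fiedler = Q[:, 1]                                 # second smallest eigenvector
labels = np.sign(fiedler - np.median(fiedler))    # threshold theta = median
print(labels[:30].mean(), labels[30:].mean())     # the two blobs separate
```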

24 Spectral clustering + K-means

25 Kernel spectral clustering: case of two clusters
Underlying model (primal representation): $\hat e(x) = w^T \varphi(x) + b$ with $\hat q = \mathrm{sign}[\hat e(x)]$ the estimated cluster indicator at any $x \in \mathbb{R}^d$.
Primal problem: training on given data $\{x_i\}_{i=1}^N$:
$\min_{w,b,e} \frac{1}{2} w^T w - \frac{\gamma}{2} \sum_{i=1}^N v_i e_i^2$ subject to $e_i = w^T \varphi(x_i) + b$, $i = 1, \ldots, N$
with positive weights $v_i$ (will be related to the inverse degree matrix).
[Alzate & Suykens, IEEE-PAMI, 2010]

26 Lagrangian and conditions for optimality
Lagrangian:
$\mathcal{L}(w, b, e; \alpha) = \frac{1}{2} w^T w - \frac{\gamma}{2} \sum_{i=1}^N v_i e_i^2 + \sum_{i=1}^N \alpha_i (e_i - w^T \varphi(x_i) - b)$
Conditions for optimality:
- $\partial \mathcal{L} / \partial w = 0 \Rightarrow w = \sum_i \alpha_i \varphi(x_i)$
- $\partial \mathcal{L} / \partial b = 0 \Rightarrow \sum_i \alpha_i = 0$
- $\partial \mathcal{L} / \partial e_i = 0 \Rightarrow \alpha_i = \gamma v_i e_i$, $i = 1, \ldots, N$
- $\partial \mathcal{L} / \partial \alpha_i = 0 \Rightarrow e_i = w^T \varphi(x_i) + b$, $i = 1, \ldots, N$
Eliminate $w, b, e$ and write the solution in $\alpha$.

27 Kernel-based model representation
Dual problem:
$V M_V \Omega \alpha = \lambda \alpha$ with $\lambda = 1/\gamma$
- $M_V = I_N - \frac{1}{1_N^T V 1_N} 1_N 1_N^T V$: weighted centering matrix
- $\Omega = [\Omega_{ij}]$: kernel matrix with $\Omega_{ij} = \varphi(x_i)^T \varphi(x_j) = K(x_i, x_j)$
Dual model representation:
$\hat e(x) = \sum_{i=1}^N \alpha_i K(x_i, x) + b$
with $K(x_i, x) = \varphi(x_i)^T \varphi(x)$.

28 Choice of weights $v_i$
Take $V = D^{-1}$ where $D = \mathrm{diag}\{d_1, \ldots, d_N\}$ and $d_i = \sum_{j=1}^N \Omega_{ij}$.
This gives the generalized eigenvalue problem:
$M_D \Omega \alpha = \lambda D \alpha$ with $M_D = I_N - \frac{1}{1_N^T D^{-1} 1_N} 1_N 1_N^T D^{-1}$
This is a modified version of random walks spectral clustering.
Note that $\mathrm{sign}[e_i] = \mathrm{sign}[\alpha_i]$ if $\gamma v_i > 0$ (on training data)... but $\mathrm{sign}[\hat e(x)]$ applies beyond the training data.
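A small sketch of this dual problem, assuming an RBF kernel (data and names are illustrative; the bias $b$ follows from the centering condition $\sum_i \alpha_i = 0$):

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(4)
X = np.vstack([rng.standard_normal((40, 2)) * 0.3,
               rng.standard_normal((40, 2)) * 0.3 + [4.0, 0.0]])
N = len(X)

d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
Omega = np.exp(-d2 / 0.5)                        # kernel matrix
d = Omega.sum(1); D = np.diag(d); Dinv = np.diag(1.0 / d)
one = np.ones((N, 1))
MD = np.eye(N) - (one @ one.T @ Dinv) / (one.T @ Dinv @ one)  # weighted centering

lam, Alpha = eig(MD @ Omega, D)                  # M_D Omega alpha = lambda D alpha
top = np.argsort(-lam.real)[0]                   # largest eigenvalue: binary split
alpha = Alpha[:, top].real
b = (-(one.T @ Dinv @ Omega @ alpha) / (one.T @ Dinv @ one)).item()
e = Omega @ alpha + b                            # score variables on training data
print(np.sign(e[:40]).mean(), np.sign(e[40:]).mean())  # opposite signs per cluster
```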

29 Kernel spectral clustering: more clusters
Case of $k$ clusters: additional sets of constraints:
$\min_{w^{(l)}, e^{(l)}, b_l} \frac{1}{2} \sum_{l=1}^{k-1} w^{(l)T} w^{(l)} - \frac{1}{2} \sum_{l=1}^{k-1} \gamma_l \, e^{(l)T} D^{-1} e^{(l)}$
subject to $e^{(1)} = \Phi_{N \times n_h} w^{(1)} + b_1 1_N$, $e^{(2)} = \Phi_{N \times n_h} w^{(2)} + b_2 1_N$, ..., $e^{(k-1)} = \Phi_{N \times n_h} w^{(k-1)} + b_{k-1} 1_N$
where $e^{(l)} = [e_1^{(l)}; \ldots; e_N^{(l)}]$ and $\Phi_{N \times n_h} = [\varphi(x_1)^T; \ldots; \varphi(x_N)^T] \in \mathbb{R}^{N \times n_h}$.
Dual problem: $M_D \Omega \alpha^{(l)} = \lambda_l D \alpha^{(l)}$, $l = 1, \ldots, k-1$.
[Alzate & Suykens, IEEE-PAMI, 2010]

30 Primal and dual model representations
$k$ clusters, $k-1$ sets of constraints (index $l = 1, \ldots, k-1$); model $\mathcal{M}$:
(P): $\mathrm{sign}[\hat e^{(l)}(x^*)] = \mathrm{sign}[w^{(l)T} \varphi(x^*) + b_l]$
(D): $\mathrm{sign}[\hat e^{(l)}(x^*)] = \mathrm{sign}[\sum_j \alpha_j^{(l)} K(x^*, x_j) + b_l]$
Note: additional sets of constraints are also used in multi-class and vector-valued output LS-SVMs [Suykens et al., 1999].
Advantages: out-of-sample extensions, model selection procedures, large scale methods.
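For $k > 2$, the signs of the $k-1$ score variables form a codeword per point, and out-of-sample points can be assigned to the cluster with the closest codeword. A sketch under these assumptions (in practice the codebook would be learned from the training sign patterns; all names here are illustrative):

```python
import numpy as np

def out_of_sample_scores(Xtest, Xtrain, Alpha, b, sigma2):
    """e^(l)(x) = sum_j alpha_j^(l) K(x, x_j) + b_l for each test point."""
    d2 = ((Xtest[:, None] - Xtrain[None]) ** 2).sum(-1)
    Ktest = np.exp(-d2 / sigma2)
    return Ktest @ Alpha + b          # (N_test, k-1) score variables

def assign_by_codebook(E, codebook):
    """Match sign patterns of the scores to the closest codeword (Hamming)."""
    S = np.sign(E)
    dist = (S[:, None, :] != codebook[None, :, :]).sum(-1)
    return dist.argmin(1)

# one +-1 codeword of length k-1 per cluster, here for k = 3 clusters
codebook = np.array([[1, 1], [-1, 1], [-1, -1]])
E = np.array([[0.9, 0.8], [-0.7, 0.6], [-0.5, -0.9]])
print(assign_by_codebook(E, codebook))            # -> [0 1 2]
```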

31 Out-of-sample extension and coding
[Figure: toy clusters in the $(x^{(1)}, x^{(2)})$ plane with out-of-sample cluster assignments.]


33 Piecewise constant eigenvectors and extension (1)
Definition [Meila & Shi, 2001]. A vector $\alpha$ is called piecewise constant relative to a partition $(A_1, \ldots, A_k)$ iff $\alpha_i = \alpha_j \; \forall x_i, x_j \in A_p$, $p = 1, \ldots, k$.
Proposition [Alzate & Suykens, 2010]. Assume (i) a training set $D = \{x_i\}_{i=1}^N$ and a validation set $D^v = \{x_m^v\}_{m=1}^{N_v}$ i.i.d. sampled from the same underlying distribution; (ii) a set of $k$ clusters $\{A_1, \ldots, A_k\}$ with $k > 2$; (iii) an isotropic kernel function such that $K(x, z) = 0$ when $x$ and $z$ belong to different clusters; (iv) the eigenvectors $\alpha^{(l)}$, $l = 1, \ldots, k-1$, are piecewise constant. Then validation set points belonging to the same cluster are collinear in the $(k-1)$-dimensional subspace spanned by the columns of $E^v \in \mathbb{R}^{N_v \times (k-1)}$, where $E^v_{ml} = e_m^{(l)} = \sum_{i=1}^N \alpha_i^{(l)} K(x_i, x_m^v) + b_l$.

34 Piecewise constant eigenvectors and extension (2)
Key step of the proof: for $x \in A_p$ one has
$e^{(l)}(x) = \sum_{i=1}^N \alpha_i^{(l)} K(x_i, x) + b^{(l)} = c_p^{(l)} \sum_{i \in A_p} K(x_i, x) + \sum_{i \notin A_p} \alpha_i^{(l)} K(x_i, x) + b^{(l)} = c_p^{(l)} \sum_{i \in A_p} K(x_i, x) + b^{(l)}$
Model selection to determine the kernel parameters and $k$: look for line structures in the space $(e_i^{(1)}, e_i^{(2)}, \ldots, e_i^{(k-1)})$, evaluated on validation data (aiming for good generalization).
Choice of kernel: Gaussian RBF kernel; $\chi^2$-kernel for images.
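A simple proxy for this "line structure" check (not the exact BLF criterion of the paper): per cluster, measure the fraction of energy of the validation score vectors captured by their dominant direction.

```python
import numpy as np

def line_fit_score(E, labels):
    """Fraction of variance along the dominant direction, averaged over clusters.

    E: (N_v, k-1) validation score variables; labels: cluster assignments.
    Close to 1 when each cluster's scores are collinear (good model fit)."""
    scores = []
    for c in np.unique(labels):
        Ec = E[labels == c]
        s = np.linalg.svd(Ec, compute_uv=False)
        scores.append(s[0] ** 2 / (s ** 2).sum())
    return float(np.mean(scores))

E = np.array([[1.0, 2.0], [2.0, 4.1], [0.5, 1.0],       # cluster 0: one line
              [3.0, -1.0], [1.5, -0.52], [6.0, -2.1]])  # cluster 1: another line
print(line_fit_score(E, np.array([0, 0, 0, 1, 1, 1])))  # near 1
```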

35 Model selection (looking for lines): toy problem 1
[Figure: validation score variables $(e^{(1)}_{i,\mathrm{val}}, e^{(2)}_{i,\mathrm{val}})$ for two values of $\sigma^2$ with the corresponding BLF (balanced line fit) values; line structures appear for a well-tuned $\sigma^2$. Right: train + validation + test data in the $(x^{(1)}, x^{(2)})$ plane.]

36 Model selection (looking for lines): toy problem 2
[Figure: validation score variables $(e^{(1)}_{i,\mathrm{val}}, e^{(2)}_{i,\mathrm{val}})$ for two values of $\sigma^2$ with BLF values. Right: train + validation + test data in $(x^{(1)}, x^{(2)}, x^{(3)})$ space.]

37 Example: image segmentation (looking for lines)
[Figure: image and its segmentation; validation score variables $(e^{(1)}_{i,\mathrm{val}}, e^{(2)}_{i,\mathrm{val}}, e^{(3)}_{i,\mathrm{val}})$ showing line structures.]

38 Image segmentation: comparison
[Figure: per image ID, the original image, the segmentation of the proposed method, the Nyström method, and human segmentation.]

39 Example: power grid - identifying customer profiles (1)
Power load: 245 substations, hourly data (5 years), high dimensional $d$.
- Periodic AR modelling: dimensionality reduction
- k-means clustering applied after dimensionality reduction
[Figure: normalized load versus hour for the resulting clusters.]

40 Example: power grid - identifying customer profiles (2)
Application of kernel spectral clustering, directly on the $d$-dimensional load data. Model selection on the kernel parameter and the number of clusters [Alzate, Espinoza, De Moor, Suykens, 2009].
[Figure: normalized load versus hour for the detected clusters.]

41 Example: power grid - identifying customer profiles (3)
[Figure: normalized load versus hour for three detected clusters.]
Electricity load: 245 substations in the Belgian grid (1/2 train, 1/2 validation); $x_i \in \mathbb{R}^d$: spectral clustering on high dimensional data (5 years).
3 of the 7 detected clusters:
- 1: Residential profile: morning and evening peaks
- 2: Business profile: peaked around noon
- 3: Industrial profile: increasing morning, oscillating afternoon and evening

42 Kernel spectral clustering: sparse kernel models
[Figure: original image and binary clustering result.]
Incomplete Cholesky decomposition: $\|\Omega - G G^T\|_2 \leq \eta$ with $G \in \mathbb{R}^{N \times R}$ and $R \ll N$.
Image (Berkeley image dataset): 154,401 pixels, reduced set of support vectors.
$\hat e^{(l)}(x) = \sum_{i \in S_{SV}} \alpha_i^{(l)} K(x_i, x) + b_l$

43 Kernel spectral clustering: sparse kernel models
[Figure: original image and sparse kernel model segmentation.]
Incomplete Cholesky decomposition: $\|\Omega - G G^T\|_2 \leq \eta$ with $G \in \mathbb{R}^{N \times R}$ and $R \ll N$.
Image (Berkeley image dataset): 154,401 pixels, reduced set of support vectors.
$\hat e^{(l)}(x) = \sum_{i \in S_{SV}} \alpha_i^{(l)} K(x_i, x) + b_l$
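The low-rank factor $G$ can be obtained with a pivoted incomplete Cholesky decomposition. Below is a minimal, generic sketch of that decomposition (a standard algorithm, not the authors' code; the stopping rule on the residual diagonal and all names are our choices):

```python
import numpy as np

def incomplete_cholesky(K, eta=1e-3, max_rank=None):
    """Pivoted incomplete Cholesky: K ~= G G^T with G of size N x R, R << N.

    Greedily picks the pivot with the largest remaining diagonal and stops
    once the residual trace drops below eta."""
    N = K.shape[0]
    max_rank = max_rank or N
    G = np.zeros((N, max_rank))
    d = K.diagonal().copy()              # residual diagonal
    perm = []
    for j in range(max_rank):
        i = int(np.argmax(d))
        if d[i] < eta / N or d.sum() < eta:
            return G[:, :j]
        perm.append(i)
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d -= G[:, j] ** 2
        d[np.array(perm)] = 0.0          # processed pivots contribute nothing further
    return G

K = np.exp(-((np.linspace(0, 1, 50)[:, None] - np.linspace(0, 1, 50)[None]) ** 2) / 0.1)
G = incomplete_cholesky(K, eta=1e-6)
print(G.shape, np.linalg.norm(K - G @ G.T))   # small rank, small residual
```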

44 Highly sparse kernel models on images
- application on images: $x_i \in \mathbb{R}^3$ (r, g, b values per pixel), $i = 1, \ldots, N$, pre-processed into $z_i \in \mathbb{R}^8$ (quantization to 8 colors)
- $\chi^2$-kernel to compare two local color histograms ($5 \times 5$ pixel window)
- $N > 100{,}000$: select a subset $M \ll N$ based on quadratic Renyi entropy, as in the fixed-size method [Suykens et al., 2002] (a sketch of this selection follows below)
- Highly sparse representations: $\#SV = 3k$
- Completion of the cluster indicators based on out-of-sample extensions $\mathrm{sign}[\hat e^{(l)}(x)] = \mathrm{sign}[\sum_{j \in S_{SV}} \alpha_j^{(l)} K(x, x_j) + b_l]$ applied to the full image
[Alzate & Suykens, Neurocomputing 2011]
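The quadratic Renyi entropy of a subset can be estimated directly from the kernel matrix as $\hat H_2 = -\log\big(\frac{1}{M^2} \sum_{i,j} K(x_i, x_j)\big)$, which is the quantity maximized in the fixed-size method. A toy sketch of such a selection loop, under our own simplifications (random swap proposals, RBF kernel; names are illustrative):

```python
import numpy as np

def renyi_entropy(Xs, sigma2):
    """Quadratic Renyi entropy estimate: H2 = -log( mean of the kernel matrix )."""
    d2 = ((Xs[:, None] - Xs[None]) ** 2).sum(-1)
    return -np.log(np.exp(-d2 / sigma2).mean())

def fixed_size_subset(X, M, sigma2, iters=500, seed=0):
    """Keep a working set of size M; accept random swaps that raise the entropy."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), M, replace=False)
    H = renyi_entropy(X[idx], sigma2)
    for _ in range(iters):
        pos, cand = rng.integers(M), rng.integers(len(X))
        if cand in idx:
            continue
        trial = idx.copy(); trial[pos] = cand
        Ht = renyi_entropy(X[trial], sigma2)
        if Ht > H:
            idx, H = trial, Ht
    return idx

X = np.random.default_rng(5).standard_normal((1000, 2))
print(fixed_size_subset(X, 20, sigma2=1.0)[:10])
```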

45 Highly sparse kernel models: toy example
[Figure: data in the $(x^{(1)}, x^{(2)})$ plane and score variables $(e^{(1)}_i, e^{(2)}_i)$.]
only $3k = 9$ support vectors

46 Highly sparse kernel models: toy example
[Figure: data in the $(x^{(1)}, x^{(2)})$ plane.]

47 Highly sparse kernel models: toy example
[Figure: data in the $(x^{(1)}, x^{(2)})$ plane.]
only $3k = 12$ support vectors

48 Highly sparse kernel models: toy example
[Figure: score variables $(\hat e^{(1)}_i, \hat e^{(2)}_i, \hat e^{(3)}_i)$.]

49 Highly sparse kernel models: image segmentation
[Figure: image, segmentation, and score variables $(e^{(1)}_i, e^{(2)}_i, e^{(3)}_i)$.]

50 Highly sparse kernel models: image segmentation
[Figure: score variables $(e^{(1)}_i, e^{(2)}_i, e^{(3)}_i)$ with line structures.]
only $3k = 12$ support vectors

51 Hierarchical kernel spectral clustering
Hierarchical kernel spectral clustering:
- looking at different scales
- use of model selection and validation data
[Alzate & Suykens, Neural Networks, 2012]

52 Kernel spectral clustering: adding prior knowledge
Pair of points $x^\dagger, x^\ddagger$: $c = 1$ must-link, $c = -1$ cannot-link.
Primal problem [Alzate & Suykens, IJCNN 2009]:
$\min_{w^{(l)}, e^{(l)}, b_l} \frac{1}{2} \sum_{l=1}^{k-1} w^{(l)T} w^{(l)} - \frac{1}{2} \sum_{l=1}^{k-1} \gamma_l \, e^{(l)T} D^{-1} e^{(l)}$
subject to $e^{(l)} = \Phi_{N \times n_h} w^{(l)} + b_l 1_N$, $l = 1, \ldots, k-1$, and
$w^{(l)T} \varphi(x^\dagger) = c \, w^{(l)T} \varphi(x^\ddagger)$, $l = 1, \ldots, k-1$.
Dual problem: yields a rank-one downdate of the kernel matrix.


54 Adding prior knowledge
[Figure: original image; segmentation without constraints.]

55 Adding prior knowledge
[Figure: original image; segmentation with constraints.]

56 Semi-supervised learning
$N$ unlabeled data, but additional labels on $M - N$ data points:
$X = \{x_1, \ldots, x_N, x_{N+1}, \ldots, x_M\}$
Binary classification by using a binary spectral clustering core model [Alzate & Suykens, WCCI 2012]:
$\min_{w,e,b} \frac{1}{2} w^T w - \frac{\gamma}{2} e^T D^{-1} e + \frac{\rho}{2} \sum_{m=N+1}^M (e_m - y_m)^2$
subject to $e_i = w^T \varphi(x_i) + b$, $i = 1, \ldots, M$.
The dual solution is characterized by a linear system.
Other approaches in semi-supervised learning, e.g. [Belkin et al., 2006].


58 Overview
- Principal component analysis
- Kernel principal component analysis and LS-SVM; sparse and robust extensions
- Kernel spectral clustering
- Community detection in complex networks
- Data visualization using kernel maps with reference point

59 Modularity and community detection
Modularity for the two-group case [Newman, 2006]:
$Q = \frac{1}{4m} \sum_{i,j} \left( A_{ij} - \frac{d_i d_j}{2m} \right) q_i q_j$
with $A$ the adjacency matrix, $d_i$ the degree of node $i$, $m = \frac{1}{2} \sum_i d_i$, $q_i = 1$ if node $i$ belongs to group 1 and $q_i = -1$ for group 2.
Use of modularity within kernel spectral clustering [Langone et al., 2012]:
- used at the level of model validation (see the modularity sketch below)
- finding a representative subgraph using a fixed-size method by maximizing the expansion factor $|N(G)| / |G|$ [Maiya, 2010], with a subgraph $G$ and its neighborhood $N(G)$
- definition of data in unweighted networks: $x_i = A(:, i)$; use of a community kernel function [Kang, 2009]
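For concreteness, a small sketch of the two-group modularity formula above (the toy graph and all names are illustrative):

```python
import numpy as np

def modularity(A, q):
    """Q = (1/4m) * sum_ij (A_ij - d_i d_j / 2m) q_i q_j, with q_i in {-1, +1}."""
    d = A.sum(1)
    m = d.sum() / 2.0
    B = A - np.outer(d, d) / (2.0 * m)   # modularity matrix
    return float(q @ B @ q) / (4.0 * m)

# two triangles joined by a single edge (nodes 0-2 and 3-5)
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
q = np.array([1, 1, 1, -1, -1, -1])
print(modularity(A, q))                  # ~0.357: a good split
```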

60 Protein interaction network (1)
[Figure: Pajek visualization.]
Yeast interaction network: 2114 nodes, 4480 edges [Barabasi et al., 2001]

61 Protein interaction network (2)
- Yeast interaction network: 2114 nodes, 4480 edges [Barabasi et al., 2001]
- KSC community detection, representative subgraph [Langone et al., 2012]
- 7 detected clusters
[Figure: modularity versus number of clusters.]

62 Power grid network (1)
[Figure: Pajek visualization.]
Western USA power grid: 4941 nodes, 6594 edges [Watts & Strogatz, 1998]

63 Power grid network (2)
- Western USA power grid: 4941 nodes, 6594 edges [Watts & Strogatz, 1998]
- KSC community detection, representative subgraph [Langone et al., 2012]
- 6 detected clusters
[Figure: modularity versus number of clusters.]

64 Evolving networks
Binary clustering case: adding a memory effect [Langone et al., 2012]:
$\min_{w,e,b} \frac{1}{2} w^T w - \frac{\gamma}{2} e^T D^{-1} e - \nu \, w^T w_{\mathrm{old}}$
subject to $e_i = w^T \varphi(x_i) + b$, $i = 1, \ldots, N$
with $w_{\mathrm{old}}$ the previous result in time.
- Aims at including temporal smoothness
- Smoothed modularity criterion

65 Overview
- Principal component analysis
- Kernel principal component analysis and LS-SVM; sparse and robust extensions
- Kernel spectral clustering
- Community detection in complex networks
- Data visualization using kernel maps with reference point

66 Dimensionality reduction and data visualization
Traditionally: commonly used techniques are e.g. principal component analysis (PCA), multi-dimensional scaling (MDS), self-organizing maps (SOM).
More recently: isomap, locally linear embedding (LLE), Hessian locally linear embedding, diffusion maps, Laplacian eigenmaps ("kernel eigenmap methods" and "manifold learning") [Roweis & Saul, 2000; Coifman et al., 2005; Belkin et al., 2006].
Kernel maps with reference point [Suykens, IEEE-TNN 2008]: data visualization and dimensionality reduction by solving a linear system.

67 Kernel maps with reference point: formulation
Kernel maps with reference point [Suykens, IEEE-TNN 2008]:
- LS-SVM core part: realize the dimensionality reduction $x \mapsto z$
- Regularization term: $(z - P_D z)^T (z - P_D z) = \sum_{i=1}^N \| z_i - \sum_{j=1}^N s_{ij} D z_j \|_2^2$ with $D$ a diagonal matrix and $s_{ij} = \exp(-\|x_i - x_j\|_2^2 / \sigma^2)$
- reference point $q$ (e.g. the first point; sacrificed in the visualization)
Example for $d = 2$:
$\min_{z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}} \; \frac{1}{2} (z - P_D z)^T (z - P_D z) + \frac{\nu}{2} (w_1^T w_1 + w_2^T w_2) + \frac{\eta}{2} \sum_{i=1}^N (e_{i,1}^2 + e_{i,2}^2)$
such that $c_{1,1}^T z = q_1 + e_{1,1}$ and $c_{1,2}^T z = q_2 + e_{1,2}$ (reference point), $c_{i,1}^T z = w_1^T \varphi_1(x_i) + b_1 + e_{i,1}$, $i = 2, \ldots, N$, and $c_{i,2}^T z = w_2^T \varphi_2(x_i) + b_2 + e_{i,2}$, $i = 2, \ldots, N$.
Coordinates in the low dimensional space: $z = [z_1; z_2; \ldots; z_N] \in \mathbb{R}^{dN}$.



70 Model selection by validation
Model selection criterion:
$\min_\Theta \sum_{i,j} \left( \frac{\hat z_i^T \hat z_j}{\|\hat z_i\|_2 \|\hat z_j\|_2} - \frac{x_i^T x_j}{\|x_i\|_2 \|x_j\|_2} \right)^2$
Tuning parameters $\Theta$:
- kernel tuning parameters in $s_{ij}$, $K_1$, $K_2$, ($K_3$)
- regularization constants $\nu, \eta$
- choice of the diagonal matrix $D$
- choice of the reference point $q$, e.g. $q \in \{[+1;+1], [+1;-1], [-1;+1], [-1;-1]\}$
Stable results; finding a good range is satisfactory. A sketch of evaluating this criterion follows below.
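A minimal sketch of evaluating this criterion for a candidate embedding $\hat z$ (illustrative data; in practice $\hat z$ would come from the kernel map for a given $\Theta$):

```python
import numpy as np

def cosine_mismatch(Z, X):
    """sum_ij ( cos(z_i, z_j) - cos(x_i, x_j) )^2: lower preserves structure better."""
    def cosmat(A):
        An = A / np.linalg.norm(A, axis=1, keepdims=True)
        return An @ An.T
    return float(((cosmat(Z) - cosmat(X)) ** 2).sum())

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 10))
Z_good = X[:, :2]                        # embedding keeping two coordinates
Z_rand = rng.standard_normal((100, 2))   # random embedding
print(cosine_mismatch(Z_good, X), cosine_mismatch(Z_rand, X))  # good < random
```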

71 Kernel maps: spiral example
[Figure: 3D spiral data and 2D visualizations $(\hat z_1, \hat z_2)$ for reference points $q = [+1; -1]$ and $q = [-1; -1]$; training data (blue), validation data (magenta o), test data (red +).]
Model selection: $\min \sum_{i,j} \left( \frac{\hat z_i^T \hat z_j}{\|\hat z_i\|_2 \|\hat z_j\|_2} - \frac{x_i^T x_j}{\|x_i\|_2 \|x_j\|_2} \right)^2$

72 Kernel maps: swiss roll example
[Figure: given 3D swiss roll data (left); kernel map result in 2D (right).]
(Matlab demo available)

73 Kernel maps: visualizing gene distribution
[Figure: 3D projections of the gene distribution.]
Alon colon cancer microarray data set: 3D projections.
Dimension of the input space: 62. Number of genes: 1500 (training: 500, validation: 500, test: 500).

74 Kernel maps: time-series data visualization
Santa Fe laser data.
[Figure: laser time series $y_k$ versus discrete time $k$; 3D visualization $(\hat z_1, \hat z_2, \hat z_3)$.]
- Data: vectors of lagged values $[y_k; y_{k-1}; \ldots; y_{k-9}]$, split into train, validation, and test sets
- Tuning parameters (kernel & regularization) based on the validation set
- The model is able to make out-of-sample extensions

75 Conclusions
- From PCA to KPCA: LS-SVM model framework with primal-dual setting
- Out-of-sample extensions, model selection procedures and large scale methods
- From spectral clustering to kernel spectral clustering
- Applications in complex networks
- Data visualization problems: learning and generalization
- Reference point to convert an eigenvalue problem into a linear system

76 Acknowledgements (1)
- Colleagues at ESAT-SCD (especially the research units: systems, models, control - biomedical data processing - bioinformatics): C. Alzate, A. Argyriou, J. De Brabanter, K. De Brabanter, L. De Lathauwer, B. De Moor, M. Diehl, Ph. Dreesen, M. Espinoza, T. Falck, D. Geebelen, X. Huang, B. Hunyadi, A. Installe, V. Jumutc, P. Karsmakers, R. Langone, J. Lopez, J. Luts, R. Mall, S. Mehrkanoon, M. Moonen, Y. Moreau, K. Pelckmans, J. Puertas, L. Shi, M. Signoretto, P. Tsiaflakis, V. Van Belle, R. Van de Plas, S. Van Huffel, J. Vandewalle, T. van Waterschoot, C. Varon, S. Yu, and others
- Topics of this lecture: Carlos Alzate and Rocco Langone
- Support from ERC AdG A-DATADRIVE-B, KU Leuven, GOA-MaNet, COE Optimization in Engineering OPTEC, IUAP DYSCO, FWO projects, IWT, IBBT eHealth, COST

77 Acknowledgements (2)

78 Thank you


More information

Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis

Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis Arumugam, P. and Christy, V Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu,

More information

A Solution Manual and Notes for: Exploratory Data Analysis with MATLAB by Wendy L. Martinez and Angel R. Martinez.

A Solution Manual and Notes for: Exploratory Data Analysis with MATLAB by Wendy L. Martinez and Angel R. Martinez. A Solution Manual and Notes for: Exploratory Data Analysis with MATLAB by Wendy L. Martinez and Angel R. Martinez. John L. Weatherwax May 7, 9 Introduction Here you ll find various notes and derivations

More information

Neural Network Add-in

Neural Network Add-in Neural Network Add-in Version 1.5 Software User s Guide Contents Overview... 2 Getting Started... 2 Working with Datasets... 2 Open a Dataset... 3 Save a Dataset... 3 Data Pre-processing... 3 Lagging...

More information

Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

How to assess the risk of a large portfolio? How to estimate a large covariance matrix?

How to assess the risk of a large portfolio? How to estimate a large covariance matrix? Chapter 3 Sparse Portfolio Allocation This chapter touches some practical aspects of portfolio allocation and risk assessment from a large pool of financial assets (e.g. stocks) How to assess the risk

More information

Principal components analysis

Principal components analysis CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k

More information

Nonlinear Iterative Partial Least Squares Method

Nonlinear Iterative Partial Least Squares Method Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Data a systematic approach

Data a systematic approach Pattern Discovery on Australian Medical Claims Data a systematic approach Ah Chung Tsoi Senior Member, IEEE, Shu Zhang, Markus Hagenbuchner Member, IEEE Abstract The national health insurance system in

More information