Kernel methods for exploratory data analysis and community detection. Johan Suykens, KU Leuven, ESAT-SCD/SISTA, Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium. Email: johan.suykens@esat.kuleuven.be, http://www.esat.kuleuven.be/scd/. VUB Leerstoel 2012-2013, Oct. 24, 2012.
Overview: principal component analysis; kernel principal component analysis and LS-SVM, with sparse and robust extensions; kernel spectral clustering; community detection in complex networks; data visualization using kernel maps with reference point.
Principal component analysis (1): Given a data cloud, potentially in a high-dimensional input space, assume an ellipsoidal data cloud and search for the direction(s) in the data with maximal variance.
Principal component analysis (2): Given data $\{x_i\}_{i=1}^N$ with $x_i \in \mathbb{R}^n$ (assumed zero mean), find projected variables $w^T x_i$ with maximal variance: $\max_w E\{(w^T x)^2\} = w^T E\{x x^T\} w = w^T C w$ with covariance matrix $C = E\{x x^T\}$ and $E\{\cdot\}$ the expected value. For $N$ given data points one has $C \approx \frac{1}{N}\sum_{i=1}^N x_i x_i^T$. Problem: the optimal solution for $w$ in the above problem is unbounded. Therefore an additional constraint is needed; a common choice is to impose $w^T w = 1$.
Principal component analysis (3): The problem formulation then becomes $\max_w w^T C w$ subject to $w^T w = 1$. This constrained optimization problem is solved via the Lagrangian $\mathcal{L}(w;\lambda) = \frac{1}{2} w^T C w - \frac{1}{2}\lambda (w^T w - 1)$ with Lagrange multiplier $\lambda$. Setting $\partial\mathcal{L}/\partial w = 0$ and $\partial\mathcal{L}/\partial\lambda = 0$ gives the eigenvalue problem $C w = \lambda w$ with $C = C^T$.
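The following NumPy sketch illustrates the eigenvalue problem above. The data are an assumed synthetic ellipsoidal cloud, not taken from the slides. The snippet estimates $C$ from centred samples and solves $Cw = \lambda w$. It then checks that the variance of $w^T x$ along the leading eigenvector equals $\lambda_{\max}$.

```python
import numpy as np

def pca_directions(X):
    """Eigenvalues (descending), eigenvectors (columns) and centred data.

    Rows of X are the data points x_i. The data are centred first because the
    derivation assumes zero-mean data; C = (1/N) sum_i x_i x_i^T.
    """
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / Xc.shape[0]
    lam, U = np.linalg.eigh(C)        # symmetric C: real eigenpairs, ascending order
    order = np.argsort(lam)[::-1]     # largest eigenvalue first
    return lam[order], U[:, order], Xc

# Assumed illustrative data: ellipsoidal 2-D cloud, std 3 along x1 and 0.5 along x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
lam, U, Xc = pca_directions(X)
C = Xc.T @ Xc / Xc.shape[0]
u1 = U[:, 0]
print(np.allclose(C @ u1, lam[0] * u1))      # C u = lambda u
print(np.isclose(np.var(Xc @ u1), lam[0]))   # variance along u1 equals lambda_max
```

`eigh` is used because $C$ is symmetric, so it returns real eigenvalues and orthonormal eigenvectors.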
Principal component analysis (4): [Figure: ellipsoidal data cloud in the $(x_1, x_2)$ plane with mean $\mu$, eigenvectors $u_1$, $u_2$ and semi-axes of length $\lambda_1^{1/2}$, $\lambda_2^{1/2}$.] Illustration of an eigenvalue decomposition $Cu = \lambda u$:
- the eigenvalues $\lambda_i$ are real and nonnegative, since $C$ is symmetric positive semidefinite;
- the eigenvectors $u_1$ and $u_2$ are orthogonal to each other;
- the maximal variance solution is the direction $u_1$ corresponding to $\lambda_{\max} = \lambda_1$;
- note that $\mu = 0$ (the data should be made zero-mean beforehand).
Principal component analysis: dimensionality reduction (1): Aim: decrease the dimensionality of the given input space by mapping vectors $x \in \mathbb{R}^n$ to $z \in \mathbb{R}^m$ with $m < n$. A point $x$ is mapped to $z$ in the lower-dimensional space by $z^{(j)} = u_j^T x$, $j = 1,\dots,m$. Here $u_j$ are the eigenvectors corresponding to the $m$ largest eigenvalues, and $z = [z^{(1)}\; z^{(2)} \dots z^{(m)}]^T$. The error resulting from the dimensionality reduction is characterized by the neglected eigenvalues, i.e. $\sum_{i=m+1}^{n} \lambda_i$.
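The sketch below illustrates the projection $z^{(j)} = u_j^T x$ on synthetic 6-dimensional data with two dominant directions. The data are an assumption chosen for illustration only. With $C = \frac{1}{N}\sum_i x_i x_i^T$, mapping back with the same eigenvectors gives a mean squared error that equals the sum of the neglected eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m = 1000, 6, 2
# Assumed synthetic data: 6-D cloud with two dominant directions, randomly rotated.
scales = np.array([5.0, 3.0, 0.3, 0.2, 0.2, 0.1])
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
X = (rng.normal(size=(N, n)) * scales) @ Q.T
X = X - X.mean(axis=0)                      # zero mean, as assumed in the text

C = X.T @ X / N
lam, U = np.linalg.eigh(C)
lam, U = lam[::-1], U[:, ::-1]              # descending eigenvalues

U_m = U[:, :m]                              # eigenvectors of the m largest eigenvalues
Z = X @ U_m                                 # row i holds z_i, with z^(j) = u_j^T x_i
X_back = Z @ U_m.T                          # back in R^n, inside span{u_1, ..., u_m}
mse = np.mean(np.sum((X - X_back) ** 2, axis=1))
print(Z.shape, mse, lam[m:].sum())          # the two error values coincide
```

The identity holds because $x - U_mU_m^Tx$ is the component of $x$ in the span of the neglected eigenvectors.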
Principal component analysis: dimensionality reduction (2): [Figure: eigenvalues $\lambda_i$ versus index $i$.] In this example, reducing the original 6-dimensional space to a 2-dimensional space is a good choice because the two largest eigenvalues $\lambda_1$ and $\lambda_2$ are much larger than the others.
Principal component analysis: reconstruction problem (1): Consider $x \in \mathbb{R}^n$ and $z \in \mathbb{R}^m$ with $m < n$ (dimensionality reduction). Encoder mapping: $z = G(x)$; decoder mapping: $\tilde{x} = F(z)$. Objective: squared distortion error (reconstruction error) $\min E = \frac{1}{N}\sum_{i=1}^N \|x_i - \tilde{x}_i\|_2^2 = \frac{1}{N}\sum_{i=1}^N \|x_i - F(G(x_i))\|_2^2$. Taking the mappings $F$, $G$ linear corresponds to linear PCA analysis.
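The linear case can be sketched with $G(x) = U_m^T x$ and $F(z) = U_m z$, where $U_m$ has orthonormal columns. This is an assumed, natural choice of linear mappings, not the only one. The snippet compares the distortion $E$ of the principal subspace with that of random $m$-dimensional orthonormal subspaces. The function names are illustrative.

```python
import numpy as np

def linear_codec(U_m):
    """Hypothetical helper: encoder G(x) = U_m^T x, decoder F(z) = U_m z (rows = points)."""
    G = lambda X: X @ U_m
    F = lambda Z: Z @ U_m.T
    return G, F

def distortion(X, G, F):
    """E = (1/N) sum_i ||x_i - F(G(x_i))||_2^2."""
    return np.mean(np.sum((X - F(G(X))) ** 2, axis=1))

rng = np.random.default_rng(2)
N, n, m = 400, 5, 2
X = rng.normal(size=(N, n)) * np.array([4.0, 2.0, 1.0, 0.5, 0.2])  # assumed data
X = X - X.mean(axis=0)

lam, U = np.linalg.eigh(X.T @ X / N)         # ascending eigenvalues
U_pca = U[:, ::-1][:, :m]                    # principal subspace (m largest)
E_pca = distortion(X, *linear_codec(U_pca))

# Compare with random m-dimensional orthonormal subspaces.
E_rand = []
for _ in range(20):
    Q, _ = np.linalg.qr(rng.normal(size=(n, m)))   # n x m, orthonormal columns
    E_rand.append(distortion(X, *linear_codec(Q)))
print(E_pca, min(E_rand))
```

In this comparison the principal subspace never does worse. Its distortion equals the sum of the $n - m$ smallest eigenvalues.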
Principal component analysis: reconstruction problem (2): [Figure: $x \to G(x) \to z \to F(z) \to \tilde{x}$; the low-dimensional $z$ acts as an information bottleneck.]
Principal component analysis: denoising example. Images: 16 × 15 pixels ($n = 240$). Training on $N$ clean digits (not containing digit 9). Test data: (bottom-left) denoised digit 9 after reconstruction using principal components; (bottom-right) 8 principal components.
Kernel principal component analysis: [Figure: 2-D toy data in the $(x^{(1)}, x^{(2)})$ plane; left, linear PCA; right, kernel PCA (RBF kernel).] Kernel PCA [Schölkopf et al., 1998]: eigenvalue decomposition of the kernel matrix $\begin{bmatrix} K(x_1,x_1) & \cdots & K(x_1,x_N) \\ \vdots & \ddots & \vdots \\ K(x_N,x_1) & \cdots & K(x_N,x_N) \end{bmatrix}$.
Kernel PCA: primal and dual problem. Underlying primal problem [Suykens et al., IEEE-TNN 2003]: $\min_{w,b,e} \frac{1}{2} w^T w - \gamma\frac{1}{2}\sum_{i=1}^N e_i^2$ s.t. $e_i = w^T\varphi(x_i) + b$, $i = 1,\dots,N$. The (Lagrange) dual problem is kernel PCA: $\Omega_c\alpha = \lambda\alpha$ with $\lambda = 1/\gamma$. Here $\Omega_{c,ij} = (\varphi(x_i) - \hat{\mu}_\varphi)^T(\varphi(x_j) - \hat{\mu}_\varphi)$ is the centered kernel matrix. Interpretation: 1. a pool of candidate components (the objective function equals zero); 2. select the relevant components (components with high variance).
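One way to see that the dual problem generalizes PCA is to use a linear kernel $K(x,z) = x^Tz$. The nonzero eigenvalues of $\Omega_c$, divided by $N$, then coincide with the eigenvalues of the sample covariance matrix. The sketch below assumes synthetic data. It computes $\Omega_c = M\Omega M$ with $M = I_N - \frac{1}{N}1_N1_N^T$, a standard way to centre in feature space without forming $\varphi$ explicitly.

```python
import numpy as np

def centred_kernel(K):
    """Omega_c = M K M with M = I - (1/N) 1 1^T.

    Entry (i, j) equals (phi(x_i) - mu_phi)^T (phi(x_j) - mu_phi), where mu_phi
    is the empirical feature mean."""
    N = K.shape[0]
    M = np.eye(N) - np.ones((N, N)) / N
    return M @ K @ M

rng = np.random.default_rng(3)
# Assumed data, deliberately not centred.
X = rng.normal(size=(200, 3)) * np.array([2.0, 1.0, 0.5]) + 5.0
Omega_c = centred_kernel(X @ X.T)                # linear kernel K(x, z) = x^T z
lam_dual = np.sort(np.linalg.eigvalsh(Omega_c))[::-1][:3]

Xc = X - X.mean(axis=0)
lam_pca = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / len(X)))[::-1]
print(lam_dual / len(X), lam_pca)                # identical up to rounding
```

With a nonlinear kernel, the same centring step applies unchanged. Only the matrix `K` differs.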
Kernel PCA: model representations. Primal (P) and dual (D) representations of the model:
- (P): $\hat{e}_\star = w^T\varphi(x_\star) + b$;
- (D): $\hat{e}_\star = \sum_i \alpha_i K(x_\star, x_i) + b$.

Both can be evaluated at any point $x_\star \in \mathbb{R}^d$, where $K(x_\star, x_i) = \varphi(x_\star)^T\varphi(x_i)$, with $K(\cdot,\cdot)$ a positive definite kernel and feature map $\varphi(\cdot): \mathbb{R}^d \to \mathbb{R}^{n_h}$.
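Below is a minimal sketch of the dual representation with an RBF kernel. The kernel, its bandwidth and the data are assumptions. The bias term is derived here from the primal problem: the optimality conditions $\sum_i \alpha_i = 0$ and $\alpha_i = \gamma e_i$ give $b = -\frac{1}{N}1_N^T\Omega\alpha$. On training data the model then returns $\hat{e}_i = \lambda\alpha_i$, and the same function can be evaluated at new points.

```python
import numpy as np

def rbf(A, B, sigma2):
    """Gaussian RBF kernel matrix K(a, b) = exp(-||a - b||^2 / sigma2) (assumed kernel)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def kpca_fit(X, sigma2, n_comp):
    """Solve Omega_c alpha = lambda alpha and recover the bias term b."""
    N = len(X)
    Omega = rbf(X, X, sigma2)
    M = np.eye(N) - np.ones((N, N)) / N
    lam, A = np.linalg.eigh(M @ Omega @ M)
    idx = np.argsort(lam)[::-1][:n_comp]            # components with largest variance
    lam, A = lam[idx], A[:, idx]
    # sum_i alpha_i = 0 and alpha_i = gamma e_i imply sum_i e_i = 0, hence
    # b = -(1/N) 1^T Omega alpha for each component.
    b = -(Omega @ A).mean(axis=0)
    return lam, A, b

def kpca_score(X_new, X, A, b, sigma2):
    """Dual model: e_hat(x) = sum_i alpha_i K(x, x_i) + b, for any x."""
    return rbf(X_new, X, sigma2) @ A + b

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))                       # assumed training data
lam, A, b = kpca_fit(X, sigma2=2.0, n_comp=3)
E_train = kpca_score(X, X, A, b, sigma2=2.0)
E_new = kpca_score(rng.normal(size=(5, 2)), X, A, b, sigma2=2.0)  # out-of-sample points
print(np.allclose(E_train, A * lam))                # on training data e_i = lambda alpha_i
```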
Generalizations to kernel PCA: other loss functions. Consider a general loss function $L$: $\min_{w,b,e} \frac{1}{2} w^T w - \gamma\frac{1}{2}\sum_{i=1}^N L(e_i)$ s.t. $e_i = w^T\varphi(x_i) + b$, $i = 1,\dots,N$. This yields generalizations of KPCA that lead to robustness and sparseness, e.g. with the Vapnik ε-insensitive loss or the Huber loss function [Alzate & Suykens, 2006]. Weighted least squares versions and incorporation of constraints: $\min_{w,b,e} \frac{1}{2} w^T w - \gamma\frac{1}{2}\sum_{i=1}^N v_i e_i^2$ s.t. $e_i = w^T\varphi(x_i) + b$, $i = 1,\dots,N$, and $\sum_{i=1}^N e_i e_i^{(1)} = 0, \dots, \sum_{i=1}^N e_i e_i^{(l-1)} = 0$. This finds the $l$-th PC subject to $l-1$ orthogonality constraints with respect to the previous PCs $e_i^{(j)}$. The solution is given by a generalized eigenvalue problem.
Robustness: Kernel Component Analysis. [Figure panels: original image; corrupted image; KPCA reconstruction; KCA reconstruction.] Weighted LS-SVM: robustness and sparsity [Alzate & Suykens, IEEE-TNN 2008].
Generalizations to kernel PCA: sparseness. [Figure: 2-D toy data in the $(x_1, x_2)$ plane; bottom panels PC1, PC2, PC3.] Sparse kernel PCA using the ε-insensitive loss [Alzate & Suykens, 2006]. Top figure: denoising. Bottom figures: different support vectors (in black) per principal component vector.
Spectral graph clustering. Minimal cut: given the graph $G = (V, E)$, find clusters $A_1$, $A_2$ by solving $\min_{q_i \in \{-1,+1\}} \frac{1}{2}\sum_{i,j} w_{ij}(q_i - q_j)^2$. Here $q_i$ is the cluster membership indicator ($q_i = 1$ if $i \in A_1$, $q_i = -1$ if $i \in A_2$), and $W = [w_{ij}]$ is the weighted adjacency matrix. [Figure: six-node graph showing a cut of size 1 (minimal cut) and a cut of size 2.]
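The sketch below evaluates the objective on a hypothetical six-node graph: two triangles joined by one edge. This is not necessarily the graph in the figure. Each cut edge contributes $(1-(-1))^2 = 4$. The objective also equals $q^T L q$ with $L = D - W$, the quadratic form used in the relaxation.

```python
import numpy as np

# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

def cut_objective(W, q):
    """(1/2) sum_{i,j} w_ij (q_i - q_j)^2 for an indicator vector q in {-1, +1}^N."""
    diff = q[:, None] - q[None, :]
    return 0.5 * np.sum(W * diff**2)

L = np.diag(W.sum(axis=1)) - W                        # unnormalized Laplacian D - W
q_min = np.array([1, 1, 1, -1, -1, -1], dtype=float)  # cuts one edge
q_alt = np.array([1, 1, -1, -1, -1, 1], dtype=float)  # cuts four edges
for q in (q_min, q_alt):
    # Each cut edge contributes 4, so the value is 4 x (cut weight);
    # it equals the quadratic form q^T L q.
    print(cut_objective(W, q), q @ L @ q)
```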
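Spectral graph clustering. Relaxation to the min-cut spectral clustering problem: $\min_{\tilde{q}^T\tilde{q} = 1} \tilde{q}^T L\,\tilde{q}$. Here $L = D - W$ is the unnormalized graph Laplacian, with degree matrix $D = \mathrm{diag}(d_1,\dots,d_N)$ and $d_i = \sum_j w_{ij}$. This gives the eigenvalue problem $L\tilde{q} = \lambda\tilde{q}$. The smallest eigenvalue is $0$, with a constant eigenvector that carries no cluster information; the eigenvector of the second smallest eigenvalue is used instead. Cluster membership indicators: $\hat{q}_i = \mathrm{sign}(\tilde{q}_i - \theta)$ with threshold $\theta$. Normalized cut: $L\tilde{q} = \lambda D\tilde{q}$ [Fiedler, 1973; Shi & Malik, 2000; Ng et al., 2002; Chung, 1997; von Luxburg, 2007]. From the discrete version to the continuous problem (Laplace operator): [Belkin & Niyogi, 2003; von Luxburg et al., 2008; Smale & Zhou, 2007].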
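Here is a sketch of the relaxed problem on a hypothetical graph of two triangles joined by one edge. The relaxed solution is the eigenvector of the second smallest eigenvalue of $L$ (the Fiedler vector). Thresholding it at $\theta = 0$ recovers the minimal cut. For this particular graph, the second eigenvalue works out to $(5-\sqrt{17})/2$.

```python
import numpy as np

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

lam, V = np.linalg.eigh(L)        # ascending; lam[0] = 0 with a constant eigenvector
q_tilde = V[:, 1]                 # skip the trivial solution: second eigenvector
theta = 0.0                       # threshold
q_hat = np.sign(q_tilde - theta)  # cluster membership indicators
print(lam[1], q_hat)
```

The overall sign of an eigenvector is arbitrary, so only the induced partition is meaningful, not which side gets $+1$.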
Spectral clustering + K-means
Kernel spectral clustering: case of two clusters. Underlying model (primal representation): $\hat{e}_\star = w^T\varphi(x_\star) + b$, with $\hat{q}_\star = \mathrm{sign}[\hat{e}_\star]$ the estimated cluster indicator at any $x_\star \in \mathbb{R}^d$. Primal problem, training on given data $\{x_i\}_{i=1}^N$: $\min_{w,b,e} \frac{1}{2}w^Tw - \frac{\gamma}{2}\sum_{i=1}^N v_ie_i^2$ subject to $e_i = w^T\varphi(x_i) + b$, $i = 1,\dots,N$. The weights $v_i$ are positive and will be related to the inverse degree matrix. [Alzate & Suykens, IEEE-PAMI, 2010]
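Lagrangian and conditions for optimality. Lagrangian: $\mathcal{L}(w,b,e;\alpha) = \frac{1}{2}w^Tw - \frac{\gamma}{2}\sum_{i=1}^N v_ie_i^2 - \sum_{i=1}^N \alpha_i\,(w^T\varphi(x_i) + b - e_i)$. Conditions for optimality:
- $\partial\mathcal{L}/\partial w = 0 \Rightarrow w = \sum_i \alpha_i\varphi(x_i)$;
- $\partial\mathcal{L}/\partial b = 0 \Rightarrow \sum_i \alpha_i = 0$;
- $\partial\mathcal{L}/\partial e_i = 0 \Rightarrow \alpha_i = \gamma v_i e_i$, $i = 1,\dots,N$;
- $\partial\mathcal{L}/\partial\alpha_i = 0 \Rightarrow e_i = w^T\varphi(x_i) + b$, $i = 1,\dots,N$.

Eliminate $w$, $b$, $e$ and write the solution in terms of $\alpha$.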
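Kernel-based model representation. Dual problem: $V M_V \Omega\alpha = \lambda\alpha$ with $\lambda = 1/\gamma$, where:
- $M_V = I_N - \frac{1}{1_N^T V 1_N} 1_N 1_N^T V$ is the weighted centering matrix;
- $\Omega = [\Omega_{ij}]$ is the kernel matrix with $\Omega_{ij} = \varphi(x_i)^T\varphi(x_j) = K(x_i, x_j)$;
- the bias term follows from $\sum_i \alpha_i = 0$ as $b = -\frac{1_N^T V\Omega\alpha}{1_N^T V 1_N}$.

Dual model representation: $\hat{e}_\star = \sum_{i=1}^N \alpha_i K(x_i, x_\star) + b$ with $K(x_i, x_\star) = \varphi(x_i)^T\varphi(x_\star)$.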
Choice of weights $v_i$: take $V = D^{-1}$, where $D = \mathrm{diag}\{d_1,\dots,d_N\}$ and $d_i = \sum_{j=1}^N \Omega_{ij}$. This gives the generalized eigenvalue problem $M_D\Omega\alpha = \lambda D\alpha$ with $M_D = I_N - \frac{1}{1_N^T D^{-1} 1_N} 1_N 1_N^T D^{-1}$. This is a modified version of random walks spectral clustering. Note that $\mathrm{sign}[e_i] = \mathrm{sign}[\alpha_i]$ if $\gamma v_i > 0$ (on training data), but $\mathrm{sign}[e_\star]$ also applies beyond the training data.
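The binary case can be sketched in NumPy as follows. The RBF kernel and the two well-separated Gaussian blobs are illustrative assumptions. The generalized problem is solved in the equivalent form $D^{-1}M_D\Omega\alpha = \lambda\alpha$. The bias uses $b = -\frac{1_N^TD^{-1}\Omega\alpha}{1_N^TD^{-1}1_N}$, and the same score function is applied to new points. On training data $e = M_D\Omega\alpha = \lambda D\alpha$, which is why $\mathrm{sign}[e_i] = \mathrm{sign}[\alpha_i]$.

```python
import numpy as np

def rbf(A, B, sigma2):
    """Assumed RBF kernel exp(-||a - b||^2 / sigma2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def ksc_binary(X, sigma2):
    """Binary kernel spectral clustering via M_D Omega alpha = lambda D alpha."""
    N = len(X)
    Omega = rbf(X, X, sigma2)
    d = Omega.sum(axis=1)                                  # d_i = sum_j Omega_ij
    dinv = 1.0 / d
    # M_D = I - 1 1^T D^{-1} / (1^T D^{-1} 1)
    M_D = np.eye(N) - np.outer(np.ones(N), dinv) / dinv.sum()
    # Equivalent standard problem D^{-1} M_D Omega alpha = lambda alpha.
    lam, A = np.linalg.eig(dinv[:, None] * (M_D @ Omega))
    top = np.argmax(lam.real)                              # leading eigenvector
    alpha, lam_top = A[:, top].real, lam[top].real
    b = -(dinv @ Omega @ alpha) / dinv.sum()               # from sum_i alpha_i = 0
    return alpha, b, lam_top, d

def ksc_scores(X_new, X, alpha, b, sigma2):
    """e_hat(x) = sum_i alpha_i K(x_i, x) + b; the cluster is sign(e_hat)."""
    return rbf(X_new, X, sigma2) @ alpha + b

rng = np.random.default_rng(5)
centres = np.array([[0.0, 0.0], [10.0, 0.0]])              # assumed toy data
X = np.vstack([c + 0.5 * rng.normal(size=(40, 2)) for c in centres])
alpha, b, lam_top, d = ksc_binary(X, sigma2=1.0)
e_train = ksc_scores(X, X, alpha, b, sigma2=1.0)
X_new = np.vstack([c + 0.5 * rng.normal(size=(10, 2)) for c in centres])
labels_new = np.sign(ksc_scores(X_new, X, alpha, b, sigma2=1.0))  # out-of-sample
print(np.sign(e_train[:40]).mean(), np.sign(e_train[40:]).mean(), labels_new)
```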
Kernel spectral clustering: more clusters. The case of $k$ clusters uses additional sets of constraints:

$\min_{w^{(l)}, e^{(l)}, b_l} \frac{1}{2}\sum_{l=1}^{k-1} w^{(l)T}w^{(l)} - \frac{1}{2}\sum_{l=1}^{k-1}\gamma_l\, e^{(l)T}D^{-1}e^{(l)}$

subject to $e^{(1)} = \Phi_{N\times n_h}w^{(1)} + b_1 1_N$, $e^{(2)} = \Phi_{N\times n_h}w^{(2)} + b_2 1_N$, $\dots$, $e^{(k-1)} = \Phi_{N\times n_h}w^{(k-1)} + b_{k-1}1_N$.

Here $e^{(l)} = [e_1^{(l)};\dots;e_N^{(l)}]$ and $\Phi_{N\times n_h} = [\varphi(x_1)^T;\dots;\varphi(x_N)^T] \in \mathbb{R}^{N\times n_h}$. Dual problem: $M_D\Omega\alpha^{(l)} = \lambda D\alpha^{(l)}$, $l = 1,\dots,k-1$. [Alzate & Suykens, IEEE-PAMI, 2010]
Primal and dual model representations. With $k$ clusters there are $k-1$ sets of constraints (index $l = 1,\dots,k-1$):
- (P): $\mathrm{sign}[\hat{e}_\star^{(l)}] = \mathrm{sign}[w^{(l)T}\varphi(x_\star) + b_l]$;
- (D): $\mathrm{sign}[\hat{e}_\star^{(l)}] = \mathrm{sign}[\sum_j \alpha_j^{(l)}K(x_\star, x_j) + b_l]$.

Note: additional sets of constraints also appear in multi-class and vector-valued output LS-SVMs [Suykens et al., 1999]. Advantages: out-of-sample extensions, model selection procedures, large-scale methods.
Out-of-sample extension and coding. [Figure: two panels in the $(x^{(1)}, x^{(2)})$ plane.]
Piecewise constant eigenvectors and extension (1).

Definition [Meila & Shi, 2001]. A vector $\alpha$ is called piecewise constant relative to a partition $(A_1,\dots,A_k)$ iff $\alpha_i = \alpha_j$ for all $x_i, x_j \in A_p$, $p = 1,\dots,k$.

Proposition [Alzate & Suykens, 2010]. Assume:
- (i) a training set $\mathcal{D} = \{x_i\}_{i=1}^N$ and a validation set $\mathcal{D}^v = \{x_m^v\}_{m=1}^{N_v}$, i.i.d. sampled from the same underlying distribution;
- (ii) a set of $k$ clusters $\{A_1,\dots,A_k\}$ with $k > 2$;
- (iii) an isotropic kernel function such that $K(x,z) = 0$ when $x$ and $z$ belong to different clusters;
- (iv) the eigenvectors $\alpha^{(l)}$ for $l = 1,\dots,k-1$ are piecewise constant.

Then validation set points belonging to the same cluster are collinear in the $(k-1)$-dimensional subspace spanned by the columns of $E^v \in \mathbb{R}^{N_v\times(k-1)}$, where $E^v_{ml} = e_m^{(l)} = \sum_{i=1}^N \alpha_i^{(l)}K(x_i, x_m^v) + b_l$.
Piecewise constant eigenvectors and extension (2). Key aspect of the proof: for $x_\star \in A_p$ one has $e_\star^{(l)} = \sum_{i=1}^N \alpha_i^{(l)}K(x_i, x_\star) + b_l = c_p^{(l)}\sum_{i\in A_p}K(x_i, x_\star) + \sum_{i\notin A_p}\alpha_i^{(l)}K(x_i, x_\star) + b_l = c_p^{(l)}\sum_{i\in A_p}K(x_i, x_\star) + b_l$. Model selection to determine the kernel parameters and $k$: look for line structures in the space $(e_i^{(1)}, e_i^{(2)}, \dots, e_i^{(k-1)})$, evaluated on validation data (aiming for good generalization). Choice of kernel: Gaussian RBF kernel; $\chi^2$-kernel for images.
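The collinearity property can be checked numerically. This sketch assumes three well-separated blobs, an RBF kernel (so that $K(x,z) \approx 0$ across clusters) and $k = 3$. The matrix $D^{-1}M_D$ is symmetric positive semidefinite, so the sketch obtains the $k-1$ leading solutions of $M_D\Omega\alpha = \lambda D\alpha$ from an equivalent symmetric eigenproblem. That is a numerical convenience of this sketch, not part of the method as stated. For each cluster, the validation scores minus $b$ should have rank one, i.e. the scores lie on a line through $b$.

```python
import numpy as np

def rbf(A, B, sigma2):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def ksc_fit(X, sigma2, k):
    """k-1 leading solutions of M_D Omega alpha = lambda D alpha.

    D^{-1} M_D = T T^T with T = D^{-1/2} P, P = I - v v^T, v = D^{-1/2} 1 / ||D^{-1/2} 1||.
    If u is an eigenvector of P D^{-1/2} Omega D^{-1/2} P (nonzero eigenvalue),
    then alpha = D^{-1/2} u solves the original problem with the same eigenvalue."""
    Omega = rbf(X, X, sigma2)
    d = Omega.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    v = s / np.linalg.norm(s)
    P = np.eye(len(X)) - np.outer(v, v)
    lam, U = np.linalg.eigh(P @ (s[:, None] * Omega * s[None, :]) @ P)
    U = U[:, np.argsort(lam)[::-1][: k - 1]]
    alphas = s[:, None] * U                              # alpha^(l) as columns
    dinv = 1.0 / d
    b = -(dinv @ Omega @ alphas) / dinv.sum()            # one bias per score variable
    return alphas, b

rng = np.random.default_rng(6)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])  # assumed toy clusters
X = np.vstack([c + 0.5 * rng.normal(size=(40, 2)) for c in centres])
X_val = np.vstack([c + 0.5 * rng.normal(size=(15, 2)) for c in centres])
alphas, b = ksc_fit(X, sigma2=1.0, k=3)
E_val = rbf(X_val, X, 1.0) @ alphas + b                    # E^v_{ml} = e^(l)_m

ratios = []
for p in range(3):
    sv = np.linalg.svd(E_val[15 * p: 15 * (p + 1)] - b, compute_uv=False)
    ratios.append(sv[1] / sv[0])    # ~0: scores of cluster p lie on one line
print(ratios)
```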
Model selection (looking for lines): toy problem 1. [Figure: validation-set scores $(e^{(1)}_{i,val}, e^{(2)}_{i,val})$ for two values of $\sigma^2$, each annotated with its BLF (balanced line fit) value; validation set and train + validation + test data in the $(x^{(1)}, x^{(2)})$ plane.]
Model selection (looking for lines): toy problem 2. [Figure: validation-set scores $(e^{(1)}_{i,val}, e^{(2)}_{i,val})$ for two values of $\sigma^2$, each annotated with its BLF value; validation set and train + validation + test data in the $(x^{(1)}, x^{(2)}, x^{(3)})$ space.]
Example: image segmentation (looking for lines). [Figure: validation scores in the $(e^{(1)}_{i,val}, e^{(2)}_{i,val}, e^{(3)}_{i,val})$ space.]
[Table: image segmentation results for ten images; columns: Image ID, Image, Proposed method, Nyström method, Human.]
Example: power grid - identifying customer profiles (1). Power load: 245 substations, hourly data (5 years), $d = 43.824$. Periodic AR modelling: dimensionality reduction $43.824 \to 24$. k-means clustering applied after dimensionality reduction. [Figure: eight panels of cluster profiles, normalized load versus hour of the day.]
Example: power grid - identifying customer profiles (2). Application of kernel spectral clustering directly on $d = 43.824$. Model selection on the kernel parameter and the number of clusters [Alzate, Espinoza, De Moor, Suykens, 2009]. [Figure: seven panels of cluster profiles, normalized load versus hour of the day.]
Example: power grid - identifying customer profiles (3). [Figure: three panels, normalized load versus hour of the day.] Electricity load: 245 substations in the Belgian grid (1/2 train, 1/2 validation). $x_i \in \mathbb{R}^{43.824}$: spectral clustering on high-dimensional data (5 years). 3 of 7 detected clusters:
- 1: residential profile: morning and evening peaks;
- 2: business profile: peaked around noon;
- 3: industrial profile: increasing in the morning, oscillating in the afternoon and evening.
Kernel spectral clustering: sparse kernel models. [Figure panels: original image; binary clustering; sparse kernel model.] Incomplete Cholesky decomposition: $\|\Omega - GG^T\|_2 \le \eta$ with $G \in \mathbb{R}^{N\times R}$ and $R \ll N$. Image (Berkeley image dataset): 321 × 481 (154,401 pixels), 75 SV. Sparse model: $\hat{e}_\star^{(l)} = \sum_{i\in S_{SV}}\alpha_i^{(l)}K(x_i, x_\star) + b_l$.
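A common way to obtain such a factor $G$ is pivoted (greedy) incomplete Cholesky. The sketch below assumes an RBF kernel and stops when the trace of the residual drops below $\eta$. Because the residual $\Omega - GG^T$ is positive semidefinite, this trace bounds $\|\Omega - GG^T\|_2$. The function name and parameters are illustrative, not an existing library API.

```python
import numpy as np

def incomplete_cholesky_rbf(X, sigma2, eta):
    """Hypothetical helper: pivoted incomplete Cholesky of an RBF kernel matrix.

    Returns G (N x R) with Omega ~ G G^T and the chosen pivots. Stops once
    trace(Omega - G G^T) <= eta, which implies ||Omega - G G^T||_2 <= eta.
    Only the pivot columns of Omega are ever computed."""
    N = len(X)
    resid = np.ones(N)                 # residual diagonal; K(x, x) = 1 for RBF
    G = np.zeros((N, 0))
    pivots = []
    while resid.sum() > eta and G.shape[1] < N:
        j = int(np.argmax(resid))      # greedy pivot: largest residual diagonal
        pivots.append(j)
        col = np.exp(-np.sum((X - X[j]) ** 2, axis=1) / sigma2)   # column j of Omega
        g = (col - G @ G[j]) / np.sqrt(resid[j])
        G = np.column_stack([G, g])
        resid = np.maximum(resid - g**2, 0.0)
    return G, pivots

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2))                       # assumed data
G, pivots = incomplete_cholesky_rbf(X, sigma2=2.0, eta=1e-2)
Omega = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1) / 2.0)
print(G.shape, np.linalg.norm(Omega - G @ G.T, 2))
```

The pivot set is one natural candidate for a reduced set of points, but that link to the slide's support vectors is an assumption.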
Highly sparse kernel models on images. Application on images:
- $x_i \in \mathbb{R}^3$ (r,g,b values per pixel), $i = 1,\dots,N$, pre-processed into $z_i \in \mathbb{R}^8$ (quantization to 8 colors);
- a $\chi^2$-kernel compares two local color histograms (5 × 5 pixel window);
- for large $N$, select a subset of size $M \ll N$ based on quadratic Rényi entropy, as in the fixed-size method [Suykens et al., 2002];
- highly sparse representations: #SV = $3k$;
- completion of the cluster indicators based on out-of-sample extensions $\mathrm{sign}[\hat{e}_\star^{(l)}] = \mathrm{sign}[\sum_{j\in S_{SV}}\alpha_j^{(l)}K(x_\star, x_j) + b_l]$, applied to the full image [Alzate & Suykens, Neurocomputing 2011].
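Below is a sketch of entropy-based subset selection in the spirit of the fixed-size method. It starts from a random working set of $M$ points. Random swaps with points from the full data are accepted only if a Parzen-type estimate of the quadratic Rényi entropy increases. The Gaussian kernel, the constants dropped from the estimate and the number of iterations are assumptions of this sketch.

```python
import numpy as np

def renyi_quadratic_entropy(Xs, sigma2):
    """H_R ~ -log((1/M^2) sum_{i,j} K(x_i, x_j)), Gaussian kernel, constants dropped."""
    sq = np.sum((Xs[:, None, :] - Xs[None, :, :]) ** 2, axis=-1)
    return -np.log(np.mean(np.exp(-sq / sigma2)))

def fixed_size_subset(X, M, sigma2, n_iter=2000, seed=0):
    """Hypothetical helper: random-swap search for M points with high entropy."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=M, replace=False)
    H0 = H = renyi_quadratic_entropy(X[idx], sigma2)
    for _ in range(n_iter):
        pos = rng.integers(M)                # position in the working set
        cand = rng.integers(len(X))          # candidate point from the full data
        if cand in idx:
            continue
        trial = idx.copy()
        trial[pos] = cand
        H_trial = renyi_quadratic_entropy(X[trial], sigma2)
        if H_trial > H:                      # accept only entropy-increasing swaps
            idx, H = trial, H_trial
    return idx, H, H0

rng = np.random.default_rng(8)
X = rng.normal(size=(2000, 2))               # assumed data
idx, H, H0 = fixed_size_subset(X, M=30, sigma2=1.0)
print(len(idx), H0, H)
```

Maximizing this entropy favours points that are spread out over the data, so the subset covers the input distribution.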
Highly sparse kernel models: toy example 1. [Figure: data in the $(x^{(1)}, x^{(2)})$ plane and scores in the $(e^{(1)}_i, e^{(2)}_i)$ plane.] Only $3k = 9$ support vectors.
Highly sparse kernel models: toy example 2. [Figure: data in the $(x^{(1)}, x^{(2)})$ plane and scores in the $(\hat{e}^{(1)}_i, \hat{e}^{(2)}_i, \hat{e}^{(3)}_i)$ space.] Only $3k = 12$ support vectors.
Highly sparse kernel models: image segmentation. [Figure: scores in the $(e^{(1)}_i, e^{(2)}_i, e^{(3)}_i)$ space.] Only $3k = 12$ support vectors.
Hierarchical kernel spectral clustering:
- looking at different scales;
- use of model selection and validation data [Alzate & Suykens, Neural Networks, 2012].
Kernel spectral clustering: adding prior knowledge. Pair of points $x_a$, $x_b$: $c = 1$ for must-link, $c = -1$ for cannot-link. Primal problem [Alzate & Suykens, IJCNN 2009]:

$\min_{w^{(l)}, e^{(l)}, b_l} \frac{1}{2}\sum_{l=1}^{k-1} w^{(l)T}w^{(l)} - \frac{1}{2}\sum_{l=1}^{k-1}\gamma_l\, e^{(l)T}D^{-1}e^{(l)}$

subject to $e^{(1)} = \Phi_{N\times n_h}w^{(1)} + b_1 1_N$, $\dots$, $e^{(k-1)} = \Phi_{N\times n_h}w^{(k-1)} + b_{k-1}1_N$, and $w^{(1)T}\varphi(x_a) = c\,w^{(1)T}\varphi(x_b)$, $\dots$, $w^{(k-1)T}\varphi(x_a) = c\,w^{(k-1)T}\varphi(x_b)$.

The dual problem yields a rank-one downdate of the kernel matrix.
Adding prior knowledge. [Figure panels: original image; result without constraints; result with constraints.]
Semi-supervised learning. There are $N$ unlabeled data points, plus labels on $M - N$ additional data points: $X = \{x_1,\dots,x_N, x_{N+1},\dots,x_M\}$. Binary classification uses a binary spectral clustering core model [Alzate & Suykens, WCCI 2012]: $\min_{w,e,b} \frac{1}{2}w^Tw - \frac{\gamma}{2}e^TD^{-1}e + \frac{\rho}{2}\sum_{m=N+1}^M (e_m - y_m)^2$ subject to $e_i = w^T\varphi(x_i) + b$, $i = 1,\dots,M$. The dual solution is characterized by a linear system. For other approaches in semi-supervised learning, see e.g. [Belkin et al., 2006].
Modularity and community detection. Modularity for the two-group case [Newman, 2006]: $Q = \frac{1}{4m}\sum_{i,j}\left(A_{ij} - \frac{d_id_j}{2m}\right)q_iq_j$. Here $A$ is the adjacency matrix, $d_i$ the degree of node $i$, $m = \frac{1}{2}\sum_i d_i$, $q_i = 1$ if node $i$ belongs to group 1 and $q_i = -1$ for group 2. Use of modularity within kernel spectral clustering [Langone et al., 2012]:
- use at the level of model validation;
- finding a representative subgraph using a fixed-size method, by maximizing the expansion factor $|N(G)|/|G|$ [Maiya, 2010] with a subgraph $G$ and its neighborhood $N(G)$;
- data definition in unweighted networks: $x_i = A(:, i)$;
- use of a community kernel function [Kang, 2009].
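The two-group modularity and the expansion factor can be computed directly from the adjacency matrix. The sketch uses a hypothetical graph of two triangles joined by one edge. For that graph, the split into the two triangles gives $Q = 5/14$. Here $N(G)$ is taken to be the set of nodes outside $G$ that are adjacent to some node of $G$, which is an assumption about the precise definition.

```python
import numpy as np

def modularity_two_groups(A, q):
    """Q = 1/(4m) sum_{i,j} (A_ij - d_i d_j / (2m)) q_i q_j with q_i in {+1, -1}."""
    d = A.sum(axis=1)
    two_m = d.sum()                          # 2m = sum_i d_i
    B = A - np.outer(d, d) / two_m           # modularity matrix
    return (q @ B @ q) / (2.0 * two_m)       # 1/(4m) = 1/(2 * 2m)

def expansion_factor(A, S):
    """|N(G)| / |G| for node set S, with N(G) the nodes outside S adjacent to S."""
    S = set(int(i) for i in S)
    nbrs = {int(j) for i in S for j in np.flatnonzero(A[i]) if int(j) not in S}
    return len(nbrs) / len(S)

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
q = np.array([1, 1, 1, -1, -1, -1], dtype=float)
print(modularity_two_groups(A, q))           # 5/14 for this split
print(expansion_factor(A, [0, 1, 2]))        # only node 3 borders {0,1,2}: 1/3
```

Putting all nodes in one group ($q = 1_N$) gives $Q = 0$, which is the usual reference value.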
Protein interaction network (1). [Figure: Pajek drawing of the yeast interaction network [Barabasi et al.].]
Protein interaction network (2):
- yeast interaction network [Barabasi et al.];
- KSC community detection using a representative subgraph [Langone et al., 2012]: 7 detected clusters.

[Figure: modularity versus number of clusters.]
Power grid network (1). [Figure: Pajek drawing.] Western USA power grid: 4941 nodes, 6594 edges [Watts & Strogatz, 1998].
Power grid network (2):
- Western USA power grid: 4941 nodes, 6594 edges [Watts & Strogatz, 1998];
- KSC community detection using a representative subgraph [Langone et al., 2012]: 6 detected clusters.

[Figure: modularity versus number of clusters.]
Evolving networks. Binary clustering case with an added memory effect [Langone et al., 2012]: $\min_{w,e,b} \frac{1}{2}w^Tw - \frac{\gamma}{2}e^TD^{-1}e - \nu\, w^Tw_{old}$ subject to $e_i = w^T\varphi(x_i) + b$, $i = 1,\dots,N$. Here $w_{old}$ is the previous result in time. This aims at including temporal smoothness. Smoothed modularity criterion.
Dimensionality reduction and data visualization.
- Traditionally: commonly used techniques are e.g. principal component analysis (PCA), multi-dimensional scaling (MDS) and self-organizing maps (SOM).
- More recently: isomap, locally linear embedding (LLE), Hessian locally linear embedding, diffusion maps, Laplacian eigenmaps ("kernel eigenmap methods and manifold learning") [Roweis & Saul, 2000; Coifman et al., 2005; Belkin et al., 2006].
- Kernel maps with reference point [Suykens, IEEE-TNN 2008]: data visualization and dimensionality reduction by solving a linear system.
Kernel maps with reference point: formulation [Suykens, IEEE-TNN 2008]:
- LS-SVM core part: realize dimensionality reduction $x \mapsto z$;
- regularization term: $(z - P_Dz)^T(z - P_Dz) = \sum_{i=1}^N \|z_i - \sum_{j=1}^N s_{ij}Dz_j\|_2^2$, with $D$ a diagonal matrix and $s_{ij} = \exp(-\|x_i - x_j\|_2^2/\sigma^2)$;
- reference point $q$ (e.g. the first point; sacrificed in the visualization).

Example with $d = 2$:

$\min_{z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}} \frac{1}{2}(z - P_Dz)^T(z - P_Dz) + \frac{\nu}{2}(w_1^Tw_1 + w_2^Tw_2) + \frac{\eta}{2}\sum_{i=1}^N (e_{i,1}^2 + e_{i,2}^2)$

such that $c_{1,1}^Tz = q_1 + e_{1,1}$, $c_{1,2}^Tz = q_2 + e_{1,2}$, $c_{i,1}^Tz = w_1^T\varphi_1(x_i) + b_1 + e_{i,1}$ and $c_{i,2}^Tz = w_2^T\varphi_2(x_i) + b_2 + e_{i,2}$ for $i = 2,\dots,N$.

Coordinates in the low-dimensional space: $z = [z_1; z_2; \dots; z_N] \in \mathbb{R}^{dN}$.
Model selection by validation. Model selection criterion: $\min_\Theta \sum_{i,j}\left(\frac{\hat{z}_i^T\hat{z}_j}{\|\hat{z}_i\|_2\|\hat{z}_j\|_2} - \frac{x_i^Tx_j}{\|x_i\|_2\|x_j\|_2}\right)^2$. Tuning parameters $\Theta$:
- kernel tuning parameters in $s_{ij}$, $K_1$, $K_2$, ($K_3$);
- regularization constants $\nu$, $\eta$;
- choice of the diagonal matrix $D$;
- choice of the reference point $q$, e.g. $q \in \{[+1;+1], [+1;-1], [-1;+1], [-1;-1]\}$.

The results are stable; finding a good range for the parameters is sufficient.
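The criterion compares the cosines of angles between pairs of points in the input space and in the visualization. It is therefore invariant to rotations and to a positive rescaling of the embedding. Below is a minimal sketch; the function names are illustrative, and in practice $\hat{z}$ would be the kernel-map output on validation data.

```python
import numpy as np

def cosine_matrix(Y):
    """Matrix of cosines y_i^T y_j / (||y_i||_2 ||y_j||_2) between the rows of Y."""
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Yn @ Yn.T

def kmref_criterion(Z_hat, X):
    """Hypothetical helper: sum_{i,j} (cos(z_i, z_j) - cos(x_i, x_j))^2, minimized over Theta."""
    return np.sum((cosine_matrix(Z_hat) - cosine_matrix(X)) ** 2)

rng = np.random.default_rng(9)
X = rng.normal(size=(50, 3))                        # assumed data
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))        # an orthogonal matrix
print(kmref_criterion(X @ R, X))                    # ~0: rotations preserve all angles
print(kmref_criterion(X[:, :2], X))                 # > 0: dropping a coordinate distorts angles
```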
Kernel maps: spiral example. [Figure: 3-D spiral data and 2-D kernel maps $(\hat{z}_1, \hat{z}_2)$ for reference points $q = [+1; -1]$ and $q = [-1; -1]$; training data (blue), validation data (magenta o), test data (red +).] Model selection: $\min \sum_{i,j}\left(\frac{\hat{z}_i^T\hat{z}_j}{\|\hat{z}_i\|_2\|\hat{z}_j\|_2} - \frac{x_i^Tx_j}{\|x_i\|_2\|x_j\|_2}\right)^2$.
Kernel maps: swiss roll example. [Figure: given 3-D swiss roll data and the 2-D kernel map result.] Matlab demo: http://www.esat.kuleuven.be/sista/lssvmlab/kmref/demoswisskmref.m
Kernel maps: visualizing gene distribution. [Figure: 3-D projections $(z_1, z_2, z_3)$.] Alon colon cancer microarray data set: 3-D projections. Dimension of the input space: 62. Number of genes: 1500 (training: 500, validation: 500, test: 500).
Kernel maps: time-series data visualization. Santa Fe laser data. [Figure: laser time series versus discrete time $k$, and the 3-D kernel map $(\hat{z}_1, \hat{z}_2, \hat{z}_3)$.] Data $\{y_{k|k-9}\}_{k=1}^N$: train, validation, test. The tuning parameters (kernel and regularization) are based on the validation set. The model is able to make out-of-sample extensions.
Conclusions:
- from PCA to KPCA;
- LS-SVM model framework with primal-dual setting;
- out-of-sample extensions, model selection procedures and large-scale methods;
- from spectral clustering to kernel spectral clustering;
- applications in complex networks;
- data visualization problems: learning and generalization;
- a reference point converts the eigenvalue problem into a linear system.
Acknowledgements (1): Colleagues at ESAT-SCD (especially research units: systems, models, control - biomedical data processing - bioinformatics): C. Alzate, A. Argyriou, J. De Brabanter, K. De Brabanter, L. De Lathauwer, B. De Moor, M. Diehl, Ph. Dreesen, M. Espinoza, T. Falck, D. Geebelen, X. Huang, B. Hunyadi, A. Installe, V. Jumutc, P. Karsmakers, R. Langone, J. Lopez, J. Luts, R. Mall, S. Mehrkanoon, M. Moonen, Y. Moreau, K. Pelckmans, J. Puertas, L. Shi, M. Signoretto, P. Tsiaflakis, V. Van Belle, R. Van de Plas, S. Van Huffel, J. Vandewalle, T. van Waterschoot, C. Varon, S. Yu, and others. Topics of this lecture: Carlos Alzate and Rocco Langone. Support from ERC AdG A-DATADRIVE-B, KU Leuven, GOA-MaNet, COE Optimization in Engineering OPTEC, IUAP DYSCO, FWO projects, IWT, IBBT ehealth, COST.
Acknowledgements (2)
Thank you