Kernel methods for complex networks and big data
1 Kernel methods for complex networks and big data. Johan Suykens, KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium. STATLEARN 2015, Grenoble.
2 Introduction and motivation
3 Data world (figure)
4 Challenges: data-driven general methodology; scalability; need for new mathematical frameworks.
5 Sparsity: big data requires sparsity.
6 Different paradigms: SVMs & kernel methods; convex optimization; sparsity & compressed sensing.
8 Sparsity through regularization or loss function
9 Sparsity: through regularization or through the loss function.
Through regularization: model ŷ = w^T x + b, solve min_w sum_j |w_j| + γ sum_i e_i^2, giving sparse w.
Through the loss function: model ŷ = sum_i α_i K(x, x_i) + b, solve min_w w^T w + γ sum_i L(e_i) with an ε-insensitive loss (figure: loss zero on [-ε, +ε]), giving sparse α.
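The first mechanism can be sketched numerically (this toy example is not from the slides; the data, λ value and plain coordinate-descent solver are illustrative assumptions): an l1 penalty on the primal weights drives the coefficients of irrelevant inputs to exactly zero.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate descent for min_w 0.5*||y - X w||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]          # residual without feature j
            rho = X[:, j] @ r
            # soft-thresholding: the l1 term zeroes small coefficients
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]                        # only 3 informative inputs
y = X @ w_true + 0.01 * rng.standard_normal(100)
w = lasso_cd(X, y, lam=5.0)
n_nonzero = int(np.sum(np.abs(w) > 1e-6))
```

The weights of the seven uninformative inputs are driven to zero, while the three informative ones survive (slightly shrunk).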
10 Function estimation in RKHS. Find the function f solving min_{f in H} (1/N) sum_{i=1}^N L(y_i, f(x_i)) + λ ||f||_K^2, with L(·,·) the loss function and ||f||_K the norm in the RKHS H defined by the kernel K. Representer theorem: for a convex loss function the solution takes the form f(x) = sum_{i=1}^N α_i K(x, x_i). Reproducing property: f(x) = <f, K_x>_K with K_x(·) = K(x, ·). [Wahba, 1990; Poggio & Girosi, 1990; Evgeniou et al., 2000]
11 Learning with primal and dual model representations
12 Learning models from data: alternative views. Consider a model ŷ = f(x; w), given input/output data {(x_i, y_i)}_{i=1}^N: min_w w^T w + γ sum_{i=1}^N (y_i - f(x_i; w))^2. Rewrite the problem as min_{w,e} w^T w + γ sum_{i=1}^N e_i^2 subject to e_i = y_i - f(x_i; w), i = 1,...,N. Construct the Lagrangian, take the conditions for optimality, and express the solution and the model in terms of the Lagrange multipliers α_i. For a model f(x; w) = sum_{j=1}^h w_j ϕ_j(x) = w^T ϕ(x) one then obtains f̂(x) = sum_{i=1}^N α_i K(x, x_i) with K(x, x_i) = ϕ(x)^T ϕ(x_i).
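For the least-squares loss, the Lagrangian conditions above reduce to one linear KKT system in (b, α) rather than a quadratic program. A minimal sketch (toy sinc data; the RBF bandwidth and γ are illustrative choices, not values from the talk):

```python
import numpy as np

def rbf(A, B, s2=0.5):
    """RBF kernel matrix K(a_i, b_j) = exp(-||a_i - b_j||^2 / (2*s2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s2))

def lssvm_fit(X, y, gamma=100.0, s2=0.5):
    # KKT system of LS-SVM regression:
    # [0      1^T          ] [b    ]   [0]
    # [1  Omega + I/gamma  ] [alpha] = [y]
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, s2) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                      # bias b, dual weights alpha

def lssvm_predict(Xnew, X, b, alpha, s2=0.5):
    # dual model representation: yhat(x) = sum_i alpha_i K(x, x_i) + b
    return rbf(Xnew, X, s2) @ alpha + b

X = np.linspace(-3, 3, 50)[:, None]
y = np.sinc(X).ravel()
b, alpha = lssvm_fit(X, y)
yhat = lssvm_predict(X, X, b, alpha)
```

The dual weights α_i play the role of the Lagrange multipliers; the residuals satisfy e_i = α_i / γ.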
14 Least squares support vector machines: core models.
Regression: min_{w,b,e} w^T w + γ sum_i e_i^2 s.t. y_i = w^T ϕ(x_i) + b + e_i, for all i.
Classification: min_{w,b,e} w^T w + γ sum_i e_i^2 s.t. y_i (w^T ϕ(x_i) + b) = 1 - e_i, for all i.
Kernel PCA (V = I) and kernel spectral clustering (V = D^{-1}): min_{w,b,e} (1/2) w^T w - γ (1/2) sum_i v_i e_i^2 s.t. e_i = w^T ϕ(x_i) + b, for all i.
Kernel canonical correlation analysis / partial least squares: min_{w,v,b,d,e,r} w^T w + v^T v + ν sum_i (e_i - r_i)^2 s.t. e_i = w^T ϕ^(1)(x_i) + b and r_i = v^T ϕ^(2)(y_i) + d.
[Suykens & Vandewalle, 1999; Suykens et al., 2002; Alzate & Suykens, 2010]
15 Probability and quantum mechanics.
Kernel pmf estimation. Primal: min_{w,p} (1/2) <w, w> subject to p_i = <w, ϕ(x_i)>, i = 1,...,N, and sum_{i=1}^N p_i = 1. Dual: p_i = sum_{j=1}^N K(x_j, x_i) / sum_{i=1}^N sum_{j=1}^N K(x_j, x_i).
Quantum measurement: state vector ψ, measurement operators M_i. Primal: min_{w,p} (1/2) <w|w> subject to p_i = Re(<w|M_i|ψ>), i = 1,...,N, and sum_{i=1}^N p_i = 1. Dual: p_i = <ψ|M_i|ψ> (Born rule, orthogonal projective measurement).
[Suykens, Physical Review A, 2013]
16 SVMs: living in two worlds (figure: input space versus feature space ϕ(x)).
Primal space (parametric): ŷ = sign[w^T ϕ(x) + b], with weights w_1,...,w_{n_h} on features ϕ_1(x),...,ϕ_{n_h}(x).
Dual space (nonparametric): ŷ = sign[sum_{i=1}^{#sv} α_i y_i K(x, x_i) + b], with K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j) (Mercer, the "kernel trick").
18 Linear model: solving in primal or dual? Inputs x in R^d, output y in R, training set {(x_i, y_i)}_{i=1}^N.
(P): ŷ = w^T x + b, with w in R^d.
(D): ŷ = sum_i α_i x_i^T x + b, with α in R^N.
20 Linear model: solving in primal or dual? Few inputs, many data points (d << N): solve in the primal, w in R^d; the dual would give a large N x N kernel matrix, α in R^N.
21 Linear model: solving in primal or dual? Many inputs, few data points (d >> N): solve in the dual, α in R^N, with a small N x N kernel matrix.
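The two regimes can be checked on a ridge regression toy problem (illustrative data; no bias term, so the primal and dual solutions coincide exactly):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, lam = 200, 5, 1.0
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d)

# primal: solve the d x d system (X^T X + lam I) w = X^T y
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# dual: solve the N x N system (X X^T + lam I) alpha = y, then w = X^T alpha
alpha = np.linalg.solve(X @ X.T + lam * np.eye(N), y)
w_dual = X.T @ alpha
```

Both routes give the same model; one works with a d x d matrix (cheap when d << N), the other with the N x N Gram matrix (cheap when N << d).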
22 From linear to nonlinear model: feature map and kernel.
(P): ŷ = w^T ϕ(x) + b. (D): ŷ = sum_i α_i K(x_i, x) + b.
Mercer theorem: K(x, z) = ϕ(x)^T ϕ(z). Feature map ϕ(x) = [ϕ_1(x); ϕ_2(x); ...; ϕ_h(x)]. Kernel function K(x, z) (e.g. linear, polynomial, RBF, ...).
Use of feature map and positive definite kernel [Cortes & Vapnik, 1995]. Optimization in infinite dimensional spaces for LS-SVM [Signoretto, De Lathauwer, Suykens, 2011].
23 Sparsity by the fixed-size kernel method
24 Fixed-size method: steps. 1. Selection of a subset from the data. 2. Kernel matrix on the subset. 3. Eigenvalue decomposition of the kernel matrix. 4. Approximation of the feature map based on the eigenvectors (Nyström approximation). 5. Estimation of the model in the primal using the approximate feature map (applicable to large data sets). [Suykens et al., 2002] (LS-SVM book)
25 Selection of subset: random, or by quadratic Renyi entropy.
26 Subset selection using quadratic Renyi entropy: example (figure)
27 Fixed-size method: using quadratic Renyi entropy. Algorithm:
1. Given N training data.
2. Choose a working set with fixed size M (i.e. M support vectors), typically M << N.
3. Randomly select a support vector x* from the working set of M support vectors.
4. Randomly select a point x_t from the training data and replace x* by x_t. If the entropy increases by taking the point x_t instead of x*, then x_t is accepted into the working set of M SVs; otherwise x_t is rejected (and returned to the training data pool) and x* stays in the working set.
5. Calculate the entropy value for the present working set. The quadratic Renyi entropy equals H_R = -log (1/M^2) sum_{ij} Ω^(M,M)_{ij}.
[Suykens et al., 2002]
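The swap procedure above can be sketched as follows (the toy data, trial budget and RBF bandwidth are illustrative assumptions). Since H_R = -log((1/M^2) sum_ij Ω^(M,M)_ij), maximizing the entropy spreads the working set over the input space:

```python
import numpy as np

def rbf(A, B, s2=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s2))

def renyi_entropy(S, s2=1.0):
    # quadratic Renyi entropy estimate: -log( (1/M^2) * sum_ij Omega_ij )
    return -np.log(rbf(S, S, s2).mean())

def fixed_size_select(X, M, n_trials=500, seed=0):
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(X), M, replace=False))    # initial working set
    H = renyi_entropy(X[idx])
    for _ in range(n_trials):
        i = int(rng.integers(M))                        # slot to replace
        j = int(rng.integers(len(X)))                   # candidate point
        if j in idx:
            continue
        trial = idx.copy()
        trial[i] = j
        H_trial = renyi_entropy(X[trial])
        if H_trial > H:                                 # accept only if entropy increases
            idx, H = trial, H_trial
    return np.array(idx), H

X = np.random.default_rng(0).standard_normal((300, 2))
idx, H = fixed_size_select(X, M=20)
```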
28 Nyström method. Big kernel matrix Ω^(N,N) in R^{N x N}; small kernel matrix Ω^(M,M) in R^{M x M} (on the subset). Eigenvalue decompositions: Ω^(N,N) Ũ = Ũ Λ̃ and Ω^(M,M) U = U Λ. Relation to the eigenvalues and eigenfunctions of the integral equation ∫ K(x, x') φ_i(x) p(x) dx = λ_i φ_i(x'), with λ̂_i = (1/M) λ_i, φ̂_i(x_k) = sqrt(M) u_{ki}, φ̂_i(x') = (sqrt(M)/λ_i) sum_{k=1}^M u_{ki} K(x_k, x'). [Williams & Seeger, 2001] (Nyström method in GP)
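The quality of the approximation can be checked directly through the equivalent identity Ω^(N,N) ≈ K_NM (Ω^(M,M))^{-1} K_NM^T, where K_NM holds kernel evaluations between all N points and the M subset points (the toy data and subset size below are illustrative assumptions):

```python
import numpy as np

def rbf(A, B, s2=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s2))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
sub = X[rng.choice(200, 40, replace=False)]       # subset of size M = 40

K_NM = rbf(X, sub)
K_MM = rbf(sub, sub)
# Nystrom approximation of the full N x N kernel matrix from the M x M one
# (small jitter added for numerical stability)
K_approx = K_NM @ np.linalg.solve(K_MM + 1e-8 * np.eye(40), K_NM.T)

K_full = rbf(X, X)
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
```

Because the RBF kernel spectrum decays quickly, a subset of 40 out of 200 points already gives a small relative error.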
29 Fixed-size method: estimation in primal. For the feature map ϕ(·): R^d -> R^h, obtain an approximation ϕ̂(·): R^d -> R^M based on the eigenvalue decomposition of the kernel matrix, with components ϕ̂_i(x') = sqrt(λ̂_i) φ̂_i(x') (on a subset of size M << N). Estimate in the primal: min_{w,b} (1/2) w^T w + γ (1/2) sum_{i=1}^N (y_i - w^T ϕ̂(x_i) - b)^2. A sparse representation is obtained: w in R^M with M << N and M << h. [Suykens et al., 2002; De Brabanter et al., CSDA 2010]
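Putting the pieces together, a minimal fixed-size pipeline (toy 1-D data; the subset size, bandwidth and γ are illustrative assumptions): build the Nyström feature map ϕ̂ from the small eigendecomposition, then solve the ridge problem in the primal, so only an M-dimensional system is needed.

```python
import numpy as np

def rbf(A, B, s2=0.25):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s2))

def nystrom_features(X, Xsub, s2=0.25):
    # eigendecomposition of the small M x M kernel matrix on the subset;
    # Nystrom feature map: phi_hat(x) = Lambda^{-1/2} U^T k(x, Xsub)
    lam, U = np.linalg.eigh(rbf(Xsub, Xsub, s2))
    lam = np.clip(lam, 1e-6, None)              # guard against tiny eigenvalues
    return rbf(X, Xsub, s2) @ U / np.sqrt(lam)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, (400, 1)), axis=0)
y = np.sinc(X).ravel() + 0.05 * rng.standard_normal(400)

M = 30
Xsub = X[rng.choice(400, M, replace=False)]     # working set, M << N
Phi = np.hstack([nystrom_features(X, Xsub), np.ones((400, 1))])  # + bias column

# primal ridge problem: min_w w^T w + gamma * sum_i (y_i - w^T phi_hat(x_i))^2
gamma = 100.0
w = np.linalg.solve(Phi.T @ Phi + np.eye(M + 1) / gamma, Phi.T @ y)
yhat = Phi @ w
mse = float(np.mean((yhat - y) ** 2))
```

The resulting model has only M + 1 parameters, yet the fit error approaches the noise floor of the toy data.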
30 Fixed-size method: performance in classification, on the datasets pid, spa, mgt, adu, ftc (the slide also lists N, N_cv, N_test, d and the number of SVs per method). Test accuracies in % (standard deviation in parentheses):
RBF FS-LSSVM: 76.7(3.43), 92.5(0.67), 86.6(0.51), 85.21(0.21), 81.8(0.52)
Lin FS-LSSVM: 77.6(0.78), 90.9(0.75), 77.8(0.23), 83.9(0.17), 75.61(0.35)
RBF C-SVM: 75.1(3.31), 92.6(0.76), 85.6(1.46), 84.81(0.20), 81.5(no cv)
Lin C-SVM: 76.1(1.76), 91.9(0.82), 77.3(0.53), 83.5(0.28), 75.24(no cv)
RBF ν-SVM: 75.8(3.34), 88.7(0.73), 84.2(1.42), 83.9(0.23), 81.6(no cv)
Maj. rule: 64.8(1.46), 60.6(0.58), 65.8(0.28), 83.4(0.1), 51.23(0.20)
Fixed-size (FS) LS-SVM: good performance and sparsity w.r.t. C-SVM and ν-SVM. It remains challenging to achieve high performance with very sparse models. [De Brabanter et al., CSDA 2010]
31 Two stages of sparsity.
Stage 1 (primal): subset selection, Nyström approximation, FS model estimation.
Stage 2 (dual): reweighted l1.
Synergy between parametric and kernel-based models [Mall & Suykens, IEEE-TNNLS, in press]; reweighted l1 [Candes et al., 2008]. Other possible approaches with improved sparsity: SCAD [Fan & Li, 2001]; coefficient-based l_q (0 < q <= 1) [Shi et al., 2013]; two-level l1 [Huang et al., 2014].
35 Big data: representative subsets using kNN graphs (1).
Convert the large-scale dataset into a sparse undirected kNN graph using a distributed network generation framework in the Julia language (http://julialang.org/).
Large N x N kernel matrix Ω for data set D with N data points.
Batch cluster-based approach: a batch subset D_p of D is loaded per node, with the union over p = 1,...,P of D_p equal to D; related matrix slice X_p and Ω_p.
MapReduce and AllReduce settings implementable using Hadoop or Spark (see also [Agarwal et al., JMLR 2014], Terascale linear learning).
Computational complexity: construction of the kernel matrix reduced from O(N^2) to O(N^2 (1 + log N)/P) for P nodes.
[Mall, Jumutc, Langone, Suykens, IEEE BigData 2014]
36 Big data: representative subsets using kNN graphs (2).
Map: for Silverman's rule of thumb, compute the mean and standard deviation of the data per node; compute slice Ω_p; sort the columns of Ω_p in ascending order (sortperm in Julia); pick the indices of the top k values.
Reduce: merge the kNN subgraphs into the aggregated kNN graph.
[Mall, Jumutc, Langone, Suykens, IEEE BigData 2014]
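In spirit, the per-batch map step and the merge can be sketched in a single process (plain NumPy rather than Julia or Hadoop; the batch count and k below are illustrative assumptions):

```python
import numpy as np

def knn_slice(X_batch, X, k):
    """'Map' step on one batch: k nearest neighbours of each batch point."""
    d2 = ((X_batch[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, 1:k + 1]   # column 0 is the point itself

def knn_graph(X, k, n_parts=4):
    """'Reduce' step: merge per-batch neighbour lists into one undirected graph."""
    N = len(X)
    A = np.zeros((N, N))
    for part in np.array_split(np.arange(N), n_parts):
        nbrs = knn_slice(X[part], X, k)
        for row, i in enumerate(part):
            A[i, nbrs[row]] = 1.0
    return np.maximum(A, A.T)                    # symmetrize -> undirected graph

X = np.random.default_rng(0).standard_normal((60, 3))
A = knn_graph(X, k=5)
```

Each batch only ever touches its own slice of the distance computation, which is what makes the construction distributable over P nodes.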
37 Big data: representative subsets using kNN graphs (3). Steps: 1. MapReduce for representative subsets using kNN graphs. 2. Subset selection using FURS [Mall et al., SNAM 2013] (which aims for good modelling of dense regions; deterministic method). 3. Use in fixed-size LS-SVM [Mall & Suykens, IEEE-TNNLS, in press].
Generalization (test error ± standard deviation) is compared on the 4G, 10G, Magic, Shuttle and Skin datasets for SR (stratified random sampling), SRE (stratified Renyi entropy) and FURS with k = 10, 100, 500, where FURS_{k=10} denotes FURS based on kNN graphs with k = 10.
[Mall, Jumutc, Langone, Suykens, IEEE BigData 2014]
38 Kernel-based models for spectral clustering
39 Spectral graph clustering (1). Minimal cut: given the graph G = (V, E), find clusters A_1, A_2 by min_{q_i in {-1,+1}} (1/2) sum_{i,j} w_{ij} (q_i - q_j)^2, with cluster membership indicator q_i (q_i = 1 if i in A_1, q_i = -1 if i in A_2) and W = [w_{ij}] the weighted adjacency matrix. (Figure: a cut of size 1, the minimal cut, versus a cut of size 2.)
40 Spectral graph clustering (2). Min-cut spectral clustering problem: min_{q̃^T q̃ = 1} q̃^T L q̃ with L = D - W the unnormalized graph Laplacian, degree matrix D = diag(d_1,...,d_N), d_i = sum_j w_{ij}, giving the eigenvalue problem L q̃ = λ q̃. Cluster membership indicators: q̂_i = sign(q̃_i - θ) with threshold θ. Normalized cut: L q̃ = λ D q̃. [Fiedler, 1973; Shi & Malik, 2000; Ng et al., 2002; Chung, 1997; von Luxburg, 2007] From the discrete version to the continuous problem (Laplace operator): [Belkin & Niyogi, 2003; von Luxburg et al., 2008; Smale & Zhou, 2007]
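A tiny worked example (the 6-node graph is an illustrative stand-in for the figure, not from the slides): the eigenvector of the second-smallest eigenvalue of L, the Fiedler vector, recovers the minimal cut across the single bridge edge.

```python
import numpy as np

# two triangles joined by one bridge edge (2, 3): the minimal cut has size 1
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

D = np.diag(W.sum(axis=1))
L = D - W                                    # unnormalized graph Laplacian
lam, V = np.linalg.eigh(L)                   # eigenvalues in ascending order
fiedler = V[:, 1]                            # eigenvector of 2nd smallest eigenvalue
q = np.sign(fiedler)                         # threshold at 0 -> cluster indicators
```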
41 Kernel spectral clustering (KSC): case of two clusters. Primal problem, training on given data {x_i}_{i=1}^N:
min_{w,b,e} (1/2) w^T w - γ (1/2) e^T V e subject to e_i = w^T ϕ(x_i) + b, i = 1,...,N,
with weighting matrix V and ϕ(·): R^d -> R^h the feature map.
Dual: V M_V Ω α = λ α with λ = 1/γ, M_V = I_N - (1/(1_N^T V 1_N)) 1_N 1_N^T V the weighted centering matrix, and Ω = [Ω_{ij}] the kernel matrix with Ω_{ij} = ϕ(x_i)^T ϕ(x_j) = K(x_i, x_j).
Taking V = D^{-1} with degree matrix D = diag{d_1,...,d_N} and d_i = sum_{j=1}^N Ω_{ij} relates to the random walks algorithm.
[Alzate & Suykens, IEEE-PAMI, 2010]
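A minimal sketch of the dual problem on toy data (two well-separated Gaussian clouds; the bandwidth and bias formula used here are assumptions of this sketch): build Ω, weight with V = D^{-1}, center with M_V, take the dominant eigenvector α, and read the clusters from the sign of the projections e = Ωα + b·1.

```python
import numpy as np

def rbf(A, B, s2=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s2))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.3, (40, 2)),      # cluster 1
               rng.normal(+2, 0.3, (40, 2))])     # cluster 2
N = len(X)

Om = rbf(X, X)
d = Om.sum(axis=1)
V = np.diag(1.0 / d)                              # V = D^{-1}: random-walk weighting
one = np.ones(N)
MV = np.eye(N) - np.outer(one, one) @ V / (one @ V @ one)   # weighted centering

lam, A = np.linalg.eig(V @ MV @ Om)               # dual eigenproblem V M_V Omega a = lam a
alpha = np.real(A[:, np.argmax(np.real(lam))])    # dominant eigenvector
b = -(one @ V @ Om @ alpha) / (one @ V @ one)     # bias from the centering condition
e = Om @ alpha + b                                # projections e_i
labels = np.sign(e)
```

Because the kernel matrix is nearly block diagonal for separated clusters, the sign of e_i splits the two clouds.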
43 Kernel spectral clustering: more clusters. Case of k clusters: additional sets of constraints:
min_{w^(l), e^(l), b_l} (1/2) sum_{l=1}^{k-1} w^(l)T w^(l) - (1/2) sum_{l=1}^{k-1} γ_l e^(l)T D^{-1} e^(l)
subject to e^(l) = Φ_{N x n_h} w^(l) + b_l 1_N, l = 1,...,k-1,
where e^(l) = [e_1^(l); ...; e_N^(l)] and Φ_{N x n_h} = [ϕ(x_1)^T; ...; ϕ(x_N)^T] in R^{N x n_h}.
Dual problem: M_D Ω α^(l) = λ D α^(l), l = 1,...,k-1.
[Alzate & Suykens, IEEE-PAMI, 2010]
44 Primal and dual model representations. k clusters, k-1 sets of constraints (index l = 1,...,k-1):
(P): sign[ê^(l)] = sign[w^(l)T ϕ(x*) + b_l]
(D): sign[ê^(l)] = sign[sum_j α_j^(l) K(x*, x_j) + b_l]
45 Advantages of the kernel-based setting: model-based approach; out-of-sample extensions, applying the model to new data; training, validation and test data can be considered (the training problem corresponds to an eigenvalue decomposition problem); model selection procedures; sparse representations and large-scale methods.
46 Out-of-sample extension and coding (figures: clusters in the x^(1)-x^(2) plane)
48 Model selection: toy example (figures: validation-set projections e^(1)_{i,val} versus e^(2)_{i,val} on train + validation + test data, for σ^2 = 0.5, BAD, versus σ^2 = 0.16, GOOD, evaluated with the BLF criterion)
49 Example: image segmentation (figure: validation projections e^(1)_{i,val}, e^(2)_{i,val}, e^(3)_{i,val})
50 Kernel spectral clustering: sparse kernel models (figures: original image, binary clustering, sparse kernel model).
Incomplete Cholesky decomposition: Ω ≈ G G^T with G in R^{N x R} and R << N.
Image (Berkeley image dataset): 154,401 pixels, 175 SV.
Time complexity O(R^2 N^2) in [Alzate & Suykens, 2008]; time complexity O(R^2 N) in [Novak, Alzate, Langone, Suykens, 2014].
52 Incomplete Cholesky decomposition and reduced set. For the KSC problem M_D Ω α = λ D α, solve the approximation U^T M_D U Λ^2 ζ = λ ζ obtained from Ω ≈ G G^T, the singular value decomposition G = U Λ V^T, and ζ = U^T α. A smaller matrix of size R x R is obtained instead of N x N. The pivots are used as subset {x̃_i} of the data.
Reduced set method [Scholkopf et al., 1999]: approximation of w = sum_{i=1}^N α_i ϕ(x_i) by w̃ = sum_{j=1}^M β_j ϕ(x̃_j) in the sense min_β ||w - w̃||_2^2 + ν sum_j |β_j|. Sparser solutions via the l1 penalty, reweighted l1 or group lasso.
[Alzate & Suykens, 2008, 2011; Mall & Suykens, 2014]
54 Big data: representative subsets using kNN graphs. Steps: 1. MapReduce for representative subsets using kNN graphs. 2. Subset selection using FURS [Mall et al., SNAM 2013] (which aims for good modelling of dense regions; deterministic method). 3. Use in kernel spectral clustering (KSC).
Clustering quality is compared for random selection, Renyi entropy selection and FURS using the Davies-Bouldin (DB) and silhouette (SIL) metrics, on the 4G, 10G, Batch, House, Mopsi Finland, KDDCupBio and Power (2,049,280 points) datasets.
[Mall, Jumutc, Langone, Suykens, IEEE BigData 2014]
55 Semi-supervised learning using KSC (1). N unlabeled data points, plus labels on an additional M - N points: X = {x_1,...,x_N, x_{N+1},...,x_M}. Kernel spectral clustering as core model (binary case [Alzate & Suykens, WCCI 2012]; multiway/multiclass [Mehrkanoon et al., TNNLS, in press]):
min_{w,e,b} (1/2) w^T w - γ (1/2) e^T D^{-1} e + ρ (1/2) sum_{m=N+1}^M (e_m - y_m)^2
subject to e_i = w^T ϕ(x_i) + b, i = 1,...,M.
The dual solution is characterized by a linear system. Suitable for clustering as well as classification. Other approaches in semi-supervised learning and manifold learning, e.g. [Belkin et al., 2006].
56 Semi-supervised learning using KSC (2). Test errors are compared on Spambase, Satimage, Ring, Magic, Cod-rna and Covertype, for varying numbers of labeled/unlabeled points n_L/n_U, between FS semi-KSC (fixed-size semi-supervised KSC), RD semi-KSC (another subset selection, related to [Lee & Mangasarian, 2001]) and LapSVMp (Laplacian support vector machine [Belkin et al., 2006]). [Mehrkanoon & Suykens, 2014]
57 Semi-supervised learning using KSC (3) (figures: original image; KSC; given a few labels; semi-supervised KSC). [Mehrkanoon, Alzate, Mall, Langone, Suykens, IEEE-TNNLS, in press]
58 Kernel methods for complex networks
59 Modularity and community detection. Modularity for the two-group case [Newman, 2006]: Q = (1/4m) sum_{i,j} (A_{ij} - d_i d_j / 2m) q_i q_j, with A the adjacency matrix, d_i the degree of node i, m = (1/2) sum_i d_i, q_i = 1 if node i belongs to group 1 and q_i = -1 for group 2.
Use of modularity within kernel spectral clustering [Langone et al., 2012]: use modularity at the level of model validation; find a representative subgraph with a fixed-size method by maximizing the expansion factor |N(G')| / |G'| [Maiya, 2010], with subgraph G' and its neighborhood N(G'); define the data in unweighted networks as x_i = A(:, i); use of a community kernel function [Kang, 2009].
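The two-group modularity is straightforward to evaluate; a small check on two triangles joined by a bridge (an illustrative graph, not one of the networks from the talk):

```python
import numpy as np

def modularity(A, q):
    """Q = (1/4m) * sum_ij (A_ij - d_i*d_j/(2m)) * q_i*q_j, with q_i in {-1,+1}."""
    d = A.sum(axis=1)
    m = d.sum() / 2.0
    B = A - np.outer(d, d) / (2.0 * m)           # modularity matrix
    return float(q @ B @ q) / (4.0 * m)

# two triangles joined by the edge (2, 3)
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

q_good = np.array([1, 1, 1, -1, -1, -1])         # the natural community split
q_bad = np.array([1, -1, 1, -1, 1, -1])          # an arbitrary split
```

Here modularity(A, q_good) equals 10/28, about 0.357, while the arbitrary split scores negative; community detection searches for the partition q maximizing Q.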
60 Protein interaction network (1) (figure, Pajek). Yeast interaction network: 2114 nodes, 4480 edges [Barabasi et al., 2001].
61 Protein interaction network (2). Yeast interaction network: 2114 nodes, 4480 edges [Barabasi et al., 2001]. KSC community detection with a representative subgraph [Langone et al., 2012]: 7 detected clusters (figure: modularity versus number of clusters).
62 Power grid network (1) (figure, Pajek). Western USA power grid: 4941 nodes, 6594 edges [Watts & Strogatz, 1998].
63 Power grid network (2). Western USA power grid: 4941 nodes, 6594 edges [Watts & Strogatz, 1998]. KSC community detection with a representative subgraph [Langone et al., 2012]: 16 detected clusters (figure: modularity versus number of clusters).
64 KSC for big data networks (1).
YouTube network: social network where users form friendships with each other and can create groups that others can join (1,134,890 nodes, 2,987,624 edges).
roadCA network: road network of California; intersections and endpoints are represented by nodes and the roads connecting them by edges (1,965,206 nodes, 5,533,214 edges).
LiveJournal network: free online social network of bloggers who declare friendships among themselves (3,997,962 nodes, 34,681,189 edges).
[Mall, Langone, Suykens, Entropy, special issue on Big Data, 2013]
65 BAF-KSC: KSC for big data networks (2).
Select a representative training subgraph using FURS (Fast and Unique Representative Subset selection [Mall et al., 2013]: selects nodes with high degree centrality belonging to different dense regions, using a deactivation and activation procedure).
Perform model selection using BAF (Balanced Angular Fit: makes use of a cosine similarity measure related to the e-projection values on validation nodes).
Train the KSC model by solving a small eigenvalue problem of size min(0.15 N, 5000)^2.
Apply the out-of-sample extension to find the cluster memberships of the remaining nodes.
[Mall, Langone, Suykens, Entropy, special issue on Big Data, 2013]
66 KSC for big data networks (3). Comparison of BAF-KSC [Mall, Langone, Suykens, 2013] with Louvain [Blondel et al., 2008], Infomap [Lancichinetti & Fortunato, 2009] and CNM [Clauset, Newman, Moore, 2004], in terms of number of clusters (Cl), modularity (Q) and conductance (Con), on a flight network (Openflights), a trust-based network (PGPnet), a biological network (Metabolic), citation networks (HepTh, HepPh), a communication network (Enron), a review-based network (Epinion) and a collaboration network (Condmat) [snap.stanford.edu]. BAF-KSC usually finds a smaller number of clusters and achieves lower conductance.
67 Multilevel hierarchical KSC for complex networks (1). Generate a series of affinity matrices over different levels: communities at level h become nodes for the next level h+1.
68 Multilevel hierarchical KSC for complex networks (2).
Start at ground level 0 and compute cosine similarities on validation nodes: S^(0)_val(i, j) = 1 - e_i^T e_j / (||e_i|| ||e_j||).
Select projections e_j satisfying S^(0)_val(i, j) < t^(0), with t^(0) a threshold value. Keep these nodes as the first cluster at level 0 and remove them from the matrix S^(0)_val to obtain a reduced matrix.
Iterative procedure over the different levels h: communities at level h become nodes for the next level h+1, creating a set of matrices at the different levels of the hierarchy:
S^(h)_val(i, j) = (1 / (|C^(h-1)_i| |C^(h-1)_j|)) sum_{m in C^(h-1)_i} sum_{l in C^(h-1)_j} S^(h-1)_val(m, l),
with suitable threshold values t^(h) for each level.
[Mall, Langone, Suykens, PLOS ONE, 2014]
69 Multilevel hierarchical KSC for complex networks (3). MH-KSC on the PGP network (figures: fine, intermediate and coarse levels). Multilevel hierarchical KSC finds high-quality clusters at coarse as well as fine and intermediate levels of the hierarchy. [Mall, Langone, Suykens, PLOS ONE, 2014]
70 Multilevel hierarchical KSC for complex networks (4) (figures: Louvain, Infomap, OSLOM). Louvain, Infomap and OSLOM appear biased toward a particular scale in comparison with MH-KSC, based on the ARI, VI and Q metrics.
71 Conclusions: synergies between parametric and kernel-based modelling; sparse kernel models using the fixed-size method; supervised, semi-supervised and unsupervised learning applications; large-scale complex networks and big data. Software: see the ERC AdG A-DATADRIVE-B website.
72 Acknowledgements (1). Current and former co-workers at ESAT-STADIUS: M. Agudelo, C. Alzate, A. Argyriou, R. Castro, M. Claesen, A. Daemen, J. De Brabanter, K. De Brabanter, J. de Haas, L. De Lathauwer, B. De Moor, F. De Smet, M. Espinoza, T. Falck, Y. Feng, E. Frandi, D. Geebelen, O. Gevaert, L. Houthuys, X. Huang, B. Hunyadi, A. Installe, V. Jumutc, Z. Karevan, P. Karsmakers, R. Langone, Y. Liu, X. Lou, R. Mall, S. Mehrkanoon, Y. Moreau, M. Novak, F. Ojeda, K. Pelckmans, N. Pochet, D. Popovic, J. Puertas, M. Seslija, L. Shi, M. Signoretto, P. Smyth, Q. Tran Dinh, V. Van Belle, T. Van Gestel, J. Vandewalle, S. Van Huffel, C. Varon, X. Xi, Y. Yang, S. Yu, and others. Many others for joint work, discussions, invitations, organizations. Support from ERC AdG A-DATADRIVE-B, KU Leuven, GOA-MaNet, OPTEC, IUAP DYSCO, FWO projects, IWT, iMinds, BIL, COST.
73 Acknowledgements (2)
74 Thank you
Journal of Machine Learning Research 5 (2004) 101141 Submitted 4/03; Revised 8/03; Published 1/04 In Defense of OneVsAll Classification Ryan Rifkin Honda Research Institute USA 145 Tremont Street Boston,
More informationChoosing Multiple Parameters for Support Vector Machines
Machine Learning, 46, 131 159, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Choosing Multiple Parameters for Support Vector Machines OLIVIER CHAPELLE LIP6, Paris, France olivier.chapelle@lip6.fr
More informationMaxPlanckInstitut für biologische Kybernetik Arbeitsgruppe Bülthoff
MaxPlanckInstitut für biologische Kybernetik Arbeitsgruppe Bülthoff Spemannstraße 38 7276 Tübingen Germany Technical Report No. 44 December 996 Nonlinear Component Analysis as a Kernel Eigenvalue Problem
More informationData Mining Applications of Singular Value Decomposition
Data Mining Applications of Singular Value Decomposition Miklós Kurucz Supervisor: András A. Benczúr Ph.D. Eötvös Loránd University Faculty of Informatics Department of Information Sciences Informatics
More informationResearch Article Sales Growth Rate Forecasting Using Improved PSO and SVM
Mathematical Problems in Engineering, Article ID 437898, 13 pages http://dx.doi.org/1.1155/214/437898 Research Article Sales Growth Rate Forecasting Using Improved PSO and SVM Xibin Wang, 1 Junhao Wen,
More informationThe Bigraphical Lasso
Alfredo Kalaitzis a.kalaitzis@ucl.ac.uk Department of Statistical Science, University College London, Gower Street, London WCE 6BT, UK John Lafferty lafferty@galton.uchicago.edu Department of Statistics,
More informationAdaptive and Online OneClass Support Vector Machinebased Outlier Detection Techniques for Wireless Sensor Networks
2009 International Conference on Advanced Information Networking and Applications Workshops Adaptive and Online OneClass Support Vector Machinebased Outlier Detection Techniques for Wireless Sensor Networks
More informationA MapReduce based distributed SVM algorithm for binary classification
A MapReduce based distributed SVM algorithm for binary classification Ferhat Özgür Çatak 1, Mehmet Erdal Balaban 2 1 National Research Institute of Electronics and Cryptology, TUBITAK, Turkey, Tel: 02626481070,
More informationRegression. Chapter 2. 2.1 Weightspace View
Chapter Regression Supervised learning can be divided into regression and classification problems. Whereas the outputs for classification are discrete class labels, regression is concerned with the prediction
More informationAn Introduction to Variable and Feature Selection
Journal of Machine Learning Research 3 (23) 11571182 Submitted 11/2; Published 3/3 An Introduction to Variable and Feature Selection Isabelle Guyon Clopinet 955 Creston Road Berkeley, CA 9478151, USA
More informationUncovering the Small Community Structure in Large Networks: A Local Spectral Approach
Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach Yixuan Li Cornell Unversity yli@cs.cornell.edu Kun He Huazhong University of Science and Technology brooklet60@hust.edu.cn
More informationLearning to Find PreImages
Learning to Find PreImages Gökhan H. Bakır, Jason Weston and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics Spemannstraße 38, 72076 Tübingen, Germany {gb,weston,bs}@tuebingen.mpg.de
More information