Kernel methods for complex networks and big data


 Jack Kevin Mosley
 1 years ago
 Views:
Transcription
1 Kernel methods for complex networks and big data Johan Suykens KU Leuven, ESATSTADIUS Kasteelpark Arenberg 10 B3001 Leuven (Heverlee), Belgium STATLEARN 2015, Grenoble Kernel methods for complex networks and big data  Johan Suykens
2 Introduction and motivation Kernel methods for complex networks and big data  Johan Suykens
3 Data world (a) C C C 3 C 4 Kernel methods for complex networks and big data  Johan Suykens 1
4 Challenges datadriven general methodology scalability need for new mathematical frameworks Kernel methods for complex networks and big data  Johan Suykens 2
5 Sparsity big data requires sparsity Kernel methods for complex networks and big data  Johan Suykens 3
6 Different paradigms SVM & Kernel methods Convex Optimization Sparsity & Compressed sensing Kernel methods for complex networks and big data  Johan Suykens 4
7 Different paradigms SVM & Kernel methods Convex Optimization? Sparsity & Compressed sensing Kernel methods for complex networks and big data  Johan Suykens 4
8 Sparsity through regularization or loss function Kernel methods for complex networks and big data  Johan Suykens 4
9 Sparsity: through regularization or loss function through regularization: model ŷ = w T x + b min j w j + γ i e 2 i sparse w through loss function: model ŷ = i α ik(x, x i ) + b min w T w + γ i L(e i ) sparse α ε 0 +ε Kernel methods for complex networks and big data  Johan Suykens 5
10 Function estimation in RKHS Find function f such that min f H 1 N N L(y i, f(x i )) + λ f 2 K i=1 with L(, ) the loss function. f K is norm in RKHS H defined by K. Representer theorem: for convex loss function, solution of the form f(x) = NX α i K(x, x i ) i=1 Reproducing property f(x) = f, K x K with K x ( ) = K(x, ) [Wahba, 1990; Poggio & Girosi, 1990; Evgeniou et al., 2000] Kernel methods for complex networks and big data  Johan Suykens 6
11 Learning with primal and dual model representations Kernel methods for complex networks and big data  Johan Suykens 6
12 Learning models from data: alternative views  Consider model ŷ = f(x; w), given input/output data {(x i, y i )} N i=1 : min w wt w + γ N (y i f(x i ;w)) 2 i=1  Rewrite the problem as min w,e w T w + γ N i=1 (y i f(x i ;w)) 2 subject to e i = y i f(x i ;w), i = 1,...,N  Construct Lagrangian and take condition for optimality  For a model f(x;w) = h j=1 w jϕ j (x) = w T ϕ(x) one obtains then ˆf(x) = N i=1 α ik(x, x i ) with K(x, x i ) = ϕ(x) T ϕ(x i ). Kernel methods for complex networks and big data  Johan Suykens 7
13 Learning models from data: alternative views  Consider model ŷ = f(x; w), given input/output data {(x i, y i )} N i=1 : min w wt w + γ N (y i f(x i ; w)) 2 i=1  Rewrite the problem as min w,e w T w + γ N i=1 e2 i subject to e i = y i f(x i ;w),i = 1,...,N  Express the solution and the model in terms of Lagrange multipliers α i  For a model f(x;w) = h j=1 w jϕ j (x) = w T ϕ(x) one obtains then ˆf(x) = N i=1 α ik(x, x i ) with K(x, x i ) = ϕ(x) T ϕ(x i ). Kernel methods for complex networks and big data  Johan Suykens 7
14 Least Squares Support Vector Machines: core models Regression Classification min w,b,e wt w + γ i min w,b,e wt w + γ i e 2 i e 2 i s.t. y i = w T ϕ(x i ) + b + e i, i s.t. y i (w T ϕ(x i ) + b) = 1 e i, i Kernel pca (V = I), Kernel spectral clustering (V = D 1 ) min w,b,e wt w + γ i v i e 2 i s.t. e i = w T ϕ(x i ) + b, i Kernel canonical correlation analysis/partial least squares min w,v,b,d,e,r wt w + v T v + ν i (e i r i ) 2 s.t. { ei = w T ϕ (1) (x i ) + b = v T ϕ (2) (y i ) + d r i [Suykens & Vandewalle, 1999; Suykens et al., 2002; Alzate & Suykens, 2010] Kernel methods for complex networks and big data  Johan Suykens 8
15 Probability and quantum mechanics Kernel pmf estimation Primal: 1 min w,p 2 w, w subject to p i = w,ϕ(x i ), i = 1,...,N and N i=1 p i = 1 i Dual: p i = P Nj=1 K(x j,x i ) P Ni=1 P Nj=1 K(x j,x i ) Quantum measurement: state vector ψ, measurement operators M i Primal: 1 min w,p 2 w w subject to p i = Re( w M i ψ ), i = 1,...,N and N i=1 p i = 1 i Dual: p i = ψ M i ψ (Born rule, orthogonal projective measurement) [Suykens, Physical Review A, 2013] Kernel methods for complex networks and big data  Johan Suykens 9
16 SVMs: living in two worlds... Primal space x x x x x o o o o Feature space ϕ(x) x x x o oo x x o Input space Parametric Dual space ŷ = sign[w T ϕ(x) + b] Nonparametric ŷ K(x i, x j ) = ϕ(x i ) T ϕ(x j ) (Mercer) ŷ = sign[ P #sv i=1 α iy i K(x, x i ) + b] ŷ w 1 w nh α 1 ϕ 1 (x) ϕ nh (x) K(x, x 1 ) x x α #sv K(x, x #sv ) Kernel methods for complex networks and big data  Johan Suykens 10
17 SVMs: living in two worlds... Primal space x x x x x o o o o Feature space ϕ(x) x x x o oo x x o Input space Parametric Dual space ŷ = sign[w T ϕ(x) + b] Nonparametric ŷ K(x i, x j ) = ϕ(x i ) T ϕ(x j ) ( Kernel trick ) ŷ = sign[ P #sv i=1 α iy i K(x, x i ) + b] ŷ w 1 w nh α 1 ϕ 1 (x) ϕ nh (x) K(x, x 1 ) x x Parametric Non parametric α #sv K(x, x #sv ) Kernel methods for complex networks and big data  Johan Suykens 10
18 Linear model: solving in primal or dual? inputs x R d, output y R training set {(x i, y i )} N i=1 Model ր ց (P) : ŷ = w T x + b, w R d (D) : ŷ = i α i x T i x + b, α RN Kernel methods for complex networks and big data  Johan Suykens 11
19 Linear model: solving in primal or dual? inputs x R d, output y R training set {(x i, y i )} N i=1 Model ր ց (P) : ŷ = w T x + b, w R d (D) : ŷ = i α i x T i x + b, α RN Kernel methods for complex networks and big data  Johan Suykens 11
20 Linear model: solving in primal or dual? few inputs, many data points: d N primal : w R d dual: α R N (large kernel matrix: N N) Kernel methods for complex networks and big data  Johan Suykens 12
21 Linear model: solving in primal or dual? many inputs, few data points: d N primal: w R d dual : α R N (small kernel matrix: N N) Kernel methods for complex networks and big data  Johan Suykens 12
22 From linear to nonlinear model: Feature map and kernel Model ր ց (P) : (D) : ŷ = w T ϕ(x) + b ŷ = i α i K(x i, x) + b Mercer theorem: K(x, z) = ϕ(x) T ϕ(z) Feature map ϕ(x) = [ϕ 1 (x); ϕ 2 (x);...;ϕ h (x)] Kernel function K(x, z) (e.g. linear, polynomial, RBF,...) Use of feature map and positive definite kernel [Cortes & Vapnik, 1995] Optimization in infinite dimensional spaces for LSSVM [Signoretto, De Lathauwer, Suykens, 2011] Kernel methods for complex networks and big data  Johan Suykens 13
23 Sparsity by fixedsize kernel method Kernel methods for complex networks and big data  Johan Suykens 13
24 Fixedsize method: steps 1. selection of a subset from the data 2. kernel matrix on the subset 3. eigenvalue decomposition of kernel matrix 4. approximation of the feature map based on the eigenvectors (Nyström approximation) 5. estimation of the model in the primal using the approximate feature map (applicable to large data sets) [Suykens et al., 2002] (lssvm book) Kernel methods for complex networks and big data  Johan Suykens 14
25 Selection of subset random quadratic Renyi entropy Kernel methods for complex networks and big data  Johan Suykens 15
26 Subset selection using quadratic Renyi entropy: example Kernel methods for complex networks and big data  Johan Suykens 16
27 Fixedsize method: using quadratic Renyi entropy Algorithm: 1. Given N training data 2. Choose a working set with fixed size M (i.e. M support vectors) (typically M N). 3. Randomly select a SV x from the working set of M support vectors. 4. Randomly select a point x t from the training data and replace x by x t. If the entropy increases by taking the point x t instead of x then this point x t is accepted for the working set of M SVs, otherwise the point x t is rejected (and returned to the training data pool) and the SV x stays in the working set. 5. Calculate the entropy value for the present working set. The quadratic Renyi entropy equals H R = log 1 P M 2 ij Ω (M,M) ij. [Suykens et al., 2002] Kernel methods for complex networks and big data  Johan Suykens 17
28 Nyström method big kernel matrix: Ω (N,N) R N N small kernel matrix: Ω (M,M) R M M (on subset) Eigenvalue decompositions: Ω (N,N) Ũ = Ũ Λ and Ω (M,M) U = U Λ Relation to eigenvalues and eigenfunctions of the integral equation K(x, x )φ i (x)p(x)dx = λ i φ i (x ) with ˆλ i = 1 M λ i, ˆφi (x k ) = M u ki, ˆφi (x ) = M λ i M u ki K(x k,x ) k=1 [Williams & Seeger, 2001] (Nyström method in GP) Kernel methods for complex networks and big data  Johan Suykens 18
29 Fixedsize method: estimation in primal For the feature map ϕ( ) : R d R h obtain an approximation ϕ( ) : R d R M based on the eigenvalue decomposition of the kernel matrix with ϕ i (x ) = ˆλi ˆφi (x ) (on a subset of size M N). Estimate in primal: min w, b 1 2 wt w + γ 1 2 N (y i w T ϕ(x i ) b) 2 i=1 Sparse representation is obtained: w R M with M N and M h. [Suykens et al., 2002; De Brabanter et al., CSDA 2010] Kernel methods for complex networks and big data  Johan Suykens 19
30 Fixedsize method: performance in classification pid spa mgt adu ftc N N cv N test d FSLSSVM (# SV) CSVM (# SV) νsvm (# SV) RBF FSLSSVM 76.7(3.43) 92.5(0.67) 86.6(0.51) 85.21(0.21) 81.8(0.52) Lin FSLSSVM 77.6(0.78) 90.9(0.75) 77.8(0.23) 83.9(0.17) 75.61(0.35) RBF CSVM 75.1(3.31) 92.6(0.76) 85.6(1.46) 84.81(0.20) 81.5(no cv) Lin CSVM 76.1(1.76) 91.9(0.82) 77.3(0.53) 83.5(0.28) 75.24(no cv) RBF νsvm 75.8(3.34) 88.7(0.73) 84.2(1.42) 83.9(0.23) 81.6(no cv) Maj. Rule 64.8(1.46) 60.6(0.58) 65.8(0.28) 83.4(0.1) 51.23(0.20) Fixedsize (FS) LSSVM: good performance and sparsity wrt CSVM and νsvm Challenging to achieve high performance by very sparse models [De Brabanter et al., CSDA 2010] Kernel methods for complex networks and big data  Johan Suykens 20
31 Two stages of sparsity stage 1 primal FS model estimation reweighted l 1 dual subset selection Nyström approximation Synergy between parametric & kernelbased models [Mall & Suykens, IEEETNNLS, in press], reweighted l 1 [Candes et al., 2008] Other possible approaches with improved sparsity: SCAD [Fan & Li, 2001]; coefficientbased l q (0 < q 1) [Shi et al., 2013]; twolevel l 1 [Huang et al., 2014] Kernel methods for complex networks and big data  Johan Suykens 21
32 Two stages of sparsity stage 1 primal FS model estimation reweighted l 1 dual subset selection Nyström approximation Synergy between parametric & kernelbased models [Mall & Suykens, IEEETNNLS, in press], reweighted l 1 [Candes et al., 2008] Other possible approaches with improved sparsity: SCAD [Fan & Li, 2001]; coefficientbased l q (0 < q 1) [Shi et al., 2013]; twolevel l 1 [Huang et al., 2014] Kernel methods for complex networks and big data  Johan Suykens 21
33 Two stages of sparsity stage 1 stage 2 primal FS model estimation reweighted l 1 dual subset selection Nyström approximation Synergy between parametric & kernelbased models [Mall & Suykens, IEEETNNLS, in press], reweighted l 1 [Candes et al., 2008] Other possible approaches with improved sparsity: SCAD [Fan & Li, 2001]; coefficientbased l q (0 < q 1) [Shi et al., 2013]; twolevel l 1 [Huang et al., 2014] Kernel methods for complex networks and big data  Johan Suykens 21
34 Two stages of sparsity stage 1 stage 2 primal FS model estimation reweighted l 1 dual subset selection Nyström approximation Synergy between parametric & kernelbased models [Mall & Suykens, IEEETNNLS, in press], reweighted l 1 [Candes et al., 2008] Other possible approaches with improved sparsity: SCAD [Fan & Li, 2001]; coefficientbased l q (0 < q 1) [Shi et al., 2013]; twolevel l 1 [Huang et al., 2014] Kernel methods for complex networks and big data  Johan Suykens 21
35 Big data: representative subsets using knn graphs (1) Convert the large scale dataset into a sparse undirected knn graph using a distributed network generation framework Julia language (http://julialang.org/) Large N N kernel matrix Ω for data set D with N data points Batch clusterbased approach: a batch subset D p D is loaded per node with P p=1d p = D; related matrix slice X p and Ω p. MapReduce and AllReduce settings implementable using Hadoop or Spark (see also [Agarwal et al., JMLR 2014] Terascale linear learning) Computational complexity: complexity for construction of the kernel matrix reduced from O(N 2 ) to O(N 2 (1 + log N)/P) for P nodes [Mall, Jumutc, Langone, Suykens, IEEE Bigdata 2014] Kernel methods for complex networks and big data  Johan Suykens 22
36 Big data: representative subsets using knn graphs (2) Map: for Silverman s Rule of Thumb, compute mean and standard deviation of the data per node; compute slice Ω p ; sort in ascending order the columns of Ω p (sortperm in Julia); pick indices for top k values. Reduce: merge knn subgraphs into aggregated knn graph [Mall, Jumutc, Langone, Suykens, IEEE Bigdata 2014] Kernel methods for complex networks and big data  Johan Suykens 23
37 Big data: representative subsets using knn graphs (3) Steps: 1. MapReduce for representative subsets using knn graphs 2. Subset selection using FURS [Mall et al., SNAM 2013] (which aims for good modelling of dense regions; deterministic method) 3. Use in fixedsize LSSVM ([Mall & Suykens, IEEETNNLS, in press]) Dataset N SR SRE FURS k=10 FURS k=100 FURS k=500 Generalization or Test Error ± Standard Deviation 4G 28, ± ± ± ± ± G 81, ± ± ± ± ±0.009 Magic 19, ± ± ± ± ±3.062 Shuttle 58, ± ± ± ± ±0.715 Skin 245, ± ± ± ± ±0.080 SR: stratified random sampling; SRE: stratified Renyi entropy FURS k=10 : FURS based on knn graphs with k = 10 [Mall, Jumutc, Langone, Suykens, IEEE Bigdata 2014] Kernel methods for complex networks and big data  Johan Suykens 24
38 Kernelbased models for spectral clustering Kernel methods for complex networks and big data  Johan Suykens 24
39 Spectral graph clustering (1) Minimal cut: given the graph G = (V,E), find clusters A 1, A 2 min q i { 1,+1} 1 2 w ij (q i q j ) 2 i,j with cluster membership indicator q i (q i = 1 if i A 1, q i = 1 if i A 2 ) and W = [w ij ] the weighted adjacency matrix cut of size 1 (minimal cut) 6 cut of size 2 Kernel methods for complex networks and big data  Johan Suykens 25
40 Spectral graph clustering (2) Mincut spectral clustering problem min q T q=1 q T L q with L = D W the unnormalized graph Laplacian, degree matrix D = diag(d 1,...,d N ), d i = j w ij, giving L q = λ q. Cluster member indicators: ˆq i = sign( q i θ) with threshold θ. Normalized cut L q = λd q [Fiedler, 1973; Shi & Malik, 2000; Ng et al. 2002; Chung, 1997; von Luxburg, 2007] Discrete version to continuous problem (Laplace operator) [Belkin & Niyogi, 2003; von Luxburg et al., 2008; Smale & Zhou, 2007] Kernel methods for complex networks and big data  Johan Suykens 26
41 Kernel Spectral Clustering (KSC): case of two clusters Primal problem: training on given data {x i } N i=1 1 min w,b,e 2 wt w γ 1 2 et V e subject to e i = w T ϕ(x i ) + b, i = 1,...,N with weighting matrix V and ϕ( ) : R d R h the feature map. Dual: V M V Ωα = λα with λ = 1/γ, M V = I N T N V 1 N 1 T NV weighted centering matrix, N Ω = [Ω ij ] kernel matrix with Ω ij = ϕ(x i ) T ϕ(x j ) = K(x i, x j ) Taking V = D 1 with degree matrix D = diag{d 1,...,d N } and d i = N j=1 Ω ij relates to random walks algorithm. [Alzate & Suykens, IEEEPAMI, 2010] Kernel methods for complex networks and big data  Johan Suykens 27
42 Kernel Spectral Clustering (KSC): case of two clusters Primal problem: training on given data {x i } N i=1 1 min w,b,e 2 wt w γ 1 2 et V e subject to e i = w T ϕ(x i ) + b, i = 1,...,N with weighting matrix V and ϕ( ) : R d R h the feature map. Dual: V M V Ωα = λα with λ = 1/γ, M V = I N T N V 1 N 1 T NV weighted centering matrix, N Ω = [Ω ij ] kernel matrix with Ω ij = ϕ(x i ) T ϕ(x j ) = K(x i, x j ) Taking V = D 1 with degree matrix D = diag{d 1,...,d N } and d i = N j=1 Ω ij relates to random walks algorithm. [Alzate & Suykens, IEEEPAMI, 2010] Kernel methods for complex networks and big data  Johan Suykens 27
43 Kernel spectral clustering: more clusters Case of k clusters: additional sets of constraints 1 min w (l),e (l),b l 2 k 1 l=1 w (l)t w (l) 1 2 k 1 l=1 γ l e (l)t D 1 e (l) subject to e (1) = Φ N nh w (1) + b 1 1 N e (2) = Φ N nh w (2) + b 2 1 N. e (k 1) = Φ N nh w (k 1) + b k 1 1 N where e (l) = [e (l) 1 ;...;e(l) N ] and Φ N n h = [ϕ(x 1 ) T ;...;ϕ(x N ) T ] R N n h. Dual problem: M D Ωα (l) = λdα (l), l = 1,...,k 1. [Alzate & Suykens, IEEEPAMI, 2010] Kernel methods for complex networks and big data  Johan Suykens 28
44 Primal and dual model representations k clusters k 1 sets of constraints (index l = 1,...,k 1) M ր ց (P) : sign[ê (l) ] = sign[w (l)t ϕ(x ) + b l ] (D) : sign[ê (l) ] = sign[ j α(l) j K(x,x j ) + b l ] Kernel methods for complex networks and big data  Johan Suykens 29
45 Advantages of kernelbased setting modelbased approach outofsample extensions, applying model to new data consider training, validation and test data (training problem corresponds to eigenvalue decomposition problem) model selection procedures sparse representations and large scale methods Kernel methods for complex networks and big data  Johan Suykens 30
46 Outofsample extension and coding x (2) 0 2 x (2) x (1) x (1) Kernel methods for complex networks and big data  Johan Suykens 31
47 Outofsample extension and coding x (2) 0 2 x (2) x (1) x (1) Kernel methods for complex networks and big data  Johan Suykens 31
48 Model selection: toy example e (2) i,val σ 2 = 0.5, BLF = e (2) i,val e (1) i,val σ 2 = 0.16, BLF = e (1) i,val validation set x (2) x (2) x (1) x (1) train + validation + test data BAD GOOD Kernel methods for complex networks and big data  Johan Suykens 32
49 Example: image segmentation i,val e (3) e (2) i,val e (1) i,val Kernel methods for complex networks and big data  Johan Suykens 33
50 Kernel spectral clustering: sparse kernel models original image binary clustering Incomplete Cholesky decomposition: Ω GG T with G R N R and R N Image (Berkeley image dataset): (154, 401 pixels), 175 SV Timecomplexity O(R 2 N 2 ) in [Alzate & Suykens, 2008] Timecomplexity O(R 2 N) in [Novak, Alzate, Langone, Suykens, 2014] Kernel methods for complex networks and big data  Johan Suykens 34
51 Kernel spectral clustering: sparse kernel models original image sparse kernel model Incomplete Cholesky decomposition: Ω GG T with G R N R and R N Image (Berkeley image dataset): (154, 401 pixels), 175 SV Timecomplexity O(R 2 N 2 ) in [Alzate & Suykens, 2008] Timecomplexity O(R 2 N) in [Novak, Alzate, Langone, Suykens, 2014] Kernel methods for complex networks and big data  Johan Suykens 34
52 Incomplete Cholesky decomposition and reduced set For KSC problem M D Ωα = λdα, solve the approximation U T M D UΛ 2 ζ = λζ from Ω GG T, singular value decomposition G = UΛV T and ζ = U T α. A smaller matrix of size R R is obtained instead of N N. Pivots are used as subset { x i } for the data Reduced set method [Scholkopf et al., 1999]: approximation of w = N i=1 α iϕ(x i ) by w = M j=1 β jϕ( x j ) in the sense min w w 2 2 β j β j Sparser solutions by adding l 1 penalty, reweighted l 1 or group Lasso. [Alzate & Suykens, 2008, 2011; Mall & Suykens, 2014] Kernel methods for complex networks and big data  Johan Suykens 35
53 Incomplete Cholesky decomposition and reduced set For KSC problem M D Ωα = λdα, solve the approximation U T M D UΛ 2 ζ = λζ from Ω GG T, singular value decomposition G = UΛV T and ζ = U T α. A smaller matrix of size R R is obtained instead of N N. Pivots are used as subset { x i } for the data Reduced set method [Scholkopf et al., 1999]: approximation of w = N i=1 α iϕ(x i ) by w = M j=1 β jϕ( x j ) in the sense min β w w ν j β j Sparser solutions by adding l 1 penalty, reweighted l 1 or group Lasso. [Alzate & Suykens, 2008, 2011; Mall & Suykens, 2014] Kernel methods for complex networks and big data  Johan Suykens 35
54 Big data: representative subsets using knn graphs Steps: 1. MapReduce for representative subsets using knn graphs 2. Subset selection using FURS [Mall et al., SNAM 2013] (which aims for good modelling of dense regions; deterministic method) 3. Use in Kernel Spectral Clustering KSC Dataset Points M Random Rènyi entropy FURS DB SIL DB SIL DB SIL 4G 28, ± ± ± ± G 81, ± ± ± ± Batch 13, ± ± ± ± House 34, ± ± ± ± Mopsi Finland 13, ± ± ± ± DB DB DB KDDCupBio 145, ± ± Power 2,049,280 2, ± ± (quality metrics: DaviesBouldin (DB) and silhouette (SIL)) [Mall, Jumutc, Langone, Suykens, IEEE Bigdata 2014] Kernel methods for complex networks and big data  Johan Suykens 36
55 Semisupervised learning using KSC (1) N unlabeled data, but additional labels on M N data X = {x 1,...,x N,x N+1,...,x M } Kernel spectral clustering as core model (binary case [Alzate & Suykens, WCCI 2012], multiway/multiclass [Mehrkanoon et al., TNNLS in press]) min w,e,b 1 2 wt w γ 1 2 et D 1 e + ρ 1 2 M m=n+1 subject to e i = w T ϕ(x i ) + b, i = 1,...,M (e m y m ) 2 Dual solution is characterized by a linear system. Suitable for clustering as well as classification. Other approaches in semisupervised learning and manifold learning, e.g. [Belkin et al., 2006] Kernel methods for complex networks and big data  Johan Suykens 37
56 Semisupervised learning using KSC (2) Dataset size n L /n U test (%) FS semiksc RD semiksc LapSVMp Spambase / (20%) ± ± ± 0.03 Satimage / (20%) ± ± ± Ring / (20%) ± ± ± Magic / (20%) ± ± ± Codrna / (20%) ± ± ± Covertype / (5%) ± ± ± / ± ± / ± ± / ± ± 0.06 FS semiksc: Fixedsize semisupervised KSC RD semiksc: other subset selection related to [Lee & Mangasarian, 2001] LapSVM: Laplacian support vector machine [Belkin et al., 2006] [Mehrkanoon & Suykens, 2014] Kernel methods for complex networks and big data  Johan Suykens 38
57 Semisupervised learning using KSC (3) original image KSC given a few labels semisupervised KSC [Mehrkanoon, Alzate, Mall, Langone, Suykens, IEEETNNLS, in press] Kernel methods for complex networks and big data  Johan Suykens 39
58 Fixedkernel methods for complex networks Kernel methods for complex networks and big data  Johan Suykens 39
59 Modularity and community detection Modularity for twogroup case [Newman, 2006]: Q = 1 4m i,j (A ij d id j 2m )q iq j with A adjacency matrix, d i degree of node i, m = 1 2 i d i, q i = 1 if node i belongs to group 1 and q i = 1 for group 2. Use of modularity within kernel spectral clustering [Langone et al., 2012]: use modularity at the level of model validation finding representative subgraph using a fixedsize method by maximizing the expansion factor [Maiya, 2010] N(G) G with a subgraph G and its neighborhood N(G). definition data in unweighted networks: x i = A(:, i); use of a community kernel function [Kang, 2009]. Kernel methods for complex networks and big data  Johan Suykens 40
60 Protein interaction network (1) Pajek Yeast interaction network: 2114 nodes, 4480 edges [Barabasi et al., 2001] Kernel methods for complex networks and big data  Johan Suykens 41
61 Protein interaction network (2)  Yeast interaction network: 2114 nodes, 4480 edges [Barabasi et al., 2001]  KSC community detection, representative subgraph [Langone et al., 2012] 7 detected clusters modularity number of clusters Kernel methods for complex networks and big data  Johan Suykens 42
62 Power grid network (1) Pajek Western USA power grid: 4941 nodes, 6594 edges [Watts & Strogatz, 1998] Kernel methods for complex networks and big data  Johan Suykens 43
63 Power grid network (2)  Western USA power grid: 4941 nodes, 6594 edges [Watts & Strogatz, 1998]  KSC community detection, representative subgraph [Langone et al., 2012] 16 detected clusters modularity number of clusters Kernel methods for complex networks and big data  Johan Suykens 44
64 KSC for big data networks (1) YouTube Network: YouTube social network where users form friendship with each other and the users can create groups where others can join. RoadCA Network: a road network of California. Intersections and endpoints are represented by nodes and the roads connecting these intersections are represented as edges. Livejournal Network: free online social network where users are bloggers and they declare friendship among themselves. Dataset Nodes Edges YouTube 1,134,890 2,987,624 roadca 1,965,206 5,533,214 Livejournal 3,997,962 34,681,189 [Mall, Langone, Suykens, Entropy, special issue Big data, 2013] Kernel methods for complex networks and big data  Johan Suykens 45
65 BAFKSC: KSC for big data networks (2) Select representative training subgraph using FURS (FURS = Fast and Unique Representative Subset selection [Mall et al., 2013]: selects nodes with high degree centrality belonging to different dense regions, using a deactivation and activation procedure) Perform model selection using BAF (BAF = Balanced Angular Fit: makes use of a cosine similarity measure related to the eprojection values on validation nodes) Train the KSC model by solving a small eigenvalue problem of size min(0.15n, 5000) 2 Apply outofsample extension to find cluster memberships of the remaining nodes [Mall, Langone, Suykens, Entropy, special issue Big data, 2013] Kernel methods for complex networks and big data  Johan Suykens 46
66 KSC for big data networks (3) BAFKSC [Mall, Langone, Suykens, 2013] Louvain [Blondel et al., 2008] Infomap [Lancichinetti, Fortunato, 2009] CNM [Clauset, Newman, Moore, 2004] Dataset BAFKSC Louvain Infomap CNM Cl Q Con Cl Q Con Cl Q Con Cl Q Con Openflight PGPnet Metabolic HepTh HepPh Enron Epinion Condmat Flight network (Openflights), network based on trust (PGPnet), biological network (Metabolic), citation networks (HepTh, HepPh), communication network (Enron), review based network (Epinion), collaboration network (Condmat) [snap.stanford.edu] Cl = Clusters, Q = modularity, Con = Conductance BAFKSC usually finds a smaller number of clusters and achieves lower conductance Kernel methods for complex networks and big data  Johan Suykens 47
67 Multilevel Hierarchical KSC for complex networks (1) Generating a series of affinity matrices over different levels: communities at level h become nodes for next level h + 1 Kernel methods for complex networks and big data  Johan Suykens 48
68 Multilevel Hierarchical KSC for complex networks (2) Start at ground level 0 and compute cosine similarities on validation nodes S (0) val (i, j) = 1 et i e j/( e i e j ) Select projections e j satisfying S (0) val (i, j) < t(0) with t (0) a threshold value. Keep these nodes as the first cluster at level 0 and remove the nodes from matrix S (0) val to obtain a reduced matrix. Iterative procedure over different levels h: Communities at level h become nodes for the next level h+1, by creating a set of matrices at different levels of hierarchy: S (h) val (i, j) = 1 C (h 1) i C (h 1) j m C (h 1) i l C (h 1) j S (h 1) val (m, l) and working with suitable threshold values t (h) for each level. [Mall, Langone, Suykens, PLOS ONE, 2014] Kernel methods for complex networks and big data  Johan Suykens 49
69 Multilevel Hierarchical KSC for complex networks (3) MHKSC on PGP network: fine intermediate intermediate coarse Multilevel Hierarchical KSC finds high quality clusters at coarse as well as fine and intermediate levels of hierarchy. [Mall, Langone, Suykens, PLOS ONE, 2014] Kernel methods for complex networks and big data  Johan Suykens 50
70 Multilevel Hierarchical KSC for complex networks (4) Louvain Infomap OSLOM Louvain, Infomap, and OSLOM seem biased toward a particular scale in comparison with MHKSC, based upon ARI, V I, Q metrics Kernel methods for complex networks and big data  Johan Suykens 51
71 Conclusions Synergies parametric and kernel basedmodelling Sparse kernel models using fixedsize method Supervised, semisupervised and unsupervised learning applications Large scale complex networks and big data Software: see ERC AdG ADATADRIVEB website Kernel methods for complex networks and big data  Johan Suykens 52
72 Acknowledgements (1) Current and former coworkers at ESATSTADIUS: M. Agudelo, C. Alzate, A. Argyriou, R. Castro, M. Claesen, A. Daemen, J. De Brabanter, K. De Brabanter, J. de Haas, L. De Lathauwer, B. De Moor, F. De Smet, M. Espinoza, T. Falck, Y. Feng, E. Frandi, D. Geebelen, O. Gevaert, L. Houthuys, X. Huang, B. Hunyadi, A. Installe, V. Jumutc, Z. Karevan, P. Karsmakers, R. Langone, Y. Liu, X. Lou, R. Mall, S. Mehrkanoon, Y. Moreau, M. Novak, F. Ojeda, K. Pelckmans, N. Pochet, D. Popovic, J. Puertas, M. Seslija, L. Shi, M. Signoretto, P. Smyth, Q. Tran Dinh, V. Van Belle, T. Van Gestel, J. Vandewalle, S. Van Huffel, C. Varon, X. Xi, Y. Yang, S. Yu, and others Many others for joint work, discussions, invitations, organizations Support from ERC AdG ADATADRIVEB, KU Leuven, GOAMaNet, OPTEC, IUAP DYSCO, FWO projects, IWT, iminds, BIL, COST Kernel methods for complex networks and big data  Johan Suykens 53
73 Acknowledgements (2) Kernel methods for complex networks and big data  Johan Suykens 54
74 Thank you Kernel methods for complex networks and big data  Johan Suykens 55
Kernel methods for exploratory data analysis and community detection
Kernel methods for exploratory data analysis and community detection Johan Suykens KU Leuven, ESATSCD/SISTA Kasteelpark Arenberg B3 Leuven (Heverlee), Belgium Email: johan.suykens@esat.kuleuven.be http://www.esat.kuleuven.be/scd/
More informationData visualization and dimensionality reduction using kernel maps with a reference point
Data visualization and dimensionality reduction using kernel maps with a reference point Johan Suykens K.U. Leuven, ESATSCD/SISTA Kasteelpark Arenberg 1 B31 Leuven (Heverlee), Belgium Tel: 32/16/32 18
More informationRepresentative Subsets For Big Data Learning using
Representative Subsets For Big Data Learning using knn Graphs Raghvendra Mall, Vilen Jumutc, Rocco Langone, Johan A.K. Suykens KU Leuven, ESAT/STADIUS Kasteelpark Arenberg 10, B3001 Leuven, Belgium {raghvendra.mall,vilen.jumutc,rocco.langone,johan.suykens}@esat.kuleuven.be
More informationKernel Spectral Clustering for Big Data Networks
Entropy 2013, 15, 15671586; doi:10.3390/e15051567 Article Kernel Spectral Clustering for Big Data Networks Raghvendra Mall *, Rocco Langone and Johan A.K. Suykens OPEN ACCESS entropy ISSN 10994300 www.mdpi.com/journal/entropy
More informationLearning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationLargeScale Sparsified Manifold Regularization
LargeScale Sparsified Manifold Regularization Ivor W. Tsang James T. Kwok Department of Computer Science and Engineering The Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multiclass classification.
More informationData Visualization and Dimensionality Reduction. using Kernel Maps with a Reference Point
Data Visualization and Dimensionality Reduction using Kernel Maps with a Reference Point Johan A.K. Suykens K.U. Leuven, ESATSCD/SISTA Kasteelpark Arenberg B3 Leuven (Heverlee), Belgium Tel: 3/6/3 8
More informationLABEL PROPAGATION ON GRAPHS. SEMISUPERVISED LEARNING. Changsheng Liu 10302014
LABEL PROPAGATION ON GRAPHS. SEMISUPERVISED LEARNING Changsheng Liu 10302014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
More informationA Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Largemargin linear
More informationComplex Networks Analysis: Clustering Methods
Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich nefedov@isi.ee.ethz.ch 1 Outline Purpose to give an overview of modern graphclustering methods and their applications
More informationNETZCOPE  a tool to analyze and display complex R&D collaboration networks
The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE  a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection  Social networks 
More informationOnline SemiSupervised Learning
Online SemiSupervised Learning Andrew B. Goldberg, Ming Li, Xiaojin Zhu jerryzhu@cs.wisc.edu Computer Sciences University of Wisconsin Madison Xiaojin Zhu (Univ. WisconsinMadison) Online SemiSupervised
More informationLearning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
More informationLearning with Local and Global Consistency
Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationA Survey of Kernel Clustering Methods
A Survey of Kernel Clustering Methods Maurizio Filippone, Francesco Camastra, Francesco Masulli and Stefano Rovetta Presented by: Kedar Grama Outline Unsupervised Learning and Clustering Types of clustering
More informationSupport Vector Machine. Tutorial. (and Statistical Learning Theory)
Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA. jasonw@neclabs.com 1 Support Vector Machines: history SVMs introduced
More informationSemiSupervised Support Vector Machines and Application to Spam Filtering
SemiSupervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
More informationIntroduction to Machine Learning
Introduction to Machine Learning Johan A.K. Suykens KU Leuven, ESATSCD/SISTA Kasteelpark Arenberg 10 B3001 Leuven (Heverlee), Belgium Email: johan.suykens@esat.kuleuven.be April 2013 1 Scope and context
More informationNetwork Intrusion Detection using Semi Supervised Support Vector Machine
Network Intrusion Detection using Semi Supervised Support Vector Machine Jyoti Haweliya Department of Computer Engineering Institute of Engineering & Technology, Devi Ahilya University Indore, India ABSTRACT
More informationTree based ensemble models regularization by convex optimization
Tree based ensemble models regularization by convex optimization Bertrand Cornélusse, Pierre Geurts and Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège B4000
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationNonnegative Matrix Factorization (NMF) in Semisupervised Learning Reducing Dimension and Maintaining Meaning
Nonnegative Matrix Factorization (NMF) in Semisupervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationClass #6: Nonlinear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Nonlinear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Nonlinear classification Linear Support Vector Machines
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationUSING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE FREE NETWORKS AND SMALLWORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE FREE NETWORKS AND SMALLWORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu
More informationA scalable multilevel algorithm for graph clustering and community structure detection
A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures
More informationSupport Vector Machines Explained
March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
More informationIntroduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011
Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationMulticlass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin
Multiclass Classification 9.520 Class 06, 25 Feb 2008 Ryan Rifkin It is a tale Told by an idiot, full of sound and fury, Signifying nothing. Macbeth, Act V, Scene V What Is Multiclass Classification? Each
More informationSupport Vector Machine (SVM)
Support Vector Machine (SVM) CE725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept HardMargin SVM SoftMargin SVM Dual Problems of HardMargin
More informationIntroduction to Machine Learning
Introduction to Machine Learning Felix Brockherde 12 Kristof Schütt 1 1 Technische Universität Berlin 2 Max Planck Institute of Microstructure Physics IPAM Tutorial 2013 Felix Brockherde, Kristof Schütt
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network?  Perceptron learners  Multilayer networks What is a Support
More informationSCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
More informationSupport Vector Machines
Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric
More informationSeveral Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
More informationIntroduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones
Introduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation
More informationClustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationSupport Vector Machines for Classification and Regression
UNIVERSITY OF SOUTHAMPTON Support Vector Machines for Classification and Regression by Steve R. Gunn Technical Report Faculty of Engineering, Science and Mathematics School of Electronics and Computer
More informationMachine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
More informationCase Study Report: Building and analyzing SVM ensembles with Bagging and AdaBoost on big data sets
Case Study Report: Building and analyzing SVM ensembles with Bagging and AdaBoost on big data sets Ricardo Ramos Guerra Jörg Stork Master in Automation and IT Faculty of Computer Science and Engineering
More informationIn Defense of OneVsAll Classification
Journal of Machine Learning Research 5 (2004) 101141 Submitted 4/03; Revised 8/03; Published 1/04 In Defense of OneVsAll Classification Ryan Rifkin Honda Research Institute USA 145 Tremont Street Boston,
More informationScaling KernelBased Systems to Large Data Sets
Data Mining and Knowledge Discovery, Volume 5, Number 3, 200 Scaling KernelBased Systems to Large Data Sets Volker Tresp Siemens AG, Corporate Technology OttoHahnRing 6, 8730 München, Germany (volker.tresp@mchp.siemens.de)
More informationSoft Clustering with Projections: PCA, ICA, and Laplacian
1 Soft Clustering with Projections: PCA, ICA, and Laplacian David Gleich and Leonid Zhukov Abstract In this paper we present a comparison of three projection methods that use the eigenvectors of a matrix
More informationCheng Soon Ong & Christfried Webers. Canberra February June 2016
c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I
More informationA fast multiclass SVM learning method for huge databases
www.ijcsi.org 544 A fast multiclass SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and TalebAhmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl
More informationFiltered Gaussian Processes for Learning with Large DataSets
Filtered Gaussian Processes for Learning with Large DataSets Jian Qing Shi, Roderick MurraySmith 2,3, D. Mike Titterington 4,and Barak A. Pearlmutter 3 School of Mathematics and Statistics, University
More informationFrancesco Sorrentino Department of Mechanical Engineering
Master stability function approaches to analyze stability of the synchronous evolution for hypernetworks and of synchronized clusters for networks with symmetries Francesco Sorrentino Department of Mechanical
More informationADVANCED MACHINE LEARNING. Introduction
1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures
More informationMaking Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research
More informationCSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye
CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize
More informationBig Data  Lecture 1 Optimization reminders
Big Data  Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data  Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationLarge Scale Spectral Clustering with LandmarkBased Representation
Proceedings of the TwentyFifth AAAI Conference on Artificial Intelligence Large Scale Spectral Clustering with LandmarkBased Representation Xinlei Chen Deng Cai State Key Lab of CAD&CG, College of Computer
More informationEcommerce Transaction Anomaly Classification
Ecommerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of ecommerce
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks YoungRae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationAPPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationMachine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
More informationHigh Productivity Data Processing Analytics Methods with Applications
High Productivity Data Processing Analytics Methods with Applications Dr. Ing. Morris Riedel et al. Adjunct Associate Professor School of Engineering and Natural Sciences, University of Iceland Research
More informationLecture 6: Logistic Regression
Lecture 6: CS 19410, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationClassifiers & Classification
Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin
More informationHadoop SNS. renren.com. Saturday, December 3, 11
Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 RealTime Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationJournée Thématique Big Data 13/03/2015
Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets
More informationProximal mapping via network optimization
L. Vandenberghe EE236C (Spring 234) Proximal mapping via network optimization minimum cut and maximum flow problems parametric minimum cut problem application to proximal mapping Introduction this lecture:
More informationOnline learning of multiclass Support Vector Machines
IT 12 061 Examensarbete 30 hp November 2012 Online learning of multiclass Support Vector Machines Xuan Tuan Trinh Institutionen för informationsteknologi Department of Information Technology Abstract
More informationFeature Engineering in Machine Learning
Research Fellow Faculty of Information Technology, Monash University, Melbourne VIC 3800, Australia August 21, 2015 Outline A Machine Learning Primer Machine Learning and Data Science BiasVariance Phenomenon
More informationEntropy based Graph Clustering: Application to Biological and Social Networks
Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley YoungRae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving
More informationUSE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS
USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu ABSTRACT This
More information10810 /02710 Computational Genomics. Clustering expression data
10810 /02710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intracluster similarity low intercluster similarity Informally,
More informationScalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.Ing. Morris Riedel et al. Research Group Leader,
More informationClustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University September 19, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University September 19, 2012 EMart No. of items sold per day = 139x2000x20 = ~6 million
More informationMachine Learning in FX Carry Basket Prediction
Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines
More information5.1 Bipartite Matching
CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the FordFulkerson
More informationTwo Heads Better Than One: Metric+Active Learning and Its Applications for IT Service Classification
29 Ninth IEEE International Conference on Data Mining Two Heads Better Than One: Metric+Active Learning and Its Applications for IT Service Classification Fei Wang 1,JimengSun 2,TaoLi 1, Nikos Anerousis
More informationComparison of Nonlinear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Nonlinear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Nonlinear
More informationDistributed Machine Learning and Big Data
Distributed Machine Learning and Big Data Sourangshu Bhattacharya Dept. of Computer Science and Engineering, IIT Kharagpur. http://cse.iitkgp.ac.in/~sourangshu/ August 21, 2015 Sourangshu Bhattacharya
More informationSupervised and unsupervised learning  1
Chapter 3 Supervised and unsupervised learning  1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the socalled high dimensional, low sample size (HDLSS) settings, LDA
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationData analysis in supersaturated designs
Statistics & Probability Letters 59 (2002) 35 44 Data analysis in supersaturated designs Runze Li a;b;, Dennis K.J. Lin a;b a Department of Statistics, The Pennsylvania State University, University Park,
More informationBUSINESS ANALYTICS. Data Preprocessing. Lecture 3. Information Systems and Machine Learning Lab. University of Hildesheim.
Tomáš Horváth BUSINESS ANALYTICS Lecture 3 Data Preprocessing Information Systems and Machine Learning Lab University of Hildesheim Germany Overview The aim of this lecture is to describe some data preprocessing
More informationComparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
More informationReport: Declarative Machine Learning on MapReduce (SystemML)
Report: Declarative Machine Learning on MapReduce (SystemML) Jessica Falk ETHID 11947512 May 28, 2014 1 Introduction SystemML is a system used to execute machine learning (ML) algorithms in HaDoop,
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard
More informationTensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
More informationLargeScale Similarity and Distance Metric Learning
LargeScale Similarity and Distance Metric Learning Aurélien Bellet Télécom ParisTech Joint work with K. Liu, Y. Shi and F. Sha (USC), S. Clémençon and I. Colin (Télécom ParisTech) Séminaire Criteo March
More information