Algorithmic Methods for Complex Network Analysis: Graph Clustering
Summer School on Algorithm Engineering
Dorothea Wagner
September 9, 2014
Karlsruhe Institute of Technology, Institute of Theoretical Informatics
KIT: University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
www.kit.edu

Scenario of Network Analysis
Given a network...
[Figure: drawing of a small network with numbered vertices]
- explore the instance
- derive its structure
- identify its properties
How can we learn about the instance?

An Archetypal Example
Zachary's Karate Club, a real social network
[Figure: drawing of the Karate Club network with numbered vertices]
- 2 years of observation
- 34 vertices = members
- 78 edges = social ties
- club split up after dispute: manager vs. trainers
- archon of toy examples
"Caused by an unequal flow of sentiments and information across the ties, a factional division led to a formal separation of the club."
[Wayne Zachary: An Information Flow Model for Conflict and Fission in Small Groups, 1977]

A Glimpse of Network Analysis
graph clustering / detecting communities
[Figure: Karate Club network divided into Groups 1 to 4; box = cluster]

Scaling of Real-World Instances
- Zachary's Karate Club (vertices/edges = 34/78)
- US college football teams and matches (vertices/edges = 115/616)
- variables of a SAT instance, edges = direct dependencies (electr. components) (vertices/edges 2K/6K)
- scientific collaborations: 3-hop neighborhood of D. Wagner (DBLP) (vertices/edges 10K/40K)
- physical Internet: autonomous systems (vertices/edges 20K/60K)
... no limit to be expected ...

instance                   vertices   edges
coauthors in DBLP          300K       1M
roads in the USA           24M        60M
WWW: .UK-domain 02         20M        500M
(neurons in human brain    10^11      10^17)

Clustering: Intuition to Formalization
Task: partition graph into natural groups
Paradigm: intra-cluster density vs. inter-cluster sparsity
Different approaches exist to formalize this paradigm, usually:
  Paradigm of Graph Clustering: intra-cluster density vs. inter-cluster sparsity
  → Mathematical Formalization: quality measures for clusterings
- Many exist, optimization generally (NP-)hard
- There is no single, universally best strategy

Algorithm Engineering
[Figure: cycle around "Algorithms": design, analyze, implement, experiment]
- modelling reality is hard
- finding optima is hard
- satisfying needs of application is hard
- still, we do need to cluster
- need good foundation

Clustering vs. Partitioning
                 clustering         partitioning
purpose          analysis           (pred.) handling of instance
... and then?    zoom/abstraction   computations on parts
# of parts       open               predefined (upper bound)
size of parts    open               upper bound (or even fixed)
criteria         various (later)    weighted cuts
constraints      often none         see above
applications     various (later)    often: distributed finite element methods on 3d-meshes of objects

Bicriterial Formulations
observations:
1. clusterings often nice if balanced (like partition)
2. intra-density vs. inter-sparsity is bicriterial
bicriterial (or multi-) measures for clusterings can help:
- constrain sparsity within clusters
- constrain density between clusters
- explicitly formulate desiderata
(more on bicriteria later)

Postulations to a Measure
Given a graph G and a clustering C, a quality measure should behave as follows:
- more intra-edges → higher quality
- fewer inter-edges → higher quality
- cliques must never be separated
- clusters must be connected
- random clusterings should have bad quality
- disjoint cliques should approach maximum quality
- locality of the measure (being better/worse in one part does not depend on what is done in other parts of the graph)
- double the instance, what should happen... same result
- comparable results across instances
- fulfill the desiderata of the application
- ...

Formalization via Bottleneck
[Figure: example social network with named vertices, divided into clusters]
Quality of the clustering, upper cluster:
- inter-cluster sparsity: 2 edges for cutting off 7 nodes (cheap)
- intra-cluster density: best additional cut: 3 edges for cutting off 4 nodes (expensive)

Examples: Conductance, Expansion
conductance of a cut (C, V \ C):
$$\varphi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\left\{ \sum_{v \in C} \omega(v),\ \sum_{v \in V \setminus C} \omega(v) \right\}}$$
(i.e.: thickness of bottleneck which cuts off C)
inter-cluster conductance:
$$\text{inter-cluster conductance}(\mathcal{C}) := \max_{C \in \mathcal{C}} \varphi(C, V \setminus C)$$
(i.e.: worst bottleneck induced by some $C \in \mathcal{C}$)
intra-cluster conductance:
$$\text{intra-cluster conductance}(\mathcal{C}) := \min_{C \in \mathcal{C}} \ \min_{P \uplus Q = C} \varphi|_{C}(P, Q)$$
(i.e.: best bottleneck still left uncut inside some $C \in \mathcal{C}$)
expansion of a cut (C, V \ C):
$$\psi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\{ |C|,\ |V \setminus C| \}}$$
(i.e.: in $\varphi$, replace $\omega(v)$ by 1; intra- and inter-cluster expansion analogously)

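As a rough illustration of the two cut measures, the following sketch evaluates conductance and expansion for one cut of a toy graph. It assumes that ω(v) denotes the weighted degree of v, which the slide does not state explicitly; the function names and the toy graph (two triangles joined by one edge) are hypothetical.

```python
def cut_weight(edges, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for u, v, w in edges if (u in side) != (v in side))


def weighted_degrees(edges):
    """Weighted degree of every vertex (assumed meaning of omega(v))."""
    deg = {}
    for u, v, w in edges:
        deg[u] = deg.get(u, 0) + w
        deg[v] = deg.get(v, 0) + w
    return deg


def conductance(edges, nodes, side):
    """Cut weight divided by the smaller of the two degree sums."""
    deg = weighted_degrees(edges)
    vol_in = sum(deg.get(v, 0) for v in side)
    vol_out = sum(deg.get(v, 0) for v in nodes - side)
    denom = min(vol_in, vol_out)
    # Returning 0.0 for a degenerate cut is a convention chosen here.
    return cut_weight(edges, side) / denom if denom > 0 else 0.0


def expansion(edges, nodes, side):
    """Cut weight divided by the size of the smaller side."""
    denom = min(len(side), len(nodes - side))
    return cut_weight(edges, side) / denom if denom > 0 else 0.0


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
nodes = set(range(6))
side = {0, 1, 2}

phi = conductance(edges, nodes, side)  # 1 / min(7, 7)
psi = expansion(edges, nodes, side)    # 1 / min(3, 3)
print(phi, psi)
```

Cutting off one triangle removes a single edge, so both values are small, which matches the intuition of a thin bottleneck.
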
Formalization: Counting Edges
[Figure: example social network with named vertices, divided into clusters]
Measuring clustering quality by counting edges:
- inter-cluster sparsity: 6 edges of ca. 800 node pairs (few)
- intra-cluster density: 53 edges of 99 node pairs (many)

Example Counting Measures
coverage:
$$\mathrm{cov}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}}$$
(i.e.: fraction of covered edges)
performance:
$$\mathrm{perf}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges} + \#\,\text{absent inter-cluster edges}}{\frac{1}{2}\, n(n-1)}$$
(i.e.: fraction of correctly classified pairs of nodes)
density:
$$\mathrm{den}(\mathcal{C}) := \frac{1}{2} \cdot \frac{\#\,\text{intra-cluster edges}}{\#\,\text{possible intra-cluster edges}} + \frac{1}{2} \cdot \frac{\#\,\text{absent inter-cluster edges}}{\#\,\text{possible inter-cluster edges}}$$
(i.e.: fractions of correct intra- and inter-edges)
modularity:
$$\mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}[\mathrm{cov}(\mathcal{C})]$$
(i.e.: how clear is the clustering, compared to random network?)

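The counting measures can be evaluated directly from edge counts. The sketch below computes coverage, performance, and density for an unweighted simple graph; the function name `counting_measures` and the toy graph are hypothetical, and the code assumes a clustering with at least one possible intra-cluster pair and at least one possible inter-cluster pair.

```python
def counting_measures(nodes, edges, clusters):
    """Return (coverage, performance, density) of a clustering.

    Assumes an unweighted simple graph, and a clustering for which both
    the number of possible intra-cluster pairs and the number of
    possible inter-cluster pairs are nonzero.
    """
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    n, m = len(nodes), len(edges)

    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    inter = m - intra

    possible_intra = sum(len(c) * (len(c) - 1) // 2 for c in clusters)
    possible_inter = n * (n - 1) // 2 - possible_intra
    absent_inter = possible_inter - inter

    cov = intra / m
    perf = (intra + absent_inter) / (n * (n - 1) / 2)
    den = 0.5 * intra / possible_intra + 0.5 * absent_inter / possible_inter
    return cov, perf, den


# Toy graph: two triangles joined by the edge (2, 3).
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = [{0, 1, 2}, {3, 4, 5}]

cov, perf, den = counting_measures(nodes, edges, clusters)
print(cov, perf, den)
```

For this toy instance there are 6 intra-cluster edges out of 7 edges, and 8 of the 9 possible inter-cluster pairs are absent, so all three measures come out close to 1.
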
Motivation for Modularity
[Figure: example social network with named vertices, divided into clusters]
coverage = # intra-cluster edges / # edges ≈ 0.9
only one cluster → coverage = 1.0

A Promising Remedy
[Girvan and Newman: Finding and evaluating community structure in networks, 2004]:
"... if we subtract from [coverage] the expected value [...], we do get a useful measure."
Modularity:
$$\mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}[\mathrm{cov}(\mathcal{C})] = \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}} - \frac{1}{4\,(\#\,\text{edges})^2} \sum_{C \in \mathcal{C}} \Big( \sum_{v \in C} \deg(v) \Big)^2$$

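A minimal sketch of the modularity formula, assuming an unweighted simple graph given as an edge list; the function name `modularity` and the toy graph are hypothetical. It also shows why subtracting the expected coverage helps: the trivial one-cluster solution, which has coverage 1.0, ends up with modularity 0.

```python
def modularity(edges, clusters):
    """Coverage minus expected coverage, following the formula above."""
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    m = len(edges)
    intra = 0
    degree_sum = [0] * len(clusters)  # sum of deg(v) over each cluster
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            intra += 1
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
    expected = sum(d * d for d in degree_sum) / (4 * m * m)
    return intra / m - expected


# Toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q_two = modularity(edges, [{0, 1, 2}, {3, 4, 5}])  # 6/7 - 98/196
q_one = modularity(edges, [{0, 1, 2, 3, 4, 5}])    # 1 - 196/196
print(q_two, q_one)
```
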
Modularity in Practice
Advantages:
- easy to use & implement
- reasonable behavior on many practical instances
- heavily used in various fields: ecosystem exploration, collaboration analyses, biochemistry, structure of the internet (AS-graph, www, routers)
- close to human intuition of quality [Görke et al.: Comp. aspects of lucidity-driven clustering, 2010]
Known issues:
- scaling behavior (double instance, result differs) [folklore]
- non-locality of optimal clustering [folklore]
- resolution limit (no tiny and large clusters at the same time) [Fortunato and Barthelemy 2007]
- large sparse graph → high values, balanced clusters [Good et al.: The performance of modularity maximization in practical contexts, 2009]

Modularity, Algorithmic Theory
The complexity of modularity optimization:
- finding C with maximum modularity is NP-hard
  - reduction from 3-PARTITION
  - restriction to |C| = 2 also hard
  - not FPT wrt. |C|
- greedy maximization (later) does not approximate
- very limited families combinatorially solvable
- ILP-formulation, feasible for |V| up to about 200
[Brandes et al.: On modularity clustering, 2008]
- diverse results on approximability on specific classes of graphs
[DasGupta, Devine: On the complexity of Newman's community finding approach for biological and social networks, 2011]

How to Cluster?
Optimization of quality function:
- Bottom-up: start with singletons, merge clusters
- Top-down: start with the one-cluster, split clusters
- Local Opt.: start with random clustering, migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency/Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
...

Density-Constrained Clustering: Overview
New Optimization Problem: Find clusterings with guaranteed intra-cluster density and good inter-cluster sparsity
This talk:
- Systematic collection of sparsity and density measures
- Classification of measures with respect to their behavior
- Experimental evaluation of greedy merge vs. greedy moves
- Qualitative comparison of clusterings obtained by optimizing different measures
See also:
[Schumm et al.: Density-Constrained Graph Clustering, WADS 2011]
[Kappes et al.: Experiments on Density-Constrained Graph Clustering, to appear in ACM JEA]

Inter-cluster-sparsity: Cut-based
- Isolated View: Each cluster induces a cut
- Pairwise View: Each pair of clusters induces a cut in their subgraph
- Global View: A clustering with k clusters induces a k-way cut

Inter-cluster Sparsity: Degrees of Freedom
Set of cuts:
- isolated (one for each cluster)
- pairwise (one for each pair of clusters)
- global (k-way cut)
Measures:
- number of cut-edges
- density
- conductance
- expansion
Combinations:
- average sparsity
- minimum sparsity
→ 14 (reasonable) inter-cluster sparsity measures

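To make the three sets of cuts concrete, the following sketch counts cut-edges in the isolated, pairwise, and global view for a small unweighted graph. The function name `cut_edge_views` and the toy instance are hypothetical, and only the simplest measure (number of cut-edges) is shown; the other measures would normalize these counts differently.

```python
from itertools import combinations


def cut_edge_views(edges, clusters):
    """Count cut-edges per cluster, per pair of clusters, and overall."""
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    k = len(clusters)
    isolated = [0] * k  # one cut per cluster
    pairwise = {pair: 0 for pair in combinations(range(k), 2)}
    global_cut = 0      # size of the k-way cut
    for u, v in edges:
        i, j = cluster_of[u], cluster_of[v]
        if i != j:
            isolated[i] += 1
            isolated[j] += 1
            pairwise[(min(i, j), max(i, j))] += 1
            global_cut += 1
    return isolated, pairwise, global_cut


# Toy instance: three clusters of two vertices each.
clusters = [{0, 1}, {2, 3}, {4, 5}]
edges = [(0, 1), (2, 3), (4, 5),          # intra-cluster edges
         (1, 2), (3, 4), (1, 4), (0, 3)]  # inter-cluster edges

isolated, pairwise, global_cut = cut_edge_views(edges, clusters)
print(isolated, pairwise, global_cut)
```

Each inter-cluster edge is counted once in the global view and in exactly one pairwise cut, but twice in the isolated view, once for each of the two clusters it leaves.
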
Intra-cluster density
- Definitions analogous to inter-cluster sparsity possible
- Finding cut with optimal density/conductance/expansion is NP-hard
- Practical approach: evaluate (intra-cluster edges) / (possible intra-cluster edges)
→ minimum/average/global intra-cluster density

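The practical approach can be sketched as follows. The code assumes that the global variant divides the total number of intra-cluster edges by the total number of possible intra-cluster edges, and that a singleton cluster counts as density 1.0; both are conventions chosen for this illustration, and the function name is hypothetical.

```python
def intra_cluster_densities(edges, clusters):
    """Return (minimum, average, global) intra-cluster density."""
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    intra = [0] * len(clusters)
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            intra[cluster_of[u]] += 1
    possible = [len(c) * (len(c) - 1) // 2 for c in clusters]

    # Assumption: a singleton has no possible edges and counts as dense.
    per_cluster = [e / p if p > 0 else 1.0 for e, p in zip(intra, possible)]

    minimum = min(per_cluster)
    average = sum(per_cluster) / len(per_cluster)
    total_possible = sum(possible)
    global_density = sum(intra) / total_possible if total_possible > 0 else 1.0
    return minimum, average, global_density


# Toy instance: a 4-vertex cluster with 5 of 6 edges, and a triangle.
clusters = [{0, 1, 2, 3}, {4, 5, 6}]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3),  # cluster 0
         (4, 5), (5, 6), (4, 6),                  # cluster 1
         (3, 4)]                                  # inter-cluster edge

d_min, d_avg, d_global = intra_cluster_densities(edges, clusters)
print(d_min, d_avg, d_global)
```
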
Problem Statement
Density-Constrained Clustering: Given a graph G = (V, E), among all clusterings with an intra-cluster density of no less than α, find a clustering C with optimum inter-cluster sparsity.
3 possible intra-cluster density measures
× 14 possible inter-cluster sparsity measures
→ Family of 42 optimization problems

Complexity (Example)
Reduction from Exact Cover by 3-Sets
[Figure: construction used in the reduction]
Theorem: Density-Constrained Clustering combining any intra-cluster density measure with the number of inter-cluster edges is NP-hard.
→ motivates use of heuristic greedy algorithms

Greedy Algorithms
Greedy Merge (GM)
- Popular for modularity-based clustering
- Idea: Merge clusters iteratively
Greedy Vertex Moving (GVM)
- Closely related to algorithms for graph partitioning
- Very successful for optimizing modularity [Rotta et al.]

Generic Greedy Merge Algorithm Example: Minimize number of inter-cluster edges such that the density of each cluster is at least 3 4 Idea: Merge clusters greedily Objective: Increase inter-cluster sparsity Constraint: Intra-cluster density must not drop below given threshold Algorithmic Methods for Complex Network Analysis: Graph Clustering September 9, 204 27/49
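The greedy merge idea above can be sketched in a few lines. This is a naive illustration, not the implementation behind the slides: the function names, the convention that singleton clusters count as perfectly dense, and the choice of "number of edges between the two clusters" as the merge gain are all assumptions made for the example.

```python
from itertools import combinations

def density(cluster, edges):
    """Fraction of possible intra-cluster vertex pairs that are edges."""
    n = len(cluster)
    if n < 2:
        return 1.0  # assumption: singletons count as perfectly dense
    inside = sum(1 for u, v in edges if u in cluster and v in cluster)
    return inside / (n * (n - 1) / 2)

def cut_size(a, b, edges):
    """Number of edges with one endpoint in a and the other in b."""
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def greedy_merge(vertices, edges, threshold):
    """Start from singletons; repeatedly apply the feasible merge that
    removes the most inter-cluster edges, until no feasible merge helps."""
    clusters = [frozenset([v]) for v in vertices]
    while True:
        best = None
        for a, b in combinations(clusters, 2):
            gain = cut_size(a, b, edges)
            # Skip merges that do not help or that violate the density constraint.
            if gain == 0 or density(a | b, edges) < threshold:
                continue
            if best is None or gain > best[0]:
                best = (gain, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]

# Hypothetical test graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result = greedy_merge(range(6), edges, threshold=0.75)
```

On this toy graph the two triangles are recovered, because merging across the bridge would push the density below 3/4. The sketch rescans all pairs in every round; the properties discussed on the following slides are what allow a more efficient bookkeeping.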
Influence of Measures on Algorithm: Coarseness
  Rough intuition: [Figure relating intra-cluster density and inter-cluster sparsity]
  Question: Without constraints, is there always a merge that improves the objective function?
(Un-)Boundedness
  Definition: An objective function f is unbounded if for any clustering C with |C| > 1 there exists a merge that does not deteriorate f.
  Max. pw. inter-cluster conductance is bounded [Figure: example graph]
  E.g., modularity is bounded
  Classification:
    bounded: mpxc mpxe apxd apxc aixd
    unbounded: nxe gxd mixc mixd mixe aixc aixe mixd mpxd
Influence of Measures on Algorithm
  [Diagram: the set of feasible merges and its update after a merge]
  Question: Does feasibility of a merge only depend on involved clusters?
  Property: Context insensitivity of an intra-cluster measure
Context Insensitivity
  Definition: A constraint is context insensitive if the feasibility of a merge does not depend on the remainder of the clustering.
  E.g., global intra-cluster density is context sensitive
    Constraint: (intra-cluster edges) / (possible intra-cluster edges) ≥ 0.7
    [Figure: the same merge in two clusterings. In one, the value after the merge is 2/3 < 0.7; in the other, the value is 2/2 ≥ 0.7 before the merge and 3/4 ≥ 0.7 after it.]
  E.g., minimum intra-cluster density is context insensitive
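The fractions on this slide can be reproduced with a small calculation. The cluster shapes below are an assumed reconstruction chosen to match the values 2/3, 2/2 and 3/4; the slide's actual figure is not recoverable from the text, and the function name is hypothetical.

```python
from fractions import Fraction

def global_intra_density(clusters):
    """clusters: list of (num_vertices, num_intra_edges) pairs.
    Returns total intra-cluster edges over total possible intra-cluster pairs."""
    edges = sum(m for _, m in clusters)
    pairs = sum(n * (n - 1) // 2 for n, _ in clusters)
    return Fraction(edges, pairs) if pairs else Fraction(1)

THRESHOLD = Fraction(7, 10)

# Assumed merge under test: a 2-vertex cluster with 1 edge absorbs a singleton
# attached by one edge, giving a path on 3 vertices with 2 edges.
merged = (3, 2)

# Context 1: the remainder of the clustering consists of singletons only.
alone = global_intra_density([merged, (1, 0)])

# Context 2: the remainder contains one more 2-vertex cluster with 1 edge.
before = global_intra_density([(2, 1), (1, 0), (2, 1)])
with_dense_rest = global_intra_density([merged, (2, 1)])

feasible_alone = alone >= THRESHOLD                 # 2/3 is below 0.7
feasible_with_rest = with_dense_rest >= THRESHOLD   # 3/4 is at least 0.7
```

The merged cluster is identical in both contexts, yet the merge is feasible only in the second one. A minimum intra-cluster density constraint would look at the merged cluster alone (2/3) and give the same verdict in both contexts.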
Context Insensitivity: Classification
  context insensitive: minimum intra-cluster density
  context sensitive: average intra-cluster density, global intra-cluster density
Influence of Measures on Algorithm
  [Diagram: feasible merges stored in a heap, from which the optimum merge is taken]
  Question: Given context insensitivity, can the set of feasible merges be efficiently maintained in a heap?
  Property: Locality of an objective function
Locality: Intuition
  Example: Maximum isolated inter-cluster conductance
  First approach: Use gain in inter-cluster sparsity as key (merges ordered from bad to good)

    Before             After merging G and I
    A, B   -0.3        A, B     -0.3
    C, D    0          G∪I, T   -0.3
    E, F    0          C, F     -0.2
    C, F    0          C, D      0
    G, H    0          E, F      0.2
    G, I    0.3

  The keys of merges that do not involve G or I change as well (C, F from 0 to -0.2; E, F from 0 to 0.2).
  Clever tie-breaking possible?
  Needed: Suitable order that does not change if unrelated clusters merge
  Existence of such an order: locality of the inter-cluster measure
Example: Max. isolated inter-cluster conductance
  Current sequence of conductance of all clusters (sorted):
    A 0.5, B 0.4, C 0.3, D 0.3, E 0.1
  Sequence if A and B are merged:
    A∪B 0.45, C 0.3, D 0.3, E 0.1
  Sequence if A and D are merged:
    A∪D 0.45, B 0.4, C 0.3, E 0.1
  Compare lexicographically: Merging A and B is better!
  Ordering merges lexicographically is stable
  Two merges can be compared in constant time by comparing keys consisting of three numbers
  Maximum isolated inter-cluster conductance is local
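The lexicographic comparison in this example can be checked directly. The sketch below compares the full sorted sequences, which is the naive version of the idea; it does not implement the constant-time three-number key mentioned on the slide. The conductance 0.45 of each merged cluster is taken from the slide rather than computed, the value 0.1 for cluster E is an assumption, and the function name is hypothetical.

```python
def sequence_after_merge(conductance, a, b, merged_value):
    """Sorted (descending) conductance sequence after merging clusters a and b.
    merged_value is the conductance of the merged cluster, supplied by the caller."""
    rest = [c for name, c in conductance.items() if name not in (a, b)]
    return sorted(rest + [merged_value], reverse=True)

# Values as on the slide; E = 0.1 is an assumed reading.
current = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.3, "E": 0.1}

seq_ab = sequence_after_merge(current, "A", "B", 0.45)
seq_ad = sequence_after_merge(current, "A", "D", 0.45)

# Python compares lists lexicographically; a smaller sequence means a
# smaller maximum (and smaller runners-up), hence a better merge.
better = "A,B" if seq_ab < seq_ad else "A,D"
```

Both merges lower the maximum to 0.45, so the objective value alone cannot separate them; the second entry (0.3 versus 0.4) decides in favor of merging A and B.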
Locality: Results
  Does such an order exist for all objective functions?
  Global inter-cluster density, (inter-cluster edges) / (possible inter-cluster edges), is not local
  [Figure: the same two candidate merges evaluated in two different clusterings; the merge that is better in one clustering is worse in the other]
  Classification:
    local: mixd mixc mixe aixd aixc aixe nxe
    not local: mpxd apxd mpxc mpxe gxd apxe apxc
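A small calculation shows how the ranking of two merges under global inter-cluster density can flip with the surrounding clustering. All numbers below are illustrative assumptions, not the values of the slide's figure. The sketch uses the fact that merging clusters X and Y turns the edges between them into intra-cluster edges and removes |X|·|Y| possible inter-cluster pairs.

```python
from fractions import Fraction

def best_merge(context, merges):
    """context: (inter_cluster_edges, possible_inter_cluster_pairs) before merging.
    merges: name -> (edges_removed, pairs_removed).
    Returns the merge with the lowest resulting global inter-cluster density."""
    edges, pairs = context
    scored = {
        name: Fraction(edges - de, pairs - dp)
        for name, (de, dp) in merges.items()
    }
    return min(scored, key=scored.get)

# Hypothetical candidates:
#   merge 1 joins two 2-vertex clusters linked by 2 edges (removes 2 edges, 4 pairs)
#   merge 2 joins two singletons linked by 1 edge       (removes 1 edge, 1 pair)
merges = {"merge 1": (2, 4), "merge 2": (1, 1)}

best_in_dense_context = best_merge((17, 43), merges)   # 15/39 versus 16/42
best_in_sparse_context = best_merge((9, 37), merges)   # 7/33 versus 8/36
```

Since the winner depends on the totals contributed by clusters that take part in neither merge, no fixed order of the two merges survives changes elsewhere in the clustering, which is what "not local" expresses.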
Influence of Measures on Algorithm
  [Diagram: feasible merges; are unconnected merges important, or are connected merges sufficient?]
  Question: Do we have to consider pairs of unconnected clusters?
  Property: Connectedness of an objective function
Disconnectedness
  Definition: An objective function f is connected if merging unconnected clusters is never the best option with respect to f.
  Max. pw. inter-cluster conductance is not connected
  [Figure: example in which merging two unconnected clusters is the best option]
  Classification:
    connected: nxe
    unconnected: gxd mixc mixd mixe aixc aixe mixd mpxd mpxc mpxe apxd apxc aixd
Influence of Measures on Efficiency
  (Given that the necessary data can be maintained efficiently:)
  Context insensitivity + Locality ⇒ O(n^2 log n) running time
  Context insensitivity + Locality + Connectedness ⇒ O(md log n) running time & linear space
Example: Email Graph of our Department
  [Figure panels: chair; modularity-based algorithm; greedy merge (mid + aixc)]
Local Moving
  Example: Minimize number of inter-cluster edges such that the density of each cluster is at least 3/4
  Idea: Move vertices greedily
  Objective: Increase inter-cluster sparsity
  Constraint: Intra-cluster density must not drop below given threshold
  [Figure: step-by-step vertex moves on an example graph]
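Local moving can be sketched in the same spirit. Again this is only an illustration under stated assumptions: the names are hypothetical, clusters with fewer than two vertices are treated as dense, and a vertex is moved whenever the move strictly reduces the number of inter-cluster edges while both affected clusters stay above the threshold.

```python
def density(cluster, adj):
    """Fraction of possible intra-cluster vertex pairs that are edges."""
    n = len(cluster)
    if n < 2:
        return 1.0  # assumption: clusters with fewer than two vertices are dense
    # Each intra-cluster edge is seen from both endpoints, hence the division by 2.
    inside = sum(1 for u in cluster for v in adj[u] if v in cluster) / 2
    return inside / (n * (n - 1) / 2)

def local_moving(adj, assignment, threshold):
    """adj: vertex -> set of neighbours; assignment: vertex -> cluster id.
    Moves vertices until no feasible improving move remains (in place)."""
    improved = True
    while improved:
        improved = False
        for v in adj:
            home = assignment[v]
            links = {}  # cluster id -> number of neighbours of v in that cluster
            for u in adj[v]:
                links[assignment[u]] = links.get(assignment[u], 0) + 1
            stay = links.get(home, 0)
            for target, count in sorted(links.items(), key=lambda kv: -kv[1]):
                if target == home or count <= stay:
                    continue  # the move would not reduce inter-cluster edges
                old = {u for u in adj if assignment[u] == home and u != v}
                new = {u for u in adj if assignment[u] == target} | {v}
                if density(old, adj) >= threshold and density(new, adj) >= threshold:
                    assignment[v] = target
                    improved = True
                    break
    return assignment

# Hypothetical test graph: two triangles joined by the edge 2-3,
# with vertex 2 initially placed in the wrong cluster.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
moved = local_moving(adj, {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}, threshold=0.75)
```

Every accepted move strictly lowers the number of inter-cluster edges, so the loop terminates.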
Greedy Vertex Moving
  Idea: Use Local Moving on multiple levels
  [Figure: multilevel scheme; contract steps lead to coarser graphs, project steps lead back to the finer levels]
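The two labels in the figure, contract and project, correspond to two small helper operations. The sketch below shows one plausible form of them, assuming integer cluster ids and edge weights stored in a dictionary; the function names and data layout are illustrative and not taken from the slides.

```python
from collections import defaultdict

def contract(edges, assignment):
    """edges: {(u, v): weight}; assignment: vertex -> integer cluster id.
    Returns the coarse graph whose vertices are the clusters. Edges inside
    a cluster become a self-loop that carries their total weight."""
    coarse = defaultdict(int)
    for (u, v), w in edges.items():
        cu, cv = assignment[u], assignment[v]
        coarse[(min(cu, cv), max(cu, cv))] += w
    return dict(coarse)

def project(fine_assignment, coarse_assignment):
    """Carry a clustering of the coarse graph back to the finer level."""
    return {v: coarse_assignment[c] for v, c in fine_assignment.items()}

# Hypothetical example: two triangles joined by one edge, unit weights.
fine_edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1, (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
fine_assignment = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

coarse_edges = contract(fine_edges, fine_assignment)
projected = project(fine_assignment, {0: 7, 1: 7})  # coarse level put both clusters together
```

A multilevel run would alternate local moving and `contract` until nothing changes, then apply `project` level by level, optionally running local moving again after each projection.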
Effectiveness: Merge vs. Move
  Question: Which greedy algorithm is more effective?
  Setup:
    Preliminary experiments: Pairwise measures behave counter-intuitively and are therefore left out of the experimental analysis
    Experiments on real-world networks taken from the benchmark sets of Arenas and Newman
    Different configurations: intra-cluster density measure, inter-cluster sparsity measure, parameter α
  Outcome (summary): In 74 percent of all configurations, greedy vertex moving performs better than greedy merging
Social Network of Dolphins [Lusseau 04]
  [Figures: clusterings of the dolphin network for the configurations below]
  Restriction: global intracluster density > 0.2
    Objectives: average intercluster density, maximum intercluster density, global intercluster density, intercluster edges
    Objectives: av. intercluster conductance, av. intercluster expansion
    Objectives: max. intercluster expansion, max. intercluster conductance
  Objective: modularity
Planted Partition Graphs: Setup
  Planted Partition Graph: edge probabilities p_in (within hidden clusters) and p_out (between hidden clusters)
  Question: What is the distance between the clustering found by the objective function and the hidden clustering?
  Parameter α: expected intracluster density
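For readers who want to experiment, a planted partition graph can be generated with the standard library alone. This is a minimal sketch: equal cluster sizes, the function name and the `seed` parameter are assumptions for illustration, and the slides do not state how the experimental instances were generated.

```python
import random
from itertools import combinations

def planted_partition(num_clusters, cluster_size, p_in, p_out, seed=None):
    """Return (edges, hidden), where hidden maps each vertex to its planted cluster.
    Pairs inside a hidden cluster become edges with probability p_in,
    all other pairs with probability p_out."""
    rng = random.Random(seed)
    n = num_clusters * cluster_size
    hidden = {v: v // cluster_size for v in range(n)}  # assumption: equal-sized clusters
    edges = [
        (u, v)
        for u, v in combinations(range(n), 2)
        if rng.random() < (p_in if hidden[u] == hidden[v] else p_out)
    ]
    return edges, hidden

# Extreme case: every intra-cluster pair is an edge, no inter-cluster pair is.
edges, hidden = planted_partition(2, 3, p_in=1.0, p_out=0.0, seed=0)
```

With p_in well above p_out the hidden clustering is easy to recover; as the two probabilities approach each other, the distance between a computed clustering and the hidden one becomes an informative quality signal.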
Planted Partition Graphs: Rough Summary
  [Plot: distance to reference clustering (y-axis) per objective. ML-MOD is shown on its own; MOD, NXE, GXD, MIXD, AIXD, MIXC, AIXC, MIXE, AIXE are shown once under the global intracluster density constraint and once under the minimum intracluster density constraint. A reference level is marked.]
Planted Partition Graphs: Insights
  Investigating different configurations yields further insights:
    Using average intracluster density as constraint leads to very unbalanced clusterings
    Constraining modularity by maximum intracluster density improves its results, especially if the expected number of clusters is high
    Fine reference clusterings unbalance maximum objectives
    Average intercluster expansion/density identify many clusters
Conclusion
  Clustering as bicriterial problem
    Optimize inter-cluster sparsity respecting intra-cluster density
    Collection of new measures
  Algorithm Engineering aspects:
    Formulation of measures
    Classification of measures with respect to greedy merge
    Insights about behavior of measures
    Experimental evaluation of greedy methods
    Experimental comparison on planted partition graphs
  Thank you for your attention!