Algorithmic Methods for Complex Network Analysis: Graph Clustering
- Derrick Chandler
- 10 years ago
Algorithmic Methods for Complex Network Analysis: Graph Clustering
Summer School on Algorithm Engineering
Dorothea Wagner, September 9, 2014
Karlsruhe Institute of Technology (KIT), Institute of Theoretical Informatics
KIT: University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association

Scenario of Network Analysis
Given a network:
- explore the instance
- derive its structure
- identify its properties
How can we learn about the instance?

An Archetypal Example
Zachary's Karate Club, a real, social network:
- years of observation
- 34 vertices = members
- 78 edges = social ties
- club split up after a dispute (manager vs. trainers)
- archon of toy examples
"Caused by an unequal flow of sentiments and information across the ties, a factional division led to a formal separation of the club." [Wayne Zachary: An Information Flow Model for Conflict and Fission in Small Groups, 1977]

A Glimpse of Network Analysis
- graph clustering / detecting communities
[Figure: example network divided into four groups; box = cluster]

Scaling of Real-World Instances
- Zachary's Karate Club (vertices/edges = 34/78)
- US college football teams and matches (vertices/edges = 5/66)
- variables of a SAT instance, edges = direct dep. (electr. components) (vertices/edges 2K/6K)
- sci. collaborations: 3-hop neighborhood of D. Wagner (DBLP) (vertices/edges 0k/40k)
- physical Internet: autonomous systems (vertices/edges 20K/60K)
... no limit to be expected ...
instance                    vertices   edges
coauthors in DBLP           300K       M
roads in the USA            24M        60M
WWW: .UK-domain 02          20M        500M
(neurons in human brain)

Clustering: Intuition to Formalization
- Task: partition graph into natural groups
- Paradigm: intra-cluster density vs. inter-cluster sparsity
Different approaches exist to formalize this paradigm, usually:
- Paradigm of Graph Clustering: intra-cluster density vs. inter-cluster sparsity
- Mathematical Formalization: quality measures for clusterings
- Many exist, optimization generally (NP-)hard
- There is no single, universally best strategy

Algorithm Engineering
[Figure: the algorithm engineering cycle around "Algorithms": design, analyze, implement, experiment]
- modelling reality is hard
- finding optima is hard
- satisfying needs of application is hard
- still, we do need to cluster
- need good foundation

Clustering vs. Partitioning
                 clustering          partitioning
purpose          analysis            (pred.) handling of instance
... and then?    zoom/abstraction    computations on parts
# of parts       open                predefined (upper bound)
size of parts    open                upper bound (or even fixed)
criteria         various (later)     weighted cuts
constraints      often none          see above
applications     various (later)     often: distributed finite element methods on 3d-meshes of objects

Bicriterial Formulations
Observations:
1. clusterings often nice if balanced (like partition)
2. intra-density vs. inter-sparsity is bicriterial
Bicriterial (or multi-) measures for clusterings can help:
- constrain sparsity within clusters
- constrain density between clusters
- explicitly formulate desiderata
(more on bicriteria later)

Postulations to a Measure
Given a graph G and a clustering C, a quality measure should behave as follows:
- more intra-edges → higher quality
- fewer inter-edges → higher quality
- cliques must never be separated
- clusters must be connected
- random clusterings should have bad quality
- disjoint cliques should approach maximum quality
- locality of the measure (being better/worse in one part does not depend on what is done in other part of graph)
- double the instance, what should happen... same result
- comparable results across instances
- fulfill the desiderata of the application
- ...

Formalization via Bottleneck
[Figure: example social network with nodes labeled by first names (Marc, Toby, Richard, ..., Bob, Yoan) and a clustering]
Quality of the clustering, upper cluster:
- inter-cluster sparsity: 2 edges for cutting off 7 nodes (cheap)
- intra-cluster density: best additional cut: 3 edges for cutting off 4 nodes (expensive)

Examples: Conductance, Expansion
- conductance of a cut $(C, V \setminus C)$:
  $$\varphi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\left\{ \sum_{v \in C} \omega(v),\ \sum_{v \in V \setminus C} \omega(v) \right\}}$$
  (i.e.: thickness of bottleneck which cuts off $C$)
- inter-cluster conductance $(\mathcal{C}) := \max_{C \in \mathcal{C}} \varphi(C, V \setminus C)$
  (i.e.: worst bottleneck induced by some $C \in \mathcal{C}$)
- intra-cluster conductance $(\mathcal{C}) := \min_{C \in \mathcal{C}} \min_{P \,\dot\cup\, Q = C} \varphi_C(P, Q)$
  (i.e.: best bottleneck still left uncut inside some $C \in \mathcal{C}$)
- expansion of a cut $(C, V \setminus C)$:
  $$\psi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\left\{ |C|,\ |V \setminus C| \right\}}$$
  (i.e.: in $\varphi$, replace $\omega(v)$ by 1; intra- and inter-cluster expansion analogously)

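The cut measures above can be tried out on a small graph. The following is a minimal sketch, not the implementation used in the talk: it assumes that ω(v) is the total weight of the edges incident to v (the slide does not spell this out), that edges are given as hypothetical `(u, v, weight)` triples, and that the cut is non-trivial so the denominators are nonzero. Intra-cluster conductance is left out because it minimizes over all bipartitions of a cluster.

```python
def cut_weight(edges, cluster):
    # Total weight of the edges with exactly one endpoint in `cluster`.
    return sum(w for u, v, w in edges if (u in cluster) != (v in cluster))


def volume(edges, nodes):
    # Sum of omega(v) over `nodes`. ASSUMPTION: omega(v) is the total weight
    # of the edges incident to v; an edge inside `nodes` counts twice.
    return sum(w * ((u in nodes) + (v in nodes)) for u, v, w in edges)


def conductance(edges, vertices, cluster):
    # Conductance of the cut (cluster, rest); assumes a non-trivial cut.
    cluster = set(cluster)
    rest = set(vertices) - cluster
    return cut_weight(edges, cluster) / min(volume(edges, cluster), volume(edges, rest))


def expansion(edges, vertices, cluster):
    # Same numerator, but each vertex counts 1 instead of omega(v).
    cluster = set(cluster)
    rest = set(vertices) - cluster
    return cut_weight(edges, cluster) / min(len(cluster), len(rest))


def inter_cluster_conductance(edges, vertices, clustering):
    # Worst bottleneck induced by some cluster, as read from the slide.
    return max(conductance(edges, vertices, c) for c in clustering)


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
VERTICES = range(6)
EDGES = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]

print(conductance(EDGES, VERTICES, {0, 1, 2}))  # one cut edge over volume 7
print(expansion(EDGES, VERTICES, {0, 1, 2}))    # one cut edge over 3 vertices
```

On this toy graph each triangle is cut off by a single edge, so both values are small, which matches the intuition of a thin bottleneck.
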
Formalization: Counting Edges
[Figure: the same example social network with a clustering]
Measuring clustering quality by counting edges:
- inter-cluster sparsity: 6 edges of ca. 800 node pairs (few)
- intra-cluster density: 53 edges of 99 node pairs (many)

Example Counting Measures
- coverage: $\operatorname{cov}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}}$
  (i.e.: fraction of covered edges)
- performance: $\operatorname{perf}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges} + \#\,\text{absent inter-cluster edges}}{\frac{1}{2} n (n-1)}$
  (i.e.: fraction of correctly classified pairs of nodes)
- density: $\operatorname{den}(\mathcal{C}) := \frac{1}{2} \cdot \frac{\#\,\text{intra-cluster edges}}{\#\,\text{possible intra-cluster edges}} + \frac{1}{2} \cdot \frac{\#\,\text{absent inter-cluster edges}}{\#\,\text{possible inter-cluster edges}}$
  (i.e.: fractions of correct intra- and inter-edges)
- modularity: $\operatorname{mod}(\mathcal{C}) := \operatorname{cov}(\mathcal{C}) - \mathbb{E}[\operatorname{cov}(\mathcal{C})]$
  (i.e.: how clear is the clustering, compared to random network?)

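As an illustration of the counting measures, the sketch below computes coverage, performance and density for a simple, unweighted, undirected graph. The function name `counting_measures`, the `cluster_of` mapping and the toy graph are hypothetical, and treating an empty fraction (denominator 0) as 1 is an assumption made only to keep the sketch total.

```python
from collections import Counter
from fractions import Fraction


def counting_measures(edges, cluster_of):
    """Return (coverage, performance, density) as exact fractions."""
    n = len(cluster_of)
    m = len(edges)
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    sizes = Counter(cluster_of.values())
    possible_intra = sum(s * (s - 1) // 2 for s in sizes.values())
    possible_inter = n * (n - 1) // 2 - possible_intra
    absent_inter = possible_inter - (m - intra)

    def ratio(a, b):
        # ASSUMPTION: a fraction with denominator 0 counts as perfect (1).
        return Fraction(a, b) if b else Fraction(1)

    coverage = ratio(intra, m)
    performance = ratio(intra + absent_inter, n * (n - 1) // 2)
    density = ratio(intra, possible_intra) / 2 + ratio(absent_inter, possible_inter) / 2
    return coverage, performance, density


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
CLUSTER_OF = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

cov, perf, den = counting_measures(EDGES, CLUSTER_OF)
print(cov, perf, den)  # 6/7 14/15 17/18
```

Here 6 of the 7 edges are covered, 14 of the 15 node pairs are classified correctly, and the density averages a perfect intra-cluster fraction (6/6) with an inter-cluster fraction of 8/9.
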
Motivation for Modularity
[Figure: the same example social network with a clustering]
- coverage = # intra-cluster edges / # edges ≈ 0.9
- only one cluster: coverage = 1.0

A Promising Remedy
[Girvan and Newman: Finding and evaluating community structure in networks, 2004]: "... if we subtract from [coverage] the expected value [...], we do get a useful measure."
Modularity:
$$\operatorname{mod}(\mathcal{C}) := \operatorname{cov}(\mathcal{C}) - \mathbb{E}(\operatorname{cov}(\mathcal{C})) = \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}} - \frac{1}{4\,(\#\,\text{edges})^2} \sum_{C \in \mathcal{C}} \Big( \sum_{v \in C} \deg(v) \Big)^2$$

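The modularity formula can be evaluated directly from an edge list. The sketch below assumes a simple, unweighted, undirected graph with at least one edge; the function name, the `cluster_of` mapping and the toy graph are hypothetical. It also shows why subtracting the expected coverage helps: the trivial one-cluster solution has coverage 1 but modularity 0.

```python
from collections import defaultdict
from fractions import Fraction


def modularity(edges, cluster_of):
    """Coverage minus expected coverage, as an exact fraction."""
    m = len(edges)
    intra = 0
    degree_sum = defaultdict(int)  # sum of deg(v) over each cluster
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            intra += 1
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
    coverage = Fraction(intra, m)
    expected = Fraction(sum(d * d for d in degree_sum.values()), 4 * m * m)
    return coverage - expected


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TWO_CLUSTERS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ONE_CLUSTER = {v: "all" for v in range(6)}

print(modularity(EDGES, TWO_CLUSTERS))  # 5/14
print(modularity(EDGES, ONE_CLUSTER))   # 0
```

For the two triangles, coverage is 6/7 and the expected coverage is (7² + 7²) / (4 · 7²) = 1/2, giving 5/14.
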
Modularity in Practice
Strengths:
- easy to use & implement
- reasonable behavior on many practical instances
- heavily used in various fields: ecosystem exploration, collaboration analyses, biochemistry, structure of the internet (AS-graph, www, routers)
- close to human intuition of quality [Görke et al.: Comp. aspects of lucidity-driven clustering, 2010]
Known issues:
- scaling behavior (double instance, result differs) [folklore]
- non-locality of optimal clustering [folklore]
- resolution limit (no tiny and large clusters at the same time) [Fortunato and Barthelemy 2007]
- large sparse graph → high values, balanced clusters [Good et al.: The performance of modularity maximization in practical contexts, 2009]

Modularity, Algorithmic Theory
The complexity of modularity optimization:
- finding $\mathcal{C}$ with maximum modularity is NP-hard
- reduction from 3-PARTITION
- restriction to $|\mathcal{C}| = 2$ also hard
- not FPT wrt. $|\mathcal{C}|$
- greedy maximization (later) does not approximate
- very limited families combinatorially solvable
- ILP-formulation, feasible for $|V|$ up to about 200
[Brandes et al.: On modularity clustering, 2008]
- diverse results on approximability on specific classes of graphs [DasGupta, Desai: On the complexity of Newman's community finding approach for biological and social networks, 2011]

How to Cluster?
Optimization of quality function:
- Bottom-up: start with singletons, merge clusters
- Top-down: start with the one-cluster, split clusters
- Local Opt.: start with random clustering, migrate nodes
Further approaches:
- Variants of recursive min-cutting
- Percolation of network by removal of highly central edges
- Spectral methods using eigenanalysis of adjacency/Laplacian
- Direct identification of dense substructures
- Random walks
- Geometric approaches
- ...

Density-Constrained Clustering: Overview
New Optimization Problem: Find clusterings with guaranteed intra-cluster density and good inter-cluster sparsity
This talk:
- Systematic collection of sparsity and density measures
- Classification of measures with respect to their behavior
- Experimental evaluation of greedy merge vs. greedy moves
- Qualitative comparison of clusterings obtained by optimizing different measures
See also:
- [Schumm et al.: Density-Constrained Graph Clustering, WADS 2011]
- [Kappes et al.: Experiments on Density-Constrained Graph Clustering, to appear in ACM JEA]

Inter-cluster Sparsity: Cut-based
- Isolated View: Each cluster induces a cut
- Pairwise View: Each pair of clusters induces a cut in their subgraph
- Global View: A clustering with k clusters induces a k-way cut

Inter-cluster Sparsity: Degrees of Freedom
Set of cuts:
- isolated (one for each cluster)
- pairwise (one for each pair of clusters)
- global (k-way cut)
Measures:
- number of cut-edges
- density
- conductance
- expansion
Combinations:
- average sparsity
- minimum sparsity
→ 14 (reasonable) inter-cluster sparsity measures

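To make the three sets of cuts concrete, the following sketch counts cut edges in the isolated, pairwise and global view for an unweighted graph. It covers only the "number of cut-edges" measure; the function names and the toy graph are hypothetical and not taken from the talk.

```python
from itertools import combinations


def cluster_index(clusters):
    # Map every vertex to the position of its cluster in `clusters`.
    return {v: i for i, c in enumerate(clusters) for v in c}


def isolated_cuts(edges, clusters):
    # One cut per cluster: edges leaving that cluster.
    idx = cluster_index(clusters)
    cuts = [0] * len(clusters)
    for u, v in edges:
        if idx[u] != idx[v]:
            cuts[idx[u]] += 1
            cuts[idx[v]] += 1
    return cuts


def pairwise_cuts(edges, clusters):
    # One cut per pair of clusters: edges running between the two.
    idx = cluster_index(clusters)
    cuts = {pair: 0 for pair in combinations(range(len(clusters)), 2)}
    for u, v in edges:
        i, j = sorted((idx[u], idx[v]))
        if i != j:
            cuts[(i, j)] += 1
    return cuts


def global_cut(edges, clusters):
    # The k-way cut: all edges between different clusters.
    idx = cluster_index(clusters)
    return sum(1 for u, v in edges if idx[u] != idx[v])


# Hypothetical toy graph: two triangles and one edge, chained together.
CLUSTERS = [{0, 1, 2}, {3, 4, 5}, {6, 7}]
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (6, 7),
         (2, 3), (5, 6)]

print(isolated_cuts(EDGES, CLUSTERS))  # [1, 2, 1]
print(pairwise_cuts(EDGES, CLUSTERS))  # {(0, 1): 1, (0, 2): 0, (1, 2): 1}
print(global_cut(EDGES, CLUSTERS))     # 2
```

Every inter-cluster edge appears in exactly two isolated cuts, so the isolated counts sum to twice the global count. Average and minimum combinations can then be taken over the isolated or pairwise values.
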
Intra-cluster Density
- Definitions analogous to inter-cluster sparsity possible
- Finding cut with optimal density/conductance/expansion is NP-hard
- Practical approach: evaluate intra-cluster edges / possible intra-cluster edges
- minimum/average/global intra-cluster density

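The practical approach (intra-cluster edges divided by possible intra-cluster edges) is easy to sketch. In the code below, the reading of "global" as all intra-cluster edges over all possible intra-cluster pairs is an assumption, as is the convention that a singleton cluster has density 1; names and the toy graph are hypothetical.

```python
from fractions import Fraction


def possible_pairs(cluster):
    size = len(cluster)
    return size * (size - 1) // 2


def intra_edges(edges, cluster):
    return sum(1 for u, v in edges if u in cluster and v in cluster)


def cluster_density(edges, cluster):
    pairs = possible_pairs(cluster)
    if pairs == 0:
        return Fraction(1)  # ASSUMPTION: a singleton counts as fully dense
    return Fraction(intra_edges(edges, cluster), pairs)


def intra_cluster_densities(edges, clusters):
    """Return (minimum, average, global) intra-cluster density."""
    per_cluster = [cluster_density(edges, c) for c in clusters]
    total_pairs = sum(possible_pairs(c) for c in clusters)
    total_intra = sum(intra_edges(edges, c) for c in clusters)
    # ASSUMPTION: "global" pools edges and pairs over all clusters.
    global_density = Fraction(total_intra, total_pairs) if total_pairs else Fraction(1)
    average = sum(per_cluster) / len(per_cluster)
    return min(per_cluster), average, global_density


# Hypothetical toy graph: a triangle and a 4-cycle joined by the edge (2, 3).
CLUSTERS = [{0, 1, 2}, {3, 4, 5, 6}]
EDGES = [(0, 1), (1, 2), (0, 2),
         (3, 4), (4, 5), (5, 6), (3, 6),
         (2, 3)]

print(intra_cluster_densities(EDGES, CLUSTERS))  # densities 2/3, 5/6, 7/9
```

The triangle has density 3/3 and the 4-cycle has density 4/6, so the minimum is 2/3, the average is 5/6, and the pooled value is 7/9.
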
Problem Statement
Density-Constrained Clustering: Given a graph G = (V, E), among all clusterings with an intra-cluster density of no less than α, find a clustering C with optimum inter-cluster sparsity.
- 3 possible intra-cluster density measures
- 14 possible inter-cluster sparsity measures
→ Family of 42 optimization problems

Complexity (Example)
Reduction from Exact Cover by 3-Sets
[Figure: sketch of the reduction from Exact Cover by 3-Sets]
Theorem: Density-Constrained Clustering combining any intra-cluster density measure with the number of inter-cluster edges is NP-hard.
→ motivates use of heuristic greedy algorithms

Greedy Algorithms
Greedy Merge (GM):
- Popular for modularity-based clustering
- Idea: Merge clusters iteratively
Greedy Vertex Moving (GVM):
- Closely related to algorithms for graph partitioning
- Very successful for optimizing modularity [Rotta et al.]

101 Generic Greedy Merge Algorithm. Idea: Merge clusters greedily. Objective: Increase inter-cluster sparsity. Constraint: Intra-cluster density must not drop below a given threshold. Example: Minimize the number of inter-cluster edges such that the density of each cluster is at least $3/4$.
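The merge loop described on this slide can be sketched as follows. This is a deliberately naive illustration, not the implementation behind the slides: it rescans all cluster pairs in every round, uses the number of inter-cluster edges as the objective and a per-cluster density threshold as the constraint. The names `density`, `edges_between`, and `greedy_merge` are hypothetical, and treating singletons as perfectly dense is an assumption.

```python
from itertools import combinations


def density(cluster, edges):
    """Fraction of vertex pairs inside `cluster` that are joined by an edge."""
    pairs = len(cluster) * (len(cluster) - 1) // 2
    if pairs == 0:
        return 1.0  # assumption: a singleton counts as perfectly dense
    inside = sum(1 for u, v in edges if u in cluster and v in cluster)
    return inside / pairs


def edges_between(a, b, edges):
    """Number of edges with one endpoint in `a` and the other in `b`."""
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))


def greedy_merge(vertices, edges, min_density):
    """Start from singletons; repeatedly apply the feasible merge that
    removes the most inter-cluster edges, until no feasible merge helps."""
    clusters = [frozenset([v]) for v in vertices]
    while True:
        best = None
        for a, b in combinations(clusters, 2):
            gain = edges_between(a, b, edges)  # inter-cluster edges removed
            if gain > 0 and density(a | b, edges) >= min_density:
                if best is None or gain > best[0]:
                    best = (gain, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]


# Two triangles joined by a single edge, density threshold 3/4.
vertices = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
result = greedy_merge(vertices, edges, min_density=0.75)
print(sorted(sorted(c) for c in result))  # [[0, 1, 2], [3, 4, 5]]
```

Merging the two triangles would remove the last inter-cluster edge, but the merged cluster would have density 7/15, so the constraint stops the algorithm at two clusters.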
103 Influence of Measures on Algorithm: Coarseness. Rough intuition (figure): intra-cluster density and inter-cluster sparsity. Question: Without constraints, is there always a merge that improves the objective function?
109 (Un-)Boundedness. Definition: An objective function $f$ is unbounded if for any clustering $\mathcal{C}$ with $|\mathcal{C}| > 1$ there exists a merge that does not deteriorate $f$. Examples: maximum pairwise inter-cluster conductance is bounded; modularity is bounded. Classification. Bounded: mpxc, mpxe, apxd, apxc, aixd. Unbounded: nxe, gxd, mixc, mixd, mixe, aixc, aixe, mixd, mpxd.
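To see why modularity counts as bounded under this definition, it is enough to exhibit one clustering with more than one cluster in which every merge deteriorates the value. The sketch below uses the standard modularity formula on a small hypothetical graph (two triangles joined by one edge); the function name `modularity` and the example graph are illustrative choices, not taken from the slides.

```python
from fractions import Fraction


def modularity(clusters, edges):
    """Standard modularity: sum over clusters of
    (intra-cluster edges / m) - (sum of degrees / (2m))^2."""
    m = len(edges)
    q = Fraction(0)
    for c in clusters:
        inside = sum(1 for u, v in edges if u in c and v in c)
        degree_sum = sum((u in c) + (v in c) for u, v in edges)
        q += Fraction(inside, m) - Fraction(degree_sum, 2 * m) ** 2
    return q


# Hypothetical example graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = [{0, 1, 2}, {3, 4, 5}]
merged = [{0, 1, 2, 3, 4, 5}]

q_split = modularity(split, edges)
q_merged = modularity(merged, edges)
print(q_split, q_merged)  # 5/14 0
```

The only possible merge lowers modularity from 5/14 to 0, so for this clustering no merge avoids deterioration.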
113 Influence of Measures on Algorithm. After a merge, the set of feasible merges has to be updated. Question: Does the feasibility of a merge depend only on the involved clusters? This property is the context insensitivity of an intra-cluster measure.
121 Context Insensitivity. Definition: A constraint is context insensitive if the feasibility of a merge does not depend on the remainder of the clustering. E.g., global intra-cluster density, $\frac{\text{intra-cluster edges}}{\text{possible intra-cluster edges}}$, is context sensitive (values in the example figure: $0.7$ and $2/3 < 0.7$). E.g., minimum intra-cluster density is context insensitive.
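A small worked example may make the distinction concrete. The sketch below evaluates the same merged cluster (a path on three vertices, density 2/3) in two hypothetical contexts: alone, and next to a 4-clique. The threshold 0.7 follows the slide; the graph, the helper names, and the use of a greater-or-equal test are assumptions made for illustration.

```python
from fractions import Fraction


def intra_counts(cluster, edges):
    """Return (intra-cluster edges, possible intra-cluster pairs) for one cluster."""
    inside = sum(1 for u, v in edges if u in cluster and v in cluster)
    pairs = len(cluster) * (len(cluster) - 1) // 2
    return inside, pairs


def global_intra_density(clusters, edges):
    """All intra-cluster edges divided by all possible intra-cluster pairs."""
    counts = [intra_counts(c, edges) for c in clusters]
    return Fraction(sum(e for e, _ in counts), sum(p for _, p in counts))


def min_intra_density(clusters, edges):
    """Smallest density over all clusters that contain at least one pair."""
    counts = [intra_counts(c, edges) for c in clusters]
    return min(Fraction(e, p) for e, p in counts if p > 0)


threshold = Fraction(7, 10)
merged_ab = {0, 1, 2}                      # result of merging A = {0, 1}, B = {2}
path_edges = [(0, 1), (1, 2)]              # 2 of 3 possible edges
clique = {3, 4, 5, 6}
clique_edges = [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]

# Context 1: the merged cluster is the whole clustering.
g1 = global_intra_density([merged_ab], path_edges)
# Context 2: the remainder of the clustering is a 4-clique.
g2 = global_intra_density([merged_ab, clique], path_edges + clique_edges)
m1 = min_intra_density([merged_ab], path_edges)
m2 = min_intra_density([merged_ab, clique], path_edges + clique_edges)

print(g1, g1 >= threshold)  # 2/3 False
print(g2, g2 >= threshold)  # 8/9 True
print(m1, m2)               # 2/3 2/3
```

Under the global measure the same merge is infeasible in the first context and feasible in the second, while the minimum measure gives the same verdict in both.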
122 Context Insensitivity: Classification. Context insensitive: minimum intra-cluster density. Context sensitive: average intra-cluster density, global intra-cluster density.
124 Influence of Measures on Algorithm. Question: Given context insensitivity, can the set of feasible merges be efficiently maintained in a heap from which the optimum merge is extracted? This leads to the locality of an objective function.
130 Locality: Intuition. Example: Maximum isolated inter-cluster conductance. First approach: Use the gain in inter-cluster sparsity as key.
Keys, ordered from bad merges to good merges: (A, B): -0.3; (C, D): 0; (E, F): 0; (C, F): 0; (G, H): 0; (G, I): 0.3.
Keys after merging G and I: (A, B): -0.3; (G ∪ I, T): -0.3; (C, F): -0.2; (C, D): 0; (E, F): 0.2.
Is clever tie-breaking possible? Needed: a suitable order that does not change if unrelated clusters merge. The existence of such an order is the locality of the inter-cluster measure.
136 Example: Max. isolated inter-cluster conductance.
Current sequence of conductance of all clusters (sorted): A 0.5, B 0.4, C 0.3, D 0.3, E 0.
Sequence if A and B are merged: A ∪ B 0.45, C 0.3, D 0.3, E 0.
Sequence if A and D are merged: A ∪ D 0.45, B 0.4, C 0.3, E 0.
Compare lexicographically: merging A and B is better. Ordering merges lexicographically is stable. Two merges can be compared in constant time by comparing keys consisting of three numbers. Hence maximum isolated inter-cluster conductance is local.
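The lexicographic comparison in this example can be reproduced with a few lines. The sketch below builds the full sorted sequence for each candidate merge and compares the sequences directly, which takes linear rather than constant time; it does not implement the three-number key mentioned on the slide. The value 0.1 for cluster E is a hypothetical stand-in, since the value is cut off in the transcription, and the helper name `sequence_after_merge` is illustrative.

```python
def sequence_after_merge(conductance, a, b, merged_value):
    """Sorted (descending) conductance sequence after replacing
    clusters `a` and `b` by one cluster with conductance `merged_value`."""
    rest = [value for name, value in conductance.items() if name not in (a, b)]
    return sorted(rest + [merged_value], reverse=True)


# Values from the slide; E = 0.1 is an assumed placeholder.
conductance = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.3, "E": 0.1}

seq_ab = sequence_after_merge(conductance, "A", "B", 0.45)
seq_ad = sequence_after_merge(conductance, "A", "D", 0.45)

# Smaller conductance is better, so the lexicographically smaller sequence wins.
better = "A,B" if seq_ab < seq_ad else "A,D"
print(seq_ab)  # [0.45, 0.3, 0.3, 0.1]
print(seq_ad)  # [0.45, 0.4, 0.3, 0.1]
print(better)  # A,B
```

Both merges give the same maximum (0.45), so the decision falls to the second entry, where 0.3 beats 0.4.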
148 Locality: Results. Does such an order exist for all objective functions? Global inter-cluster density, $\frac{\text{inter-cluster edges}}{\text{possible inter-cluster edges}}$, is not local. [Figure: two example clusterings with two candidate merges each; values shown: 7/43, 5/39, 6/42 in the first and 9/37, 7/33, 8/36 in the second; the labels "better" and "worse" on the merges swap between the two examples.] Classification. Local: mixd, mixc, mixe, aixd, aixc, aixe, nxe. Not local: mpxd, apxd, mpxc, mpxe, gxd, apxe, apxc.
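The failure of locality for global inter-cluster density can be checked on a small constructed instance. The graph below is hypothetical and is not the one from the figure: merge X joins two 2-vertex clusters linked by two edges, merge Y joins two singletons linked by one edge, and an unrelated merge Z joins two 3-vertex clusters with no edges between them. The helper names are illustrative.

```python
from fractions import Fraction


def global_inter_density(clusters, edges):
    """Inter-cluster edges divided by possible inter-cluster vertex pairs."""
    label = {v: name for name, members in clusters.items() for v in members}
    inter_edges = sum(1 for u, v in edges if label[u] != label[v])
    n = len(label)
    intra_pairs = sum(len(c) * (len(c) - 1) // 2 for c in clusters.values())
    return Fraction(inter_edges, n * (n - 1) // 2 - intra_pairs)


def merge(clusters, a, b):
    """Return a new clustering in which clusters `a` and `b` are united."""
    out = {k: v for k, v in clusters.items() if k not in (a, b)}
    out[a + b] = clusters[a] | clusters[b]
    return out


clusters = {"X1": {0, 1}, "X2": {2, 3}, "Y1": {4}, "Y2": {5},
            "Z1": {6, 7, 8}, "Z2": {9, 10, 11}}
edges = ([(1, 2), (0, 3), (4, 5)]
         + [(u, v) for u in (0, 1) for v in range(6, 12)]
         + [(2, 6), (2, 7), (2, 8)])

# Compare merges X and Y in the original clustering ...
x_before = global_inter_density(merge(clusters, "X1", "X2"), edges)
y_before = global_inter_density(merge(clusters, "Y1", "Y2"), edges)

# ... and again after the unrelated merge Z has been applied.
after_z = merge(clusters, "Z1", "Z2")
x_after = global_inter_density(merge(after_z, "X1", "X2"), edges)
y_after = global_inter_density(merge(after_z, "Y1", "Y2"), edges)

print(x_before, y_before)  # 8/27 17/57  (X is better: lower density)
print(x_after, y_after)    # 16/45 17/48 (now Y is better)
```

The relative order of X and Y flips although neither involves the clusters merged by Z, so no fixed heap order over merges can stay valid for this measure.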
150 Influence of Measures on Algorithm. Question: Do we have to consider pairs of unconnected clusters, or are connected merges sufficient? This is the connectedness of an objective function.
154 Disconnectedness. Definition: An objective function $f$ is connected if merging unconnected clusters is never the best option with respect to $f$. Example: maximum pairwise inter-cluster conductance is not connected (in the example figure, a merge of unconnected clusters is marked as the best option). Classification. Connected: nxe. Unconnected: gxd, mixc, mixd, mixe, aixc, aixe, mixd, mpxd, mpxc, mpxe, apxd, apxc, aixd.
156 Influence of Measures on Efficiency (given that the necessary data can be maintained efficiently). Context insensitivity + locality: $O(n^2 \log n)$ running time. Context insensitivity + locality + connectedness: $O(m d \log n)$ running time and linear space.
157 Example: Graph of our Department. [Figure panels: chair; modularity-based algorithm; greedy merge (mid + aixc).]
158 Local Moving. Idea: Move vertices greedily. Objective: Increase inter-cluster sparsity. Constraint: Intra-cluster density must not drop below a given threshold. Example: Minimize the number of inter-cluster edges such that the density of each cluster is at least $3/4$.
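One way to read the local moving idea is sketched below, under several assumptions that the slide does not fix: vertices are visited in sorted order, a vertex moves to the neighbouring cluster that removes the most inter-cluster edges, a move is allowed only if both affected clusters keep the density threshold, and clusters with fewer than two vertices count as dense. The function names are hypothetical.

```python
def density(cluster, adj):
    """Edge density of a vertex set; sets with fewer than 2 vertices count as 1.0."""
    n = len(cluster)
    if n < 2:
        return 1.0  # assumption: trivial clusters never violate the constraint
    inside = sum(1 for u in cluster for v in adj[u] if v in cluster) // 2
    return inside / (n * (n - 1) / 2)


def local_moving(adj, cluster_of, min_density):
    """Greedily move single vertices while this lowers the number of
    inter-cluster edges and keeps every cluster dense enough."""
    members = {}
    for v, c in cluster_of.items():
        members.setdefault(c, set()).add(v)
    improved = True
    while improved:
        improved = False
        for v in sorted(adj):
            src = cluster_of[v]
            links = {}  # cluster id -> number of neighbours of v in it
            for u in adj[v]:
                links[cluster_of[u]] = links.get(cluster_of[u], 0) + 1
            best, best_gain = None, 0
            for c, k in links.items():
                if c == src:
                    continue
                gain = k - links.get(src, 0)  # inter-cluster edges removed
                if (gain > best_gain
                        and density(members[c] | {v}, adj) >= min_density
                        and density(members[src] - {v}, adj) >= min_density):
                    best, best_gain = c, gain
            if best is not None:
                members[src].discard(v)
                members[best].add(v)
                cluster_of[v] = best
                improved = True
    return cluster_of


# Two triangles joined by edge (2, 3); vertex 2 starts in the wrong cluster.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
final = local_moving(adj, dict(start), min_density=0.75)
print(final)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Each accepted move strictly reduces the number of inter-cluster edges, so the loop terminates.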
184 Greedy Vertex Moving. Idea: Use Local Moving on multiple levels. [Figure: the graph is contracted twice to coarser levels, and the clustering is then projected back twice.]
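The two operations named in the figure, contract and project, can be sketched as below. This is only an illustration of the multilevel bookkeeping under assumed data structures (a dict of weighted edges and a dict mapping vertices to cluster ids); the function names and the example clusterings are hypothetical, and the local moving step between levels is omitted.

```python
from collections import defaultdict


def contract(weighted_edges, cluster_of):
    """Collapse every cluster into one super-vertex.
    Parallel edges are summed; intra-cluster edges become self-loops."""
    coarse = defaultdict(int)
    for (u, v), w in weighted_edges.items():
        cu, cv = cluster_of[u], cluster_of[v]
        coarse[(min(cu, cv), max(cu, cv))] += w
    return dict(coarse)


def project(coarse_cluster_of, cluster_of):
    """Map a clustering of the super-vertices back to the original vertices."""
    return {v: coarse_cluster_of[c] for v, c in cluster_of.items()}


# Level 0: two triangles joined by edge (2, 3), with a fine clustering.
fine_edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (2, 3): 1,
              (3, 4): 1, (3, 5): 1, (4, 5): 1}
fine_clusters = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3, 5: 3}

# Level 1: contracted graph on super-vertices 0..3.
coarse_edges = contract(fine_edges, fine_clusters)
print(coarse_edges)

# Assumed result of clustering the coarse graph (e.g. by local moving).
coarse_clusters = {0: "L", 1: "L", 2: "R", 3: "R"}
projected = project(coarse_clusters, fine_clusters)
print(projected)
```

Keeping intra-cluster edges as self-loops preserves the total edge weight, which density and modularity computations on the coarse level would need.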
185 Effectiveness: Merge vs. Move. Question: Which greedy algorithm is more effective? Setup: Preliminary experiments: pairwise measures behave counter-intuitively and are left out of the experimental analysis. Experiments on real-world networks taken from the benchmark sets of Arenas and Newman. Different configurations: intra-cluster density measure, inter-cluster sparsity measure, parameter α. Outcome (summary): In 74 percent of all configurations, greedy vertex moving performs better than greedy merging.
186 Social Network of Dolphins [Lusseau 04]
187 Social Network of Dolphins. Restriction: global intra-cluster density > 0.2. Objectives: average inter-cluster density, maximum inter-cluster density, global inter-cluster density, inter-cluster edges.
188 Social Network of Dolphins. Restriction: global intra-cluster density > 0.2. Objectives: av. inter-cluster conductance, av. inter-cluster expansion.
189 Social Network of Dolphins. Restriction: global intra-cluster density > 0.2. Objectives: max. inter-cluster expansion, max. inter-cluster conductance.
190 Social Network of Dolphins. Objective: modularity.
196 Planted Partition Graphs: Setup. Planted partition graph with edge probabilities $p_{in}$ (within clusters) and $p_{out}$ (between clusters). Question: What is the distance between the clustering found by an objective function and the hidden clustering? Parameter α, expected intra-cluster density.
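A planted partition graph of the kind used in this setup can be generated as in the following sketch. Equal cluster sizes, the fixed seed, and the function name `planted_partition` are assumptions for illustration; the slides do not state the generator parameters used in the experiments.

```python
import random
from itertools import combinations


def planted_partition(num_clusters, cluster_size, p_in, p_out, seed=0):
    """Return (hidden clustering, edge list). Each pair of vertices is joined
    with probability p_in inside a cluster and p_out across clusters."""
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    cluster_of = {v: v // cluster_size
                  for v in range(num_clusters * cluster_size)}
    edges = []
    for u, v in combinations(cluster_of, 2):
        p = p_in if cluster_of[u] == cluster_of[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return cluster_of, edges


# Extreme case: every intra-cluster pair is an edge, no inter-cluster edges.
hidden, edges = planted_partition(2, 3, p_in=1.0, p_out=0.0)
print(len(edges))  # 6

# A noisier instance for experiments.
hidden_noisy, edges_noisy = planted_partition(4, 10, p_in=0.7, p_out=0.05, seed=1)
```

With equal cluster sizes the expected intra-cluster density of the hidden clustering equals `p_in`, which is one way to relate a density threshold to the generator.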
202 Planted Partition Graphs: Rough Summary. [Figure: distance to the reference clustering under two constraints, global intra-cluster density and minimum intra-cluster density. Objectives compared: ML-MOD, MOD, NXE, GXD, MIXD, AIXD, MIXC, AIXC, MIXE, AIXE; the reference is marked.]
203 Planted Partition Graphs: Insights. Investigating different configurations yields further insights: Using average intra-cluster density as constraint leads to very unbalanced clusterings. Constraining modularity by maximum intra-cluster density improves its results, especially if the expected number of clusters is high. Fine reference clusterings unbalance maximum objectives. Average inter-cluster expansion/density identify many clusters.
204 Conclusion
Clustering as a bicriterial problem:
- Optimize inter-cluster sparsity while respecting intra-cluster density
- Collection of new measures
Algorithm Engineering aspects:
- Formulation of measures
- Classification of measures with respect to greedy merge
- Insights about the behavior of measures
- Experimental evaluation of greedy methods
- Experimental comparison on planted partition graphs
Thank you for your attention!
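The bicriterial view combined with greedy merging can be sketched roughly as follows. This is a simplified illustration, not the method evaluated in the talk. It assumes that inter-cluster sparsity is improved by merging the pair of clusters joined by the most edges, and that the constraint is a lower bound `alpha` on the density of every cluster; the name `greedy_merge` and both parameters are hypothetical.

```python
def greedy_merge(n, edges, alpha):
    """Greedy agglomeration under a density constraint (illustrative only).

    Starts from singletons on vertices 0..n-1. In each round it merges the
    pair of clusters joined by the most edges, but only if the merged
    cluster keeps intracluster density >= alpha. Stops when no admissible
    merge remains. Written for clarity, not efficiency.
    """
    clusters = [{v} for v in range(n)]

    def density(c):
        pairs = len(c) * (len(c) - 1) // 2
        inside = sum(1 for u, v in edges if u in c and v in c)
        return inside / pairs if pairs else 1.0

    while True:
        best = None  # (edges between the pair, i, j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                between = sum(
                    1 for u, v in edges
                    if (u in clusters[i] and v in clusters[j])
                    or (u in clusters[j] and v in clusters[i])
                )
                if between == 0:
                    continue  # merging would not remove any intercluster edge
                if density(clusters[i] | clusters[j]) < alpha:
                    continue  # merge would violate the density constraint
                if best is None or between > best[0]:
                    best = (between, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]


# Two triangles joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
strict = greedy_merge(6, edges, alpha=1.0)  # only cliques are admissible
loose = greedy_merge(6, edges, alpha=0.4)   # weak constraint
```

With `alpha=1.0` the sketch stops at the two triangles. With `alpha=0.4` it keeps merging until a single cluster remains, which shows how the density constraint, rather than the sparsity objective alone, determines where greedy merging stops.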
