Algorithmic Methods for Complex Network Analysis: Graph Clustering

1 Algorithmic Methods for Complex Network Analysis: Graph Clustering
Summer School on Algorithm Engineering
Dorothea Wagner, September 9, 2014
Karlsruhe Institute of Technology (KIT), Institute of Theoretical Informatics
KIT: University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association

3 Scenario of Network Analysis
Given a network:
- explore the instance
- derive its structure
- identify its properties
How can we learn about the instance?

4 An Archetypal Example
Zachary's Karate Club, a real social network:
- years of observation
- 34 vertices = members
- 78 edges = social ties
- club split up after dispute (manager vs. trainers)
- archon of toy examples
Caused by an unequal flow of sentiments and information across the ties, a factional division led to a formal separation of the club. [Wayne Zachary: An Information Flow Model for Conflict and Fission in Small Groups, 1977]

5 A Glimpse of Network Analysis
graph clustering / detecting communities
[Figure: network with four labeled groups; box = cluster]

6 Scaling of Real-World Instances
- Zachary's Karate Club (vertices/edges = 34/78)
- US college football teams and matches (vertices/edges = 5/66)
- variables of a SAT instance; edges = direct dep. (electr. components) (vertices/edges 2K/6K)
- sci. collaborations: 3-hop neighborhood of D. Wagner (DBLP) (vertices/edges 0k/40k)
- physical Internet: autonomous systems (vertices/edges 20K/60K)
... no limit to be expected ...
instance                     vertices   edges
coauthors in DBLP            300K       M
roads in the USA             24M        60M
WWW: .UK-domain 02           20M        500M
(neurons in human brain)

14 Clustering: Intuition to Formalization
Task: partition graph into natural groups
Paradigm: intra-cluster density vs. inter-cluster sparsity
Different approaches exist to formalize this paradigm, usually:
- Paradigm of Graph Clustering: intra-cluster density vs. inter-cluster sparsity
- Mathematical Formalization: quality measures for clusterings
Many exist, optimization generally (NP-)hard
There is no single, universally best strategy

19 Algorithm Engineering
[Figure: cycle of design, analyze, implement, experiment around "Algorithms"]
- modelling reality is hard
- finding optima is hard
- satisfying needs of application is hard
- still, we do need to cluster
- need good foundation

20 Clustering vs. Partitioning
                 clustering         partitioning
purpose          analysis           (pred.) handling of instance
... and then?    zoom/abstraction   computations on parts
# of parts       open               predefined (upper bound)
size of parts    open               upper bound (or even fixed)
criteria         various (later)    weighted cuts
constraints      often none         see above
applications     various (later)    often: distributed finite element methods on 3d-meshes of objects

21 Bicriterial Formulations
observations:
1. clusterings often nice if balanced (like partition)
2. intra-density vs. inter-sparsity is bicriterial
bicriterial (or multi-) measures for clusterings can help:
- constrain sparsity within clusters
- constrain density between clusters
- explicitly formulate desiderata
(more on bicriteria later)

32 Postulations to a Measure
Given a graph G and a clustering C, a quality measure should behave as follows:
- more intra-edges: higher quality
- fewer inter-edges: higher quality
- cliques must never be separated
- clusters must be connected
- random clusterings should have bad quality
- disjoint cliques should approach maximum quality
- locality of the measure (being better/worse in one part does not depend on what is done in another part of the graph)
- double the instance, what should happen? Same result
- comparable results across instances
- fulfill the desiderata of the application
- ...

35 Formalization via Bottleneck
[Figure: example network with nodes labeled by first names (Marc, Toby, Richard, ..., Yoan)]
Quality of the clustering, upper cluster:
- inter-cluster sparsity: 2 edges for cutting off 7 nodes (cheap)
- intra-cluster density: best additional cut needs 3 edges for cutting off 4 nodes (expensive)

36 Examples: Conductance, Expansion
conductance of a cut (C, V \ C):
\[ \varphi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\left\{ \sum_{v \in C} \omega(v),\ \sum_{v \in V \setminus C} \omega(v) \right\}} \]
(i.e.: thickness of the bottleneck which cuts off C)
inter-cluster conductance of a clustering \(\mathcal{C}\):
\[ \max_{C \in \mathcal{C}} \varphi(C, V \setminus C) \]
(i.e.: worst bottleneck induced by some \(C \in \mathcal{C}\))
intra-cluster conductance of a clustering \(\mathcal{C}\):
\[ \min_{C \in \mathcal{C}} \ \min_{P \,\dot\cup\, Q = C} \varphi_{C}(P, Q) \]
(i.e.: best bottleneck still left uncut inside some \(C \in \mathcal{C}\))
expansion of a cut (C, V \ C):
\[ \psi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\left\{ |C|,\ |V \setminus C| \right\}} \]
(i.e.: in \(\varphi\), replace \(\omega(v)\) by 1; intra- and inter-cluster expansion analogously)
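The cut-based definitions on this slide can be tried out on a toy graph. The following is a minimal sketch, assuming an unweighted simple graph (every edge has weight 1, so ω(v) is the degree of v) and assuming both sides of every cut contain at least one edge endpoint. The function names and the example graph are illustrative and not part of the slides. Intra-cluster conductance is left out because it requires a minimum-conductance cut inside each cluster.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def cut_size(edges: List[Edge], part: Set[int]) -> int:
    """Number of edges with exactly one endpoint in `part` (unit weights)."""
    return sum(1 for u, v in edges if (u in part) != (v in part))


def volume(edges: List[Edge], part: Set[int]) -> int:
    """Sum of the degrees of the nodes in `part`."""
    return sum((u in part) + (v in part) for u, v in edges)


def conductance(edges: List[Edge], nodes: Set[int], part: Set[int]) -> float:
    """Cut size divided by the smaller of the two side volumes."""
    rest = nodes - part
    return cut_size(edges, part) / min(volume(edges, part), volume(edges, rest))


def expansion(edges: List[Edge], nodes: Set[int], part: Set[int]) -> float:
    """Like conductance, but each node counts 1 instead of its degree."""
    rest = nodes - part
    return cut_size(edges, part) / min(len(part), len(rest))


def inter_cluster_conductance(edges, nodes, clustering) -> float:
    """Maximum conductance over the cuts induced by the single clusters."""
    return max(conductance(edges, nodes, c) for c in clustering)


# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
NODES = {0, 1, 2, 3, 4, 5}
CLUSTERING = [{0, 1, 2}, {3, 4, 5}]

print(conductance(EDGES, NODES, {0, 1, 2}))
print(expansion(EDGES, NODES, {0, 1, 2}))
print(inter_cluster_conductance(EDGES, NODES, CLUSTERING))
```

In this example one cut edge separates two sides of volume 7 and size 3 each, so the cut is a thin bottleneck under both measures.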

38 Formalization: Counting Edges
[Figure: example network with nodes labeled by first names (Marc, Toby, Richard, ..., Bob)]
Measuring clustering quality by counting edges:
- inter-cluster sparsity: 6 edges of ca. 800 node pairs (few)
- intra-cluster density: 53 edges of 99 node pairs (many)

39 Example Counting Measures
coverage:
\[ \mathrm{cov}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}} \]
(i.e.: fraction of covered edges)
performance:
\[ \mathrm{perf}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges} + \#\,\text{absent inter-cluster edges}}{\frac{1}{2}\, n (n-1)} \]
(i.e.: fraction of correctly classified pairs of nodes)
density:
\[ \mathrm{den}(\mathcal{C}) := \frac{1}{2} \cdot \frac{\#\,\text{intra-cluster edges}}{\#\,\text{possible intra-cluster edges}} + \frac{1}{2} \cdot \frac{\#\,\text{absent inter-cluster edges}}{\#\,\text{possible inter-cluster edges}} \]
(i.e.: fractions of correct intra- and inter-edges)
modularity:
\[ \mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}[\mathrm{cov}(\mathcal{C})] \]
(i.e.: how clear is the clustering, compared to a random network?)
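The three pure counting measures (coverage, performance, density) can be computed in one pass over the edges. The sketch below assumes a simple undirected graph with at least one edge and at least two nodes. The `ratio` helper and its convention for an empty denominator are assumptions made here, as are all names and the example graph.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def counting_measures(edges: List[Edge], clustering: List[Set[int]]) -> Dict[str, float]:
    """Coverage, performance and density of a clustering of a simple graph."""
    cluster_of = {v: i for i, c in enumerate(clustering) for v in c}
    n, m = len(cluster_of), len(edges)

    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    inter = m - intra

    all_pairs = n * (n - 1) // 2
    possible_intra = sum(len(c) * (len(c) - 1) // 2 for c in clustering)
    possible_inter = all_pairs - possible_intra
    absent_inter = possible_inter - inter

    def ratio(a: int, b: int) -> float:
        # Assumed convention: a term with an empty denominator counts as 1.
        return a / b if b else 1.0

    return {
        "coverage": intra / m,
        "performance": (intra + absent_inter) / all_pairs,
        "density": 0.5 * ratio(intra, possible_intra)
        + 0.5 * ratio(absent_inter, possible_inter),
    }


# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = counting_measures(EDGES, [{0, 1, 2}, {3, 4, 5}])
one_cluster = counting_measures(EDGES, [{0, 1, 2, 3, 4, 5}])
print(two_clusters)
print(one_cluster)
```

Putting all nodes into one cluster drives coverage to its maximum, while performance drops because all absent edges now count as misclassified pairs.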

41 Motivation for Modularity
[Figure: example network with nodes labeled by first names (Marc, Toby, Richard, ..., Bob)]
coverage = \( \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}} \), about 0.9 in the example
only one cluster: coverage = 1.0

43 A Promising Remedy
[Girvan and Newman: Finding and evaluating community structure in networks, 2004]: ... if we subtract from [coverage] the expected value [...], we do get a useful measure.
Modularity:
\[ \mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}(\mathrm{cov}(\mathcal{C})) = \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}} - \frac{1}{4\,(\#\,\text{edges})^2} \sum_{C \in \mathcal{C}} \Big( \sum_{v \in C} \deg(v) \Big)^2 \]
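A direct transcription of modularity (coverage minus the sum of squared cluster volumes divided by four times the squared number of edges) takes only a few lines. This is a minimal sketch for unweighted simple graphs with at least one edge. The function name and the example graph are illustrative assumptions.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def modularity(edges: List[Edge], clustering: List[Set[int]]) -> float:
    """Coverage minus its expected value, for an unweighted simple graph."""
    m = len(edges)
    cluster_of = {v: i for i, c in enumerate(clustering) for v in c}

    intra = 0
    vol = [0] * len(clustering)  # sum of degrees per cluster
    for u, v in edges:
        vol[cluster_of[u]] += 1
        vol[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra += 1

    coverage = intra / m
    expected = sum(x * x for x in vol) / (4 * m * m)
    return coverage - expected


# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
natural = modularity(EDGES, [{0, 1, 2}, {3, 4, 5}])
one_cluster = modularity(EDGES, [{0, 1, 2, 3, 4, 5}])
singletons = modularity(EDGES, [{v} for v in range(6)])
print(natural, one_cluster, singletons)
```

The single-cluster clustering, which maximizes coverage, scores exactly 0 here because the subtracted expected value is also 1. The two-triangle clustering scores 5/14.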

45 Modularity in Practice
- easy to use & implement
- reasonable behavior on many practical instances
- heavily used in various fields: ecosystem exploration, collaboration analyses, biochemistry, structure of the internet (AS-graph, www, routers)
- close to human intuition of quality [Görke et al.: Comp. aspects of lucidity-driven clustering, 2010]
- scaling behavior (double instance, result differs) [folklore]
- non-locality of optimal clustering [folklore]
- resolution limit (no tiny and large clusters at the same time) [Fortunato and Barthelemy 2007]
- large sparse graph: high values, balanced clusters [Good et al.: The performance of modularity maximization in practical contexts, 2009]
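The remark on scaling behavior can be made concrete with a short calculation. As a sketch, assume G has m edges and a clustering \(\mathcal{C}\), and let G' consist of two disjoint copies of G, clustered by copying \(\mathcal{C}\) into each half. The names G' and \(\mathcal{C}'\) are introduced here for illustration only. Modularity is taken as coverage minus \(\frac{1}{4m^2}\) times the sum of squared cluster volumes. Coverage is unchanged by the doubling, while the expected-value term is halved, so the value of the very same structure changes:

```latex
\begin{align*}
\mathrm{cov}(\mathcal{C}') &= \frac{2 \cdot \#\,\text{intra-cluster edges}}{2m} = \mathrm{cov}(\mathcal{C}) \\
\mathbb{E}(\mathrm{cov}(\mathcal{C}')) &= \frac{1}{4\,(2m)^2} \cdot 2 \sum_{C \in \mathcal{C}} \Big( \sum_{v \in C} \deg(v) \Big)^2
  = \tfrac{1}{2}\, \mathbb{E}(\mathrm{cov}(\mathcal{C})) \\
\mathrm{mod}(\mathcal{C}') &= \mathrm{cov}(\mathcal{C}) - \tfrac{1}{2}\, \mathbb{E}(\mathrm{cov}(\mathcal{C})) \;\ge\; \mathrm{mod}(\mathcal{C})
\end{align*}
```

This only shows that the value is not invariant under doubling. Whether the optimal clustering itself changes depends on the instance.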

46 Modularity, Algorithmic Theory
The complexity of modularity optimization:
- finding C with maximum modularity is NP-hard (reduction from 3-PARTITION)
- restriction to |C| = 2 also hard
- not FPT wrt. |C|
- greedy maximization (later) does not approximate
- very limited families combinatorially solvable
- ILP-formulation, feasible for |V| up to about 200
[Brandes et al.: On modularity clustering, 2008]
- diverse results on approximability on specific classes of graphs [DasGupta, Devine: On the complexity of Newman's community finding approach for biological and social networks, 2011]

64 How to Cluster?
Optimization of quality function:
- Bottom-up: start with singletons, merge clusters
- Top-down: start with the one-cluster, split clusters
- Local Opt.: start with random clustering, migrate nodes
Variants of recursive min-cutting
Percolation of network by removal of highly central edges
Spectral methods using eigenanalysis of adjacency/Laplacian
Direct identification of dense substructures
Random walks
Geometric approaches
...

66 Density-Constrained Clustering: Overview
New Optimization Problem: Find clusterings with guaranteed intra-cluster density and good inter-cluster sparsity
This talk:
- Systematic collection of sparsity and density measures
- Classification of measures with respect to their behavior
- Experimental evaluation of greedy merge vs. greedy moves
- Qualitative comparison of clusterings obtained by optimizing different measures
See also:
- [Schumm et al.: Density-Constrained Graph Clustering, WADS 2011]
- [Kappes et al.: Experiments on Density-Constrained Graph Clustering, to appear in ACM JEA]

75 Inter-cluster-sparsity: Cut-based
- Isolated View: Each cluster induces a cut
- Pairwise View: Each pair of clusters induces a cut in their subgraph
- Global View: A clustering with k clusters induces a k-way cut
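The three views differ only in how the same inter-cluster edges are grouped. The sketch below counts cut edges per cluster (isolated), per pair of clusters (pairwise) and overall (global k-way cut) for an unweighted graph. The function name, the return layout and the example are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def cut_views(edges: List[Edge], clustering: List[Set[int]]):
    """Return (isolated, pairwise, global) cut-edge counts of a clustering."""
    k = len(clustering)
    cluster_of = {v: i for i, c in enumerate(clustering) for v in c}

    isolated: Dict[int, int] = {i: 0 for i in range(k)}
    pairwise: Dict[Tuple[int, int], int] = {p: 0 for p in combinations(range(k), 2)}
    global_cut = 0

    for u, v in edges:
        a, b = cluster_of[u], cluster_of[v]
        if a == b:
            continue  # intra-cluster edge, belongs to no cut
        isolated[a] += 1
        isolated[b] += 1
        pairwise[(min(a, b), max(a, b))] += 1
        global_cut += 1

    return isolated, pairwise, global_cut


# Hypothetical example: a path 0-1-2-3-4-5 split into three clusters of two nodes.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
CLUSTERING = [{0, 1}, {2, 3}, {4, 5}]
isolated, pairwise, global_cut = cut_views(EDGES, CLUSTERING)
print(isolated, pairwise, global_cut)
```

Every inter-cluster edge appears in exactly two isolated cuts and in exactly one pairwise cut, so the isolated counts sum to twice the global count.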

80 Inter-cluster Sparsity: Degrees of Freedom
Set of cuts:
- isolated (one for each cluster)
- pairwise (one for each pair of clusters)
- global (k-way cut)
Measures:
- number of cut-edges
- density
- conductance
- expansion
Combinations:
- average sparsity
- minimum sparsity
This yields 14 (reasonable) inter-cluster sparsity measures

86 Intra-cluster density
- Definitions analogous to inter-cluster sparsity possible
- Finding cut with optimal density/conductance/expansion is NP-hard
- Practical approach: evaluate \( \frac{\#\,\text{intra-cluster edges}}{\#\,\text{possible intra-cluster edges}} \)
- minimum/average/global intra-cluster density
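The practical approach, the ratio of intra-cluster edges to possible intra-cluster edges, leads to three natural aggregates. The sketch below assumes a simple undirected graph. It also assumes that "global" means all intra-cluster edges divided by all possible intra-cluster pairs, and that a cluster without any node pair (a singleton) has density 1. Both conventions are assumptions made here, not taken from the slides.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def intra_cluster_densities(edges: List[Edge], clustering: List[Set[int]]) -> Dict[str, float]:
    """Minimum, average and global intra-cluster density of a clustering."""
    inside = [sum(1 for u, v in edges if u in c and v in c) for c in clustering]
    possible = [len(c) * (len(c) - 1) // 2 for c in clustering]

    # Assumed convention: clusters without node pairs count as fully dense.
    per_cluster = [i / p if p else 1.0 for i, p in zip(inside, possible)]
    total_possible = sum(possible)

    return {
        "minimum": min(per_cluster),
        "average": sum(per_cluster) / len(per_cluster),
        "global": sum(inside) / total_possible if total_possible else 1.0,
    }


# Hypothetical example: a sparse 4-node cluster next to a single-edge cluster.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 2), (4, 5), (3, 4)]
densities = intra_cluster_densities(EDGES, [{0, 1, 2, 3}, {4, 5}])
print(densities)
```

The example shows that the aggregates can disagree: the small dense cluster lifts the average, while the global value is dominated by the larger, sparser cluster.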

87 Problem Statement
Density-Constrained Clustering: Given a graph G = (V, E), among all clusterings with an intra-cluster density of no less than α, find a clustering C with optimum inter-cluster sparsity.
- 3 possible intra-cluster density measures
- 14 possible inter-cluster sparsity measures
- Family of 3 · 14 = 42 optimization problems

89 Complexity (Example)
Reduction from Exact Cover by 3-Sets
[Figure: construction used in the reduction]
Theorem: Density-Constrained Clustering combining any intra-cluster density measure with the number of inter-cluster edges is NP-hard.
This motivates the use of heuristic greedy algorithms.

90 Greedy Algorithms
Greedy Merge (GM):
- Popular for modularity-based clustering
- Idea: Merge clusters iteratively
Greedy Vertex Moving (GVM):
- Closely related to algorithms for graph partitioning
- Very successful for optimizing modularity [Rotta et al.]
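The slides name Greedy Vertex Moving without spelling it out, so the following is only one plausible reading, not the authors' implementation. It is a sketch under stated assumptions: the objective is the number of inter-cluster edges, the constraint is that every cluster keeps density at least `alpha`, the start clustering consists of singletons, and empty or singleton clusters count as density 1. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def density(cluster: Set[int], edges: List[Edge]) -> float:
    """Fraction of node pairs in `cluster` that are joined by an edge."""
    possible = len(cluster) * (len(cluster) - 1) // 2
    if possible == 0:
        return 1.0  # assumed convention for empty and singleton clusters
    return sum(1 for u, v in edges if u in cluster and v in cluster) / possible


def greedy_vertex_moving(nodes: Set[int], edges: List[Edge], alpha: float) -> List[Set[int]]:
    """Move single vertices while this removes inter-cluster edges and keeps density >= alpha."""
    neighbors: Dict[int, Set[int]] = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    cluster_of = {v: v for v in nodes}  # start with singletons

    def members(cid: int) -> Set[int]:
        return {w for w in nodes if cluster_of[w] == cid}

    improved = True
    while improved:
        improved = False
        for v in sorted(nodes):
            own = cluster_of[v]
            links: Dict[int, int] = {}  # number of neighbors of v per cluster
            for w in neighbors[v]:
                links[cluster_of[w]] = links.get(cluster_of[w], 0) + 1
            best_gain, best_target = 0, None
            for target in sorted(links):
                if target == own:
                    continue
                # Edges that become intra-cluster minus edges that become inter-cluster.
                gain = links[target] - links.get(own, 0)
                if gain <= best_gain:
                    continue
                source_after = members(own) - {v}
                target_after = members(target) | {v}
                if density(source_after, edges) >= alpha and density(target_after, edges) >= alpha:
                    best_gain, best_target = gain, target
            if best_target is not None:
                cluster_of[v] = best_target
                improved = True

    clusters: Dict[int, Set[int]] = {}
    for v, cid in cluster_of.items():
        clusters.setdefault(cid, set()).add(v)
    return list(clusters.values())


# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result = greedy_vertex_moving({0, 1, 2, 3, 4, 5}, EDGES, alpha=1.0)
print(result)
```

Because only moves with strictly positive gain are accepted, the number of inter-cluster edges decreases with every move and the loop terminates. The result is a local optimum that depends on the vertex order.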

100 Generic Greedy Merge Algorithm Example: Minimize number of inter-cluster edges such that the density of each cluster is at least Idea: Merge clusters greedily Objective: Increase inter-cluster sparsity Constraint: Intra-cluster density must not drop below given threshold Algorithmic Methods for Complex Network Analysis: Graph Clustering September 9, /49

101 Generic Greedy Merge Algorithm. Idea: merge clusters greedily. Objective: increase inter-cluster sparsity. Constraint: intra-cluster density must not drop below a given threshold. Example: minimize the number of inter-cluster edges such that the density of each cluster is at least 3/4.
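The greedy merge idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the slides: it assumes the number of inter-cluster edges as objective, the minimum intra-cluster density as constraint, singletons as the start clustering, and it only accepts merges that strictly reduce the number of inter-cluster edges. The function names are hypothetical, and candidate merges are rescanned naively instead of being kept in a heap.

```python
from itertools import combinations

def density(cluster, edges):
    """Intra-cluster density: present edges / possible edges (1.0 for singletons)."""
    n = len(cluster)
    if n < 2:
        return 1.0
    inside = sum(1 for u, v in edges if u in cluster and v in cluster)
    return inside / (n * (n - 1) / 2)

def edges_between(a, b, edges):
    """Number of edges with one endpoint in a and the other in b."""
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def greedy_merge(vertices, edges, threshold):
    """Repeatedly apply the feasible merge that removes the most inter-cluster edges."""
    clusters = [frozenset([v]) for v in vertices]  # assumption: start from singletons
    while True:
        best = None
        for a, b in combinations(clusters, 2):
            gain = edges_between(a, b, edges)  # inter-cluster edges removed by the merge
            # Skip merges without strict improvement or violating the density constraint.
            if gain == 0 or density(a | b, edges) < threshold:
                continue
            if best is None or gain > best[0]:
                best = (gain, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]

# Two triangles joined by a single edge, density threshold 3/4 as in the example.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result = greedy_merge(range(6), two_triangles, 0.75)
```

On this toy graph the two triangles end up as separate clusters, because merging them would give a density of 7/15, below the threshold.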

103 Influence of Measures on Algorithm: Coarseness. Rough intuition: [Figure: labels intra-cluster density and inter-cluster sparsity.] Question: without constraints, is there always a merge that improves the objective function?

109 (Un-)Boundedness. Definition: An objective function f is unbounded if for any clustering C with |C| > 1 there exists a merge that does not deteriorate f. Examples: maximum pairwise inter-cluster conductance is bounded; modularity is bounded as well. [Figure: example graph.] Classification: bounded: mpxc, mpxe, apxd, apxc, aixd; unbounded: nxe, gxd, mixc, mixd, mixe, aixc, aixe, mpxd.

113 Influence of Measures on Algorithm. [Figure: the set of feasible merges and its update after a merge.] Question: does the feasibility of a merge depend only on the involved clusters? This property is the context insensitivity of an intra-cluster measure.

121 Context Insensitivity. Definition: A constraint is context insensitive if the feasibility of a merge does not depend on the remainder of the clustering. E.g., global intra-cluster density is context sensitive. Constraint: (intra-cluster edges) / (possible intra-cluster edges) ≥ 0.7. [Figure: the same merge evaluated in different contexts; in one of them the ratio is 2/3 < 0.7.] E.g., minimum intra-cluster density is context insensitive.
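A small worked example may make the distinction concrete. The graph below is made up for illustration and is not the one from the slide figure; the function names are hypothetical. A path a-b-c merged into one cluster has density 2/3. Whether that satisfies a global threshold of 0.7 depends on the other clusters, while the minimum intra-cluster density gives the same answer in both contexts.

```python
def inside_and_possible(cluster, edges):
    """Return (present intra-cluster edges, possible intra-cluster edges)."""
    n = len(cluster)
    inside = sum(1 for u, v in edges if u in cluster and v in cluster)
    return inside, n * (n - 1) // 2

def global_intra_density(clustering, edges):
    """All intra-cluster edges divided by all possible intra-cluster edges."""
    pairs = [inside_and_possible(c, edges) for c in clustering]
    inside = sum(i for i, _ in pairs)
    possible = sum(p for _, p in pairs)
    return inside / possible if possible else 1.0

def min_intra_density(clustering, edges):
    """Smallest density over all clusters (singletons count as 1.0)."""
    pairs = [inside_and_possible(c, edges) for c in clustering]
    return min(i / p if p else 1.0 for i, p in pairs)

# Hypothetical graph: a path a-b-c and a triangle d-e-f.
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("d", "f")]
merged = frozenset("abc")  # result of the merge under consideration

context_1 = [merged, frozenset("d"), frozenset("e"), frozenset("f")]
context_2 = [merged, frozenset("def")]

g1 = global_intra_density(context_1, edges)  # 2/3, below 0.7: merge infeasible
g2 = global_intra_density(context_2, edges)  # 5/6, at least 0.7: merge feasible
m1 = min_intra_density(context_1, edges)     # 2/3 in both contexts
m2 = min_intra_density(context_2, edges)
```

The global measure flips from infeasible to feasible when the dense triangle is clustered together, which is exactly the dependence on the remainder of the clustering that the definition rules out.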

122 Context Insensitivity: Classification. Context insensitive: minimum intra-cluster density. Context sensitive: average intra-cluster density, global intra-cluster density.

124 Influence of Measures on Algorithm. [Figure: feasible merges kept in a heap, with the optimum at the top.] Question: given context insensitivity, can the set of feasible merges be maintained efficiently in a heap? This property is the locality of an objective function.

130 Locality: Intuition. Example: maximum isolated inter-cluster conductance. First approach: use the gain in inter-cluster sparsity as key. Heap, ordered from bad merges to good merges: (A, B) -0.3; (C, D) 0; (E, F) 0; (C, F) 0; (G, H) 0; (G, I) 0.3. Heap after merging G and I: (A, B) -0.3; (G I, T) -0.3; (C, F) -0.2; (C, D) 0; (E, F) 0.2. Is clever tie-breaking possible? Needed: a suitable order that does not change if unrelated clusters merge. The existence of such an order is the locality of the inter-cluster measure.

136 Example: Max. isolated inter-cluster conductance. Current sequence of the conductance of all clusters, sorted: (A 0.5, B 0.4, C 0.3, D 0.3, E 0.). Sequence if A and B are merged: (A∪B 0.45, C 0.3, D 0.3, E 0.). Sequence if A and D are merged: (A∪D 0.45, B 0.4, C 0.3, E 0.). Compare lexicographically: merging A and B is better! Ordering merges lexicographically is stable. Two merges can be compared in constant time by comparing keys consisting of three numbers. Hence maximum isolated inter-cluster conductance is local.
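The lexicographic comparison from this example can be replayed directly, since Python compares lists element by element. The sketch below builds the full sorted sequences rather than the constant-size keys mentioned on the slide. The value for E is truncated in the transcript; 0.1 is assumed here purely for illustration and does not affect the outcome. The helper name is hypothetical.

```python
def sequence_after_merge(conductance, a, b, merged_value):
    """Sorted (descending) conductance sequence after merging clusters a and b."""
    rest = [value for name, value in conductance.items() if name not in (a, b)]
    return sorted(rest + [merged_value], reverse=True)

# Values from the example; E = 0.1 is an assumption (truncated in the source).
conductance = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.3, "E": 0.1}

seq_ab = sequence_after_merge(conductance, "A", "B", 0.45)
seq_ad = sequence_after_merge(conductance, "A", "D", 0.45)

# Smaller is better: the objective minimizes the maximum conductance, and ties
# on the first entry are broken by the second entry, and so on.
better = "A+B" if seq_ab < seq_ad else "A+D"
```

Both sequences start with 0.45, so the decision falls to the second entry: 0.3 after merging A and B versus 0.4 after merging A and D.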

148 Locality: Results. Does such an order exist for all objective functions? Global inter-cluster density, (inter-cluster edges) / (possible inter-cluster edges), is not local. [Figure: two candidate merges that rank as better/worse in one clustering and as worse/better in another.] Classification: local: mixd, mixc, mixe, aixd, aixc, aixe, nxe; not local: mpxd, apxd, mpxc, mpxe, gxd, apxe, apxc.

150 Influence of Measures on Algorithm. [Figure: among the feasible merges, are connected merges sufficient?] Question: do we have to consider pairs of unconnected clusters? This property is the connectedness of an objective function.

154 Disconnectedness. Definition: An objective function f is connected if merging unconnected clusters is never the best option with respect to f. Example: maximum pairwise inter-cluster conductance is not connected. [Figure: example in which merging two unconnected clusters is the best option.] Classification: connected: nxe; unconnected: gxd, mixc, mixd, mixe, aixc, aixe, mpxd, mpxc, mpxe, apxd, apxc, aixd.

156 Influence of Measures on Efficiency. (Given that the necessary data can be maintained efficiently:) Context insensitivity + locality ⇒ O(n^2 log n) running time. Context insensitivity + locality + connectedness ⇒ O(md log n) running time and linear space.

157 Example: Graph of our Department. [Figure panels: chair; modularity-based algorithm; greedy merge (mid + aixc).]

158 Local Moving. Idea: move vertices greedily. Objective: increase inter-cluster sparsity. Constraint: intra-cluster density must not drop below a given threshold. Example: minimize the number of inter-cluster edges such that the density of each cluster is at least 3/4.
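Local moving can be sketched in the same spirit. The code below is an illustrative assumption, not the implementation evaluated in the slides: it starts from singletons, visits vertices in a fixed order, and moves a vertex to a neighboring cluster only if that strictly reduces the number of inter-cluster edges and both affected clusters keep a density of at least the threshold. All names are hypothetical.

```python
def local_moving(vertices, edges, threshold):
    """Move single vertices between clusters until no improving move remains."""
    vertices = list(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cluster_of = {v: v for v in vertices}  # assumption: start from singletons

    def members(cid):
        return {v for v, c in cluster_of.items() if c == cid}

    def dense_enough(group):
        n = len(group)
        if n < 2:
            return True
        inside = sum(len(adj[v] & group) for v in group) // 2
        return inside / (n * (n - 1) / 2) >= threshold

    improved = True
    while improved:
        improved = False
        for v in vertices:
            current = cluster_of[v]
            home = members(current)
            stay = len(adj[v] & home)  # neighbors of v in its own cluster
            best_gain, best_target = 0, None
            for target in {cluster_of[u] for u in adj[v]} - {current}:
                group = members(target)
                # Reduction in inter-cluster edges if v moves to target.
                gain = len(adj[v] & group) - stay
                if (gain > best_gain and dense_enough(group | {v})
                        and dense_enough(home - {v})):
                    best_gain, best_target = gain, target
            if best_target is not None:
                cluster_of[v] = best_target
                improved = True
    return cluster_of

# Two triangles joined by a single edge, density threshold 3/4 as in the example.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = local_moving(range(6), two_triangles, 0.75)
```

Every accepted move lowers the number of inter-cluster edges by at least one, so the loop terminates.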

184 Greedy Vertex Moving. Idea: use Local Moving on multiple levels. [Figure: multilevel scheme with contract steps to coarser levels and project steps back to finer levels.]
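The slides only name the two multilevel steps, so the following is a guess at what contract and project could look like under common multilevel conventions. It is an assumption, not the authors' code: contraction collapses each cluster into one coarse vertex and sums edge weights, keeping intra-cluster weight as a self-loop, and projection maps a clustering of the coarse graph back to the original vertices.

```python
from collections import Counter

def contract(weighted_edges, cluster_of):
    """Collapse each cluster to one coarse vertex; parallel edges add up."""
    coarse = Counter()
    for u, v, w in weighted_edges:
        cu, cv = cluster_of[u], cluster_of[v]
        # Undirected graph: store each coarse edge under a canonical key.
        coarse[(min(cu, cv), max(cu, cv))] += w
    return [(a, b, w) for (a, b), w in coarse.items()]

def project(coarse_cluster_of, cluster_of):
    """Translate a clustering of coarse vertices back to the original vertices."""
    return {v: coarse_cluster_of[c] for v, c in cluster_of.items()}

# Two triangles joined by one edge, each triangle already forming a cluster.
fine_edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
fine_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

coarse_edges = contract(fine_edges, fine_clusters)
# Suppose local moving on the coarse level put both coarse vertices together.
projected = project({"a": "X", "b": "X"}, fine_clusters)
```

Local moving would then run on the coarse graph, treating each former cluster as a single vertex, before the result is projected back.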

185 Effectiveness: Merge vs. Move. Question: which greedy algorithm is more effective? Setup: preliminary experiments showed that pairwise measures behave counter-intuitively, so they are left out of the experimental analysis; the experiments use real-world networks taken from the benchmark sets of Arenas and Newman. Different configurations: intra-cluster density measure, inter-cluster sparsity measure, parameter α. Summary: in 74 percent of all configurations, greedy vertex moving performs better than greedy merging.

186 Social Network of Dolphins [Lusseau 04]

187 Social Network of Dolphins. Restriction: global intra-cluster density > 0.2. Objectives: average inter-cluster density, maximum inter-cluster density, global inter-cluster density, inter-cluster edges.

188 Social Network of Dolphins. Restriction: global intra-cluster density > 0.2. Objectives: av. inter-cluster conductance, av. inter-cluster expansion.

189 Social Network of Dolphins. Restriction: global intra-cluster density > 0.2. Objectives: max. inter-cluster expansion, max. inter-cluster conductance.

190 Social Network of Dolphins. Objective: modularity.

196 Planted Partition Graphs: Setup. Planted partition graph: vertices in the same hidden cluster are connected with probability p_in, vertices in different clusters with probability p_out. Question: what is the distance between the clustering found by an objective function and the hidden clustering? Parameter α: expected intra-cluster density.
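A planted partition graph is easy to generate, which is part of its appeal as a benchmark. The sketch below assumes equally sized hidden clusters, a detail the slides do not specify; the function and parameter names are hypothetical.

```python
import random
from itertools import combinations

def planted_partition(num_clusters, cluster_size, p_in, p_out, seed=None):
    """Return (hidden cluster label per vertex, edge list)."""
    rng = random.Random(seed)
    n = num_clusters * cluster_size
    hidden = [v // cluster_size for v in range(n)]  # assumption: equal cluster sizes
    edges = [
        (u, v)
        for u, v in combinations(range(n), 2)
        # Same hidden cluster: probability p_in; different clusters: p_out.
        if rng.random() < (p_in if hidden[u] == hidden[v] else p_out)
    ]
    return hidden, edges

# Extreme case: every intra-cluster edge present, no inter-cluster edge.
hidden, edges = planted_partition(3, 4, p_in=1.0, p_out=0.0, seed=0)
```

With p_in = 1 and p_out = 0 the graph is a disjoint union of cliques, and the hidden clustering has intra-cluster density 1. Realistic experiments would use values in between.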

202 Planted Partition Graphs: Rough Summary. [Figure: distance to the reference clustering under two constraints, global intra-cluster density and minimum intra-cluster density; methods compared: ML-MOD, MOD, NXE, GXD, MIXD, AIXD, MIXC, AIXC, MIXE, AIXE.]

203 Planted Partition Graphs: Insights. Investigating different configurations yields further insights: using average intra-cluster density as the constraint leads to very unbalanced clusterings; constraining modularity by maximum intra-cluster density improves its results, especially if the expected number of clusters is high; fine reference clusterings unbalance the maximum objectives; average inter-cluster expansion and average inter-cluster density identify many clusters.


205 Conclusion
Clustering as bicriterial problem
- Optimize inter-cluster sparsity respecting intra-cluster density
- Collection of new measures
Algorithm Engineering aspects:
- Formulation of measures
- Classification of measures with respect to greedy merge
- Insights about behavior of measures
- Experimental evaluation of greedy methods
- Experimental comparison on planted partition graphs
Thank you for your attention!
