Algorithmic Methods for Complex Network Analysis: Graph Clustering

Algorithmic Methods for Complex Network Analysis: Graph Clustering
Summer School on Algorithm Engineering
Dorothea Wagner
September 9, 2014
Karlsruhe Institute of Technology, Institute of Theoretical Informatics
KIT: University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
www.kit.edu

Scenario of Network Analysis
Given a network...
[Figure: example network with numbered vertices]
- explore the instance
- derive its structure
- identify its properties
How can we learn about the instance?

An Archetypal Example
Zachary's Karate Club, a real social network
[Figure: the karate club network with numbered vertices]
- 2 years of observation
- 34 vertices = members
- 78 edges = social ties
- club split up after dispute: manager vs. trainers
- archon of toy examples
"Caused by an unequal flow of sentiments and information across the ties a factional division led to a formal separation of the club."
[Wayne Zachary: An Information Flow Model for Conflict and Fission in Small Groups, 1977]

A Glimpse of Network Analysis
graph clustering / detecting communities
[Figure: the karate club network with its vertices boxed into four groups; box = cluster]

Scaling of Real-World Instances
- Zachary's Karate Club (vertices/edges = 34/78)
- US college football teams and matches (vertices/edges = 5/66)
- variables of a SAT-instance, edges = direct dep. (electr. components) (vertices/edges 2K/6K)
- sci. collaborations: 3-hop neighborhood of D. Wagner (DBLP) (vertices/edges 0k/40k)
- physical Internet: autonomous systems (vertices/edges 20K/60K)
... no limit to be expected ...

instance                vertices    edges
coauthors in DBLP       300K        M
roads in the USA        24M         60M
WWW: .UK-domain 02      20M         500M
(neurons in human brain 0 0 7)

Clustering: Intuition to Formalization
Task: partition graph into natural groups
Paradigm: intra-cluster density vs. inter-cluster sparsity
Different approaches exist to formalize this paradigm, usually:
- Paradigm of Graph Clustering: intra-cluster density vs. inter-cluster sparsity
- Mathematical Formalization: quality measures for clusterings
- Many exist, optimization generally (NP-)hard
- There is no single, universally best strategy

Algorithm Engineering
[Figure: the algorithm engineering cycle around "Algorithms": design, analyze, implement, experiment]
- modelling reality is hard
- finding optima is hard
- satisfying needs of application is hard
- still, we do need to cluster
- need good foundation

Clustering vs. Partitioning

                 clustering          partitioning
purpose          analysis            (pred.) handling of instance
... and then?    zoom/abstraction    computations on parts
# of parts       open                predefined (upper bound)
size of parts    open                upper bound (or even fixed)
criteria         various (later)     weighted cuts
constraints      often none          see above
applications     various (later)     often: distributed finite element methods on 3d-meshes of objects

Bicriterial Formulations
observations:
1. clusterings often nice if balanced (like partition)
2. intra-density vs. inter-sparsity is bicriterial
bicriterial (or multi-) measures for clusterings can help:
- constrain sparsity within clusters
- constrain density between clusters
- explicitly formulate desiderata
(more on bicriteria later)

Postulations to a Measure
Given a graph G and a clustering C, a quality measure should behave as follows:
- more intra-edges: higher quality
- fewer inter-edges: higher quality
- cliques must never be separated
- clusters must be connected
- random clusterings should have bad quality
- disjoint cliques should approach maximum quality
- locality of the measure (being better/worse in one part does not depend on what is done in other part of graph)
- double the instance, what should happen... same result
- comparable results across instances
- fulfill the desiderata of the application
- ...

Formalization via Bottleneck
[Figure: example social network with named vertices, grouped into clusters]
Quality of the clustering, upper cluster:
- inter-cluster sparsity: 2 edges for cutting off 7 nodes (cheap)
- intra-cluster density: best additional cut: 3 edges for cutting off 4 nodes (expensive)

Examples: Conductance, Expansion
conductance of a cut $(C, V \setminus C)$:
$$\varphi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\left\{\sum_{v \in C} \omega(v),\ \sum_{v \in V \setminus C} \omega(v)\right\}}$$
(i.e.: thickness of bottleneck which cuts off $C$)
inter-cluster conductance:
$$\text{inter-cluster conductance}(\mathcal{C}) := \max_{C \in \mathcal{C}} \varphi(C, V \setminus C)$$
(i.e.: worst bottleneck induced by some $C \in \mathcal{C}$)
intra-cluster conductance:
$$\text{intra-cluster conductance}(\mathcal{C}) := \min_{C \in \mathcal{C}} \ \min_{P \uplus Q = C} \varphi_{C}(P, Q)$$
(i.e.: best bottleneck still left uncut inside some $C \in \mathcal{C}$)
expansion of a cut $(C, V \setminus C)$:
$$\psi(C, V \setminus C) := \frac{\omega(E(C, V \setminus C))}{\min\left\{|C|,\ |V \setminus C|\right\}}$$
(i.e.: in $\varphi$, replace $\omega(v)$ by 1; intra- and inter-cluster expansion analogously)
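
The cut definitions above can be tried out on a toy graph. The following is a minimal sketch, not taken from the slides: it assumes that ω(v) denotes the weighted degree of v, and the names `cut_weight`, `conductance`, `expansion`, `NODES` and `EDGES` are hypothetical. The example graph is two triangles joined by a single bridge edge.

```python
def cut_weight(edges, side):
    """Total weight of the edges with exactly one endpoint in `side`."""
    return sum(w for u, v, w in edges if (u in side) != (v in side))


def conductance(edges, nodes, side):
    """Cut weight divided by the smaller of the two weighted-degree sums.

    Assumption: omega(v) is the weighted degree of v.
    """
    side = set(side)
    degree = {v: 0.0 for v in nodes}
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    inside = sum(degree[v] for v in side)
    outside = sum(degree[v] for v in nodes if v not in side)
    smaller = min(inside, outside)
    if smaller == 0:
        raise ValueError("conductance is undefined when one side has no weight")
    return cut_weight(edges, side) / smaller


def expansion(edges, nodes, side):
    """Cut weight divided by the number of nodes on the smaller side."""
    side = set(side)
    smaller = min(len(side), len(set(nodes) - side))
    if smaller == 0:
        raise ValueError("expansion is undefined when one side is empty")
    return cut_weight(edges, side) / smaller


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge (2, 3), unit weights.
NODES = [0, 1, 2, 3, 4, 5]
EDGES = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1)]

print(conductance(EDGES, NODES, {0, 1, 2}))  # one cut edge over a degree sum of 7
print(expansion(EDGES, NODES, {0, 1, 2}))    # one cut edge over 3 nodes
```

Cutting off one triangle costs a single edge, so both values are small; cutting a triangle itself apart would cost two edges for a single node, which is the "expensive" intra-cluster case.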

Formalization: Counting Edges
[Figure: example social network with named vertices, grouped into clusters]
Measuring clustering quality by counting edges:
- inter-cluster sparsity: 6 edges of ca. 800 node pairs (few)
- intra-cluster density: 53 edges of 99 node pairs (many)

Example Counting Measures
coverage:
$$\mathrm{cov}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}}$$
(i.e.: fraction of covered edges)
performance:
$$\mathrm{perf}(\mathcal{C}) := \frac{\#\,\text{intra-cluster edges} + \#\,\text{absent inter-cluster edges}}{\frac{1}{2}\, n(n-1)}$$
(i.e.: fraction of correctly classified pairs of nodes)
density:
$$\mathrm{den}(\mathcal{C}) := \frac{1}{2} \cdot \frac{\#\,\text{intra-cluster edges}}{\#\,\text{possible intra-cluster edges}} + \frac{1}{2} \cdot \frac{\#\,\text{absent inter-cluster edges}}{\#\,\text{possible inter-cluster edges}}$$
(i.e.: fractions of correct intra- and inter-edges)
modularity:
$$\mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}[\mathrm{cov}(\mathcal{C})]$$
(i.e.: how clear is the clustering, compared to random network?)
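
As a rough illustration of the counting measures, the sketch below evaluates coverage, performance and density on a small unweighted graph. It is only a sketch under stated assumptions: the graph is simple and undirected, there is at least one edge, at least two clusters, and at least one cluster with two or more nodes (otherwise a denominator is zero). The function name `counting_measures` and the example data are hypothetical.

```python
from itertools import combinations


def counting_measures(nodes, edges, cluster_of):
    """Return (coverage, performance, density) for a simple undirected graph."""
    edge_set = {frozenset(e) for e in edges}
    intra_edges = inter_edges = 0
    possible_intra = possible_inter = 0
    # Classify every unordered node pair as intra- or inter-cluster.
    for u, v in combinations(nodes, 2):
        present = frozenset((u, v)) in edge_set
        if cluster_of[u] == cluster_of[v]:
            possible_intra += 1
            intra_edges += present
        else:
            possible_inter += 1
            inter_edges += present
    absent_inter = possible_inter - inter_edges
    n = len(nodes)
    coverage = intra_edges / (intra_edges + inter_edges)
    performance = (intra_edges + absent_inter) / (n * (n - 1) / 2)
    density = 0.5 * intra_edges / possible_intra + 0.5 * absent_inter / possible_inter
    return coverage, performance, density


# Two triangles joined by one bridge edge, clustered into the two triangles.
NODES = [0, 1, 2, 3, 4, 5]
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
CLUSTER_OF = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

cov, perf, den = counting_measures(NODES, EDGES, CLUSTER_OF)
print(cov, perf, den)
```

Here 6 of 7 edges are intra-cluster, 14 of 15 node pairs are classified correctly, and the density averages a full intra-cluster fraction with 8 of 9 absent inter-cluster pairs.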

Motivation for Modularity
[Figure: example social network with named vertices, grouped into clusters]
$$\text{coverage} = \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}} \approx 0.9$$
only one cluster: coverage = 1.0

A Promising Remedy
[Girvan and Newman: Finding and evaluating community structure in networks, 2004]:
"... if we subtract from [coverage] the expected value [...], we do get a useful measure."
Modularity:
$$\mathrm{mod}(\mathcal{C}) := \mathrm{cov}(\mathcal{C}) - \mathbb{E}(\mathrm{cov}(\mathcal{C})) = \frac{\#\,\text{intra-cluster edges}}{\#\,\text{edges}} - \frac{1}{4\,(\#\,\text{edges})^{2}} \sum_{C \in \mathcal{C}} \Big( \sum_{v \in C} \deg(v) \Big)^{2}$$
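
The modularity formula translates almost directly into code. The following is a minimal sketch for unweighted, undirected graphs without self-loops; the function name `modularity` and the example graph are hypothetical and not part of the talk.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Coverage minus its expected value, following the formula above."""
    m = len(edges)
    intra = 0
    degree_sum = defaultdict(int)  # cluster -> sum of degrees of its nodes
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            intra += 1
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
    coverage = intra / m
    expected = sum(d * d for d in degree_sum.values()) / (4 * m * m)
    return coverage - expected


# Two triangles joined by one bridge edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
TWO_TRIANGLES = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ONE_CLUSTER = {v: "all" for v in range(6)}

print(modularity(EDGES, TWO_TRIANGLES))  # 6/7 - 1/2
print(modularity(EDGES, ONE_CLUSTER))    # 1 - 1, the trivial clustering scores 0
```

The single-cluster clustering has coverage 1.0 but modularity 0, which is the behavior the subtraction of the expected value is meant to achieve.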

Modularity in Practice
In favor:
- easy to use & implement
- reasonable behavior on many practical instances
- heavily used in various fields: ecosystem exploration, collaboration analyses, biochemistry, structure of the internet (AS-graph, www, routers)
- close to human intuition of quality [Görke et al.: Comp. aspects of lucidity-driven clustering, 2010]
Known issues:
- scaling behavior (double instance, result differs) [folklore]
- non-locality of optimal clustering [folklore]
- resolution limit (no tiny and large clusters at the same time) [Fortunato and Barthelemy 2007]
- large sparse graph: high values, balanced clusters [Good et al.: The performance of modularity maximization in practical contexts, 2009]
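
One way to see the scaling behavior is to place two disjoint copies of an instance side by side and cluster each copy exactly as before. The sketch below is an illustration under that reading of "double instance", not a statement about optimal clusterings; the helper `modularity` and all variable names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity of an unweighted, undirected graph without self-loops."""
    m = len(edges)
    intra = 0
    degree_sum = defaultdict(int)
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            intra += 1
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
    return intra / m - sum(d * d for d in degree_sum.values()) / (4 * m * m)


# Original instance: two triangles joined by one bridge edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
CLUSTER_OF = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

# Doubled instance: a second, disjoint copy with nodes shifted by 6
# and cluster names marked with a prime.
doubled_edges = EDGES + [(u + 6, v + 6) for u, v in EDGES]
doubled_clusters = dict(CLUSTER_OF)
doubled_clusters.update({v + 6: c + "'" for v, c in CLUSTER_OF.items()})

single = modularity(EDGES, CLUSTER_OF)                # 6/7 - 1/2
double = modularity(doubled_edges, doubled_clusters)  # 6/7 - 1/4
print(single, double)
```

Coverage is unchanged by the duplication, but the expected-value term is halved, so the same clustering receives a different score on the doubled instance.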

Modularity, Algorithmic Theory
The complexity of modularity optimization:
- finding C with maximum modularity is NP-hard
- reduction from 3-PARTITION
- restriction to |C| = 2 also hard
- not FPT wrt. |C|
- greedy maximization (later) does not approximate
- very limited families combinatorially solvable
- ILP-formulation, feasible for |V| up to about 200
[Brandes et al.: On modularity clustering, 2008]
- diverse results on approximability on specific classes of graphs
[DasGupta, Desai: On the complexity of Newman's community finding approach for biological and social networks, 2011]

How to Cluster?
- Optimization of quality function:
  - Bottom-up: start with singletons, merge clusters
  - Top-down: start with the one-cluster, split clusters
  - Local Opt.: start with random clustering, migrate nodes
- Variants of recursive min-cutting
- Percolation of network by removal of highly central edges
- Spectral methods using eigenanalysis of adjacency/Laplacian
- Direct identification of dense substructures
- Random walks
- Geometric approaches
- ...
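
The bottom-up strategy can be sketched as a greedy agglomeration: start with singletons and repeatedly merge the pair of adjacent clusters that raises the quality function most, stopping when no merge helps. The code below is a deliberately naive illustration that uses modularity as the quality function and recomputes it from scratch for every candidate merge; it is an assumption-laden sketch, not the implementation evaluated in the talk, and the names `modularity` and `greedy_merge` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity of an unweighted, undirected graph without self-loops."""
    m = len(edges)
    intra = 0
    degree_sum = defaultdict(int)
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            intra += 1
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
    return intra / m - sum(d * d for d in degree_sum.values()) / (4 * m * m)


def greedy_merge(nodes, edges):
    """Bottom-up: merge adjacent clusters while modularity strictly improves."""
    cluster_of = {v: v for v in nodes}  # singletons, cluster id = node label
    best = modularity(edges, cluster_of)
    while True:
        # Only pairs of clusters connected by at least one edge are candidates.
        candidates = {frozenset((cluster_of[u], cluster_of[v]))
                      for u, v in edges if cluster_of[u] != cluster_of[v]}
        best_trial, best_value = None, best
        for pair in candidates:
            a, b = tuple(pair)
            # Relabel cluster b as a and evaluate the merged clustering.
            trial = {v: (a if c == b else c) for v, c in cluster_of.items()}
            value = modularity(edges, trial)
            if value > best_value:
                best_trial, best_value = trial, value
        if best_trial is None:
            return cluster_of, best
        cluster_of, best = best_trial, best_value


# Two triangles joined by one bridge edge.
NODES = [0, 1, 2, 3, 4, 5]
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

clustering, quality = greedy_merge(NODES, EDGES)
groups = defaultdict(list)
for node, cluster in clustering.items():
    groups[cluster].append(node)
print(sorted(sorted(g) for g in groups.values()), quality)
```

On this toy graph the greedy merges recover the two triangles. A local-optimization variant would keep the same quality function but move single nodes between clusters instead of merging whole clusters.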

Density-Constrained Clustering: Overview
New Optimization Problem: Find clusterings with guaranteed intra-cluster density and good inter-cluster sparsity
This talk:
- Systematic collection of sparsity and density measures
- Classification of measures with respect to their behavior
- Experimental evaluation of greedy merge vs. greedy moves
- Qualitative comparison of clusterings obtained by optimizing different measures
See also:
[Schumm et al.: Density-Constrained Graph Clustering, WADS 2011]
[Kappes et al.: Experiments on Density-Constrained Graph Clustering, to appear in ACM JEA]

Inter-cluster-sparsity: Cut-based
- Isolated View: Each cluster induces a cut
- Pairwise View: Each pair of clusters induces a cut in their subgraph
- Global View: A clustering with k clusters induces a k-way cut

Inter-cluster Sparsity: Degrees of Freedom
Set of cuts: isolated (one for each cluster), pairwise (one for each pair of clusters), global (k-way cut)
Measures: number of cut-edges, density, conductance, expansion
Combinations: average sparsity, minimum sparsity
14 (reasonable) inter-cluster sparsity measures
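The four measures can be sketched for a single (isolated) cut as follows. The slides do not spell out the formulas, so the definitions below are the common textbook ones and are an assumption here: density divides the cut size by the number of vertex pairs across the cut, conductance by the smaller volume (sum of degrees), and expansion by the smaller side. The function names and the example graph are hypothetical.

```python
def cut_measures(adj, cluster):
    """Cut size, density, conductance and expansion of the cut (cluster, rest).

    adj maps each vertex to the set of its neighbours (undirected, unweighted).
    Assumes both sides of the cut are non-empty and have positive volume.
    """
    cluster = set(cluster)
    rest = set(adj) - cluster
    # Each cut edge is seen exactly once, from its endpoint inside the cluster.
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol_in = sum(len(adj[u]) for u in cluster)
    vol_out = sum(len(adj[u]) for u in rest)
    return {
        "cut_edges": cut,
        "density": cut / (len(cluster) * len(rest)),
        "conductance": cut / min(vol_in, vol_out),
        "expansion": cut / min(len(cluster), len(rest)),
    }


def isolated_sparsity(adj, clustering, measure, combine):
    """Combine one measure over the isolated cuts of all clusters ('max' or 'avg')."""
    values = [cut_measures(adj, c)[measure] for c in clustering]
    return max(values) if combine == "max" else sum(values) / len(values)


# Hypothetical example: two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
m = cut_measures(adj, {0, 1, 2})
worst = isolated_sparsity(adj, [{0, 1, 2}, {3, 4, 5}], "conductance", "max")
```

Taking the maximum over clusters corresponds to the slide's "minimum sparsity", since a larger cut value means a less sparse cut.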

Intra-cluster density
Definitions analogous to inter-cluster sparsity are possible.
Finding a cut with optimal density/conductance/expansion is NP-hard.
Practical approach: evaluate (intra-cluster edges) / (possible intra-cluster edges)
Minimum/average/global intra-cluster density
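A minimal sketch of the practical approach, computing the ratio of existing to possible intra-cluster edges and combining it in the three ways named above. Treating a singleton cluster as having density 1 is an assumption made here to avoid dividing by zero; the function names and the example graph are hypothetical.

```python
def intra_density(adj, cluster):
    """Intra-cluster edges divided by possible intra-cluster edges."""
    n = len(cluster)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 1.0  # assumption: a singleton counts as perfectly dense
    # Every intra-cluster edge is seen from both endpoints, hence the // 2.
    intra = sum(1 for u in cluster for v in adj[u] if v in cluster) // 2
    return intra / possible


def clustering_density(adj, clustering, mode):
    """Minimum, average or global intra-cluster density of a clustering."""
    if mode == "minimum":
        return min(intra_density(adj, c) for c in clustering)
    if mode == "average":
        return sum(intra_density(adj, c) for c in clustering) / len(clustering)
    # global: pool edges and vertex pairs over all clusters before dividing
    intra = sum(sum(1 for u in c for v in adj[u] if v in c) // 2 for c in clustering)
    possible = sum(len(c) * (len(c) - 1) // 2 for c in clustering)
    return intra / possible if possible else 1.0


# Hypothetical example: a triangle {0, 1, 2} attached to the path 3-4-5-6.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
clustering = [{0, 1, 2}, {3, 4, 5, 6}]
d_min = clustering_density(adj, clustering, "minimum")
d_avg = clustering_density(adj, clustering, "average")
d_glob = clustering_density(adj, clustering, "global")
```

On this example the three combinations give different values (1/2, 3/4 and 2/3), which is why they lead to different optimization problems.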

Problem Statement
Density-Constrained Clustering: Given a graph G = (V, E), among all clusterings with an intra-cluster density of no less than α, find a clustering C with optimum inter-cluster sparsity.
3 possible intra-cluster density measures
14 possible inter-cluster sparsity measures
Family of 42 optimization problems (3 × 14)

Complexity (Example)
Reduction from Exact Cover by 3-Sets
[Figure: illustration of the reduction]
Theorem: Density-Constrained Clustering combining any intra-cluster density measure with the number of inter-cluster edges is NP-hard.
This motivates the use of heuristic greedy algorithms.

Greedy Algorithms
Greedy Merge (GM)
  Popular for modularity-based clustering
  Idea: Merge clusters iteratively
Greedy Vertex Moving (GVM)
  Closely related to algorithms for graph partitioning
  Very successful for optimizing modularity [Rotta et al.]

Generic Greedy Merge Algorithm
Idea: Merge clusters greedily
Objective: Increase inter-cluster sparsity
Constraint: Intra-cluster density must not drop below given threshold
Example: Minimize the number of inter-cluster edges such that the density of each cluster is at least 3/4.
[Figure: step-by-step animation of greedy merging on an example graph]
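The generic scheme can be sketched for the configuration of the example: the number of inter-cluster edges as objective and minimum intra-cluster density as constraint. This is a naive illustration that rescans all cluster pairs in every round, not the heap-based implementation discussed later in the talk; starting from singletons, restricting candidates to connected pairs, and all names are assumptions.

```python
from itertools import combinations


def density(adj, cluster):
    """Intra-cluster edges over possible intra-cluster edges (1.0 for singletons)."""
    n = len(cluster)
    if n < 2:
        return 1.0
    intra = sum(1 for u in cluster for v in adj[u] if v in cluster) // 2
    return intra / (n * (n - 1) / 2)


def edges_between(adj, a, b):
    """Number of edges with one endpoint in a and the other in b."""
    return sum(1 for u in a for v in adj[u] if v in b)


def greedy_merge(adj, alpha):
    """Repeatedly apply the feasible merge that removes the most inter-cluster edges."""
    clusters = [frozenset([v]) for v in adj]  # assumption: start from singletons
    while True:
        best = None
        for a, b in combinations(clusters, 2):
            gain = edges_between(adj, a, b)
            # Skip unconnected pairs and merges that violate the density constraint.
            if gain == 0 or density(adj, a | b) < alpha:
                continue
            if best is None or gain > best[0]:
                best = (gain, a, b)
        if best is None:
            return clusters  # no feasible merge left
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]


# Hypothetical example: two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
result = greedy_merge(adj, alpha=0.75)
```

With the threshold 3/4, the two triangles are recovered: merging them would give a cluster with 7 of 15 possible edges, which violates the constraint.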

Influence of Measures on Algorithm: Coarseness
Rough intuition: [Figure: intra-cluster density vs. inter-cluster sparsity]
Question: Without constraints, is there always a merge that improves the objective function?

(Un-)Boundedness
Definition: An objective function f is unbounded if for any clustering C with |C| > 1 there exists a merge that does not deteriorate f.
Max. pw. inter-cluster conductance is bounded. [Figure: example graph]
E.g., modularity is bounded.
Bounded: mpxc, mpxe, apxd, apxc, aixd
Unbounded: nxe, gxd, mixc, mixd, mixe, aixc, aixe, mixd, mpxd

Influence of Measures on Algorithm
[Figure: algorithm loop in which the set of feasible merges is updated]
Question: Does feasibility of a merge only depend on the involved clusters?
Relevant property: context insensitivity of an intra-cluster measure

Context Insensitivity
Definition: A constraint is context insensitive if the feasibility of a merge does not depend on the remainder of the clustering.
E.g., global intra-cluster density is context sensitive.
  Constraint: (intra-cluster edges) / (possible intra-cluster edges) ≥ 0.7
  [Figure: a merge evaluated in two different clusterings; the slide shows the values 2/3 < 0.7 and 3/4 ≥ 0.7]
E.g., minimum intra-cluster density is context insensitive.
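The difference between the two constraints can be checked on a small example. The graph below is hypothetical and was chosen only so that the global densities come out as 2/3 and 3/4, the values shown on the slide; the actual graph in the figure is not recoverable from the text. The same merged cluster {a, b, c} is evaluated against two different remainders of the clustering.

```python
def intra_edges(adj, cluster):
    """Number of edges with both endpoints inside the cluster."""
    return sum(1 for u in cluster for v in adj[u] if v in cluster) // 2


def pairs(cluster):
    """Number of possible intra-cluster edges."""
    return len(cluster) * (len(cluster) - 1) // 2


def global_density(adj, clustering):
    possible = sum(pairs(c) for c in clustering)
    intra = sum(intra_edges(adj, c) for c in clustering)
    return intra / possible if possible else 1.0


def minimum_density(adj, clustering):
    # assumption: singletons count as density 1
    return min(intra_edges(adj, c) / pairs(c) if pairs(c) else 1.0 for c in clustering)


# Hypothetical graph: path a-b-c plus a separate edge d-e.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
ALPHA = 0.7
merged = {"a", "b", "c"}            # result of merging {a} with {b, c}
context_1 = [merged, {"d"}, {"e"}]  # remainder consists of singletons
context_2 = [merged, {"d", "e"}]    # remainder contains a dense cluster

# Global density: the same merge is infeasible in one context, feasible in the other.
global_ok = [global_density(adj, c) >= ALPHA for c in (context_1, context_2)]
# Minimum density: the verdict is the same in both contexts.
minimum_ok = [minimum_density(adj, c) >= ALPHA for c in (context_1, context_2)]
```

Here `global_ok` differs between the two contexts while `minimum_ok` does not, which is the behavior the definition separates.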

Context Insensitivity: Classification
Context insensitive: minimum intra-cluster density
Context sensitive: average intra-cluster density, global intra-cluster density

Influence of Measures on Algorithm
[Figure: algorithm loop in which the feasible merges are kept in a heap and the optimum is extracted]
Question: Given context insensitivity, can the set of feasible merges be efficiently maintained in a heap?
Relevant property: locality of an objective function

Locality: Intuition
Example: Maximum isolated inter-cluster conductance
First approach: Use gain in inter-cluster sparsity as key
Keys before merging G and I (ordered from bad merges to good merges):
  A, B      -0.3
  C, D       0
  E, F       0
  C, F       0
  G, H       0
  G, I       0.3
Keys after merging G and I:
  A, B      -0.3
  G ∪ I, T  -0.3
  C, F      -0.2
  C, D       0
  E, F       0.2
Clever tie-breaking possible?
Needed: Suitable order that does not change if unrelated clusters merge
Existence of such an order: locality of the inter-cluster measure

Example: Max. isolated inter-cluster conductance
Current sequence of conductance of all clusters (sorted): A 0.5, B 0.4, C 0.3, D 0.3, E 0.
Sequence if A and B are merged: A ∪ B 0.45, C 0.3, D 0.3, E 0.
Sequence if A and D are merged: A ∪ D 0.45, B 0.4, C 0.3, E 0.
Compare lexicographically: Merging A and B is better!
Ordering merges lexicographically is stable.
Two merges can be compared in constant time by comparing keys consisting of three numbers.
Maximum isolated inter-cluster conductance is local.
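The lexicographic comparison in this example can be reproduced directly. The sketch below rebuilds the full sorted sequence for each candidate merge, which takes linear time per comparison; it does not implement the constant-time three-number keys mentioned above. The value for cluster E is truncated in the slide text, so 0.1 is used as a placeholder assumption, and the function name is hypothetical.

```python
def sequence_after_merge(conductance, a, b, merged_value):
    """Sorted (descending) conductance sequence after replacing a and b by their union."""
    rest = [x for name, x in conductance.items() if name not in (a, b)]
    return sorted(rest + [merged_value], reverse=True)


# Values from the example; E's value is an assumed placeholder.
conductance = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.3, "E": 0.1}

seq_ab = sequence_after_merge(conductance, "A", "B", 0.45)
seq_ad = sequence_after_merge(conductance, "A", "D", 0.45)

# Python compares lists lexicographically; a smaller sequence means a better merge,
# because the objective is to keep the maximum conductance low.
ab_is_better = seq_ab < seq_ad
```

Both merges give the same maximum (0.45), so the gain alone cannot separate them; the second entry (0.3 versus 0.4) decides in favor of merging A and B.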

Locality: Results
Does such an order exist for all objective functions?
Global inter-cluster density, i.e., (inter-cluster edges) / (possible inter-cluster edges), is not local.
[Figure: example in which the ranking of two candidate merges (better/worse) flips to worse/better after another merge]
Local: mixd, mixc, mixe, aixd, aixc, aixe, nxe
Not local: mpxd, apxd, mpxc, mpxe, gxd, apxe, apxc

Influence of Measures on Algorithm
[Figure: algorithm loop; are all feasible merges important, or are connected merges sufficient?]
Question: Do we have to consider pairs of unconnected clusters?
Relevant property: connectedness of an objective function

Disconnectedness
Definition: An objective function f is connected if merging unconnected clusters is never the best option with respect to f.
Max. pw. inter-cluster conductance is not connected. [Figure: example in which merging two unconnected clusters is the best option]
Connected: nxe
Unconnected: gxd, mixc, mixd, mixe, aixc, aixe, mixd, mpxd, mpxc, mpxe, apxd, apxc, aixd

Influence of Measures on Efficiency
(Given that the necessary data can be maintained efficiently:)
Context insensitivity + Locality = O(n^2 log n) running time
Context insensitivity + Locality + Connectedness = O(md log n) running time and linear space

Example: Email Graph of our Department
[Figure: two clusterings of the email graph (legend: chair), one from a modularity-based algorithm and one from greedy merge (mid + aixc)]

Local Moving
Idea: Move vertices greedily
Objective: Increase inter-cluster sparsity
Constraint: Intra-cluster density must not drop below given threshold
Example: Minimize the number of inter-cluster edges such that the density of each cluster is at least 3/4.
[Figure: step-by-step animation of local moving on an example graph]
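Local moving for the same configuration (number of inter-cluster edges as objective, minimum intra-cluster density as constraint) might look as follows. This is a simple sketch, not the implementation evaluated in the talk: the vertex order, the rule of only accepting strictly improving moves, the start from singletons and all names are assumptions.

```python
def density(adj, cluster):
    """Intra-cluster edges over possible intra-cluster edges (1.0 below two vertices)."""
    n = len(cluster)
    if n < 2:
        return 1.0
    intra = sum(1 for u in cluster for v in adj[u] if v in cluster) // 2
    return intra / (n * (n - 1) / 2)


def local_moving(adj, label, alpha):
    """Move single vertices between clusters while inter-cluster edges decrease.

    label maps each vertex to a cluster id; an updated copy is returned.
    """
    label = dict(label)
    improved = True
    while improved:
        improved = False
        for v in adj:
            own = label[v]
            links = {}  # cluster id -> number of neighbours of v in that cluster
            for u in adj[v]:
                links[label[u]] = links.get(label[u], 0) + 1
            best_gain, best_target = 0, None
            for target, k in links.items():
                gain = k - links.get(own, 0)  # reduction in inter-cluster edges
                if target == own or gain <= best_gain:
                    continue
                source = {u for u in adj if label[u] == own and u != v}
                dest = {u for u in adj if label[u] == target} | {v}
                # Both affected clusters must still satisfy the density constraint.
                if density(adj, source) >= alpha and density(adj, dest) >= alpha:
                    best_gain, best_target = gain, target
            if best_target is not None:
                label[v] = best_target
                improved = True
    return label


# Hypothetical example: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
result = local_moving(adj, {v: v for v in adj}, alpha=0.75)
```

Every accepted move strictly reduces the number of inter-cluster edges, so the loop terminates.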

Greedy Vertex Moving
Idea: Use Local Moving on multiple levels
[Figure: the graph is coarsened by repeated "contract" steps; the clustering is then carried back to the finer levels by "project" steps]
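The contract and project steps of such a multilevel scheme might look as follows. This is a sketch under stated assumptions: the function names `contract` and `project`, the weighted edge dictionary, and the use of self-loops to keep intracluster weight are illustrative choices, not taken from the slides.

```python
def contract(edges, cluster_of):
    """Collapse every cluster into one coarse vertex, summing edge weights.

    edges maps (u, v) pairs to weights; cluster_of maps vertices to cluster ids.
    """
    coarse = {}
    for (u, v), w in edges.items():
        a, b = cluster_of[u], cluster_of[v]
        # a == b yields a self-loop that stores the intracluster weight.
        key = (min(a, b), max(a, b))
        coarse[key] = coarse.get(key, 0) + w
    return coarse

def project(cluster_of, coarse_cluster_of):
    """Carry a clustering of the coarse graph back to the finer level."""
    return {v: coarse_cluster_of[c] for v, c in cluster_of.items()}

# Toy graph (hypothetical): two triangles joined by the edge 2-3, unit weights.
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (2, 3): 1,
         (3, 4): 1, (3, 5): 1, (4, 5): 1}
fine = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
coarse = contract(edges, fine)
merged = project(fine, {0: 0, 1: 0})
```

Local moving would run on each coarse graph in turn; projecting the result gives a starting clustering on the next finer level, where vertices can be moved again individually.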

Effectiveness: Merge vs. Move
Question: Which greedy algorithm is more effective?
Setup:
  Preliminary experiments: pairwise measures behave counter-intuitively and are left out of the experimental analysis
  Experiments on real-world networks taken from the benchmark sets of Arenas and Newman
Outcome:
  Different configurations: intracluster density measure, intercluster sparsity measure, parameter α
  Summary: In 74 percent of all configurations, greedy vertex moving performs better than greedy merging

Social Network of Dolphins [Lusseau 04]
[Figures: clusterings of the dolphin network obtained with different objectives]
Restriction: global intracluster density > 0.2
  Objectives: average intercluster density, maximum intercluster density, global intercluster density, intercluster edges
  Objectives: average intercluster conductance, average intercluster expansion
  Objectives: maximum intercluster expansion, maximum intercluster conductance
Without restriction:
  Objective: modularity
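For reference, the modularity objective can be computed as below. The sketch uses the standard definition for unweighted, undirected graphs (fraction of intracluster edges minus the squared fraction of degree per cluster); the function name and the adjacency-dict representation are assumptions, and the graph is assumed to have at least one edge.

```python
def modularity(adj, cluster_of):
    """Modularity of a clustering of an unweighted, undirected graph."""
    m = sum(len(nb) for nb in adj.values()) / 2  # number of edges
    inside, degree = {}, {}
    for u, nb in adj.items():
        c = cluster_of[u]
        degree[c] = degree.get(c, 0) + len(nb)
        # Each intracluster edge is counted twice, once from each endpoint.
        inside[c] = inside.get(c, 0) + sum(1 for v in nb if cluster_of[v] == c)
    return sum(inside[c] / (2 * m) - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q_split = modularity(adj, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
q_single = modularity(adj, {v: 0 for v in adj})
```

Putting all vertices into one cluster always gives modularity 0, so any clustering with a positive value has more intracluster edges than expected from the degrees alone.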

Planted Partition Graphs: Setup
Planted Partition Graph: edge probability p_in inside the hidden clusters, p_out between them
Question: What is the distance between the clustering found by an objective function and the hidden clustering?
Parameter α: expected intracluster density
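A planted partition graph of this kind can be generated as sketched below. The function name, the equal cluster sizes, and the seed parameter are assumptions for illustration; the slides only specify the two probabilities p_in and p_out.

```python
import random

def planted_partition(num_clusters, cluster_size, p_in, p_out, seed=0):
    """Random graph with a hidden clustering of equal-sized clusters (assumption).

    Vertex pairs inside a hidden cluster are joined with probability p_in,
    pairs in different clusters with probability p_out.
    """
    rng = random.Random(seed)
    n = num_clusters * cluster_size
    hidden = {v: v // cluster_size for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if hidden[u] == hidden[v] else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, hidden

# Example call with hypothetical parameters.
graph, hidden = planted_partition(num_clusters=4, cluster_size=10,
                                  p_in=0.8, p_out=0.05)
```

The returned `hidden` mapping serves as the reference clustering against which the result of a clustering algorithm can be compared.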

Planted Partition Graphs: Rough Summary
[Figure: distance to the reference clustering (vertical axis, 0 to 1.0) for each objective, in two panels: global intracluster density and minimum intracluster density as constraint. Objectives shown: ML-MOD, MOD, NXE, GXD, MIXD, AIXD, MIXC, AIXC, MIXE, AIXE. A line labeled "reference" is marked.]

Planted Partition Graphs: Insights
Investigating different configurations yields further insights:
  Using average intracluster density as the constraint leads to very unbalanced clusterings
  Constraining modularity by maximum intracluster density improves its results, especially if the expected number of clusters is high
  Fine reference clusterings throw maximum objectives off balance
  Average intercluster expansion and average intercluster density identify many clusters

Conclusion
Clustering as a bicriterial problem
  Optimize inter-cluster sparsity respecting intra-cluster density
  Collection of new measures
Algorithm Engineering aspects:
  Formulation of measures
  Classification of measures with respect to greedy merge
  Insights about behavior of measures
  Experimental evaluation of greedy methods
  Experimental comparison on planted partition graphs
Thank you for your attention!