Christian Schulz
CompSE seminar, RWTH Aachen
Institute for Theoretical Informatics, Karlsruhe
www.kit.edu
Algorithm Engineering
[Figure: cycle around "Algorithms" with the steps design, analyze, implement, experiment]
Research Areas
- Scalable (parallel) sorting
- Scalable (parallel/external) graph partitioning
- Scalable (parallel) graph generation
- Scalable (parallel/external) matchings
- Scalable (shared-mem parallel) graph drawing
- Independent sets on large inputs
Huge Complex Networks
[Figure: examples of huge complex networks and a linear system $Ax = b$ with $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$]
Scalable Graph Partitioning
joint work with: H. Meyerhenke and P. Sanders
The Common Parallel Approach
Mesh partitioned via dual graph:
1. Each volume (data, calculation) is represented by a vertex (+ edges)
2. Interdependencies are represented by edges
- All PEs get the same amount of work
- Communication is expensive
Graph Partitioning Problem: partition a graph into (almost) equally sized blocks such that the number of edges connecting vertices from different blocks is minimal.
ε-balanced Graph Partitioning
Partition graph $G = (V, E, c : V \to \mathbb{R}_{>0}, \omega : E \to \mathbb{R}_{>0})$ into $k$ disjoint blocks s.t.
- total node weight of each block $\le \frac{1 + \epsilon}{k} \cdot$ total node weight
- total weight of cut edges as small as possible
Relevant applications: graph processing frameworks, parallel sparse matrix-vector multiplication, ...
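The two conditions of the ε-balanced partitioning problem can be checked directly. The following is a minimal sketch, not code from KaHIP; the function names `edge_cut` and `is_balanced` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict

def edge_cut(edges, block):
    """Total weight of edges whose endpoints lie in different blocks."""
    return sum(w for u, v, w in edges if block[u] != block[v])

def is_balanced(node_weight, block, k, eps):
    """Check c(V_i) <= (1 + eps) * c(V) / k for every block i."""
    total = sum(node_weight.values())
    load = defaultdict(float)
    for v, c in node_weight.items():
        load[block[v]] += c
    bound = (1 + eps) * total / k
    return all(load[i] <= bound for i in range(k))

# Hypothetical toy input: a 4-cycle with unit node and edge weights.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
node_weight = {v: 1 for v in range(4)}
block = {0: 0, 1: 0, 2: 1, 3: 1}

print(edge_cut(edges, block))                     # weight of cut edges
print(is_balanced(node_weight, block, k=2, eps=0.0))
```

Splitting the cycle into two adjacent pairs cuts two edges and keeps both blocks at exactly half the total weight, so it is feasible even for ε = 0.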
Multilevel Graph Partitioning
[Figure: multilevel scheme: input graph, match, contract, ..., initial partitioning, uncontract, local improvement, ..., output partition]
- Successful in many systems (using matchings or AMG for coarsening)
Matching-based Coarsening
1. compute matching
2. perform contraction
[Figure: contracting a matched edge merges node weights $a$, $b$ into $a+b$ and parallel edge weights $A$, $B$ into $A+B$]
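The two coarsening steps can be sketched as follows. This is an illustrative sketch only: the greedy heavy-edge rule used to pick the matching is an assumption (the slides do not say which matching algorithm or edge rating is used), and the names `greedy_matching` and `contract` are hypothetical.

```python
def greedy_matching(edges):
    """Greedy matching: scan edges by decreasing weight (assumed rule)."""
    matched, matching = set(), []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def contract(nodes, edges, node_weight, matching):
    """Merge matched pairs; node weights and parallel edge weights add up."""
    coarse_of = {}
    for cid, (u, v) in enumerate(matching):
        coarse_of[u] = coarse_of[v] = cid
    next_id = len(matching)
    for v in nodes:                      # unmatched nodes stay single
        if v not in coarse_of:
            coarse_of[v] = next_id
            next_id += 1
    coarse_weight = {}
    for v in nodes:
        c = coarse_of[v]
        coarse_weight[c] = coarse_weight.get(c, 0) + node_weight[v]
    coarse_edges = {}
    for u, v, w in edges:
        a, b = coarse_of[u], coarse_of[v]
        if a != b:                       # edges inside a pair disappear
            key = (min(a, b), max(a, b))
            coarse_edges[key] = coarse_edges.get(key, 0) + w
    return coarse_of, coarse_weight, coarse_edges

# Hypothetical toy input with unit node weights.
nodes = [0, 1, 2, 3]
edges = [(0, 1, 5), (1, 2, 1), (2, 3, 4), (0, 2, 2), (1, 3, 3)]
node_weight = {v: 1 for v in nodes}
matching = greedy_matching(edges)
coarse_of, coarse_weight, coarse_edges = contract(nodes, edges, node_weight, matching)
```

Because weights are accumulated during contraction, a partition of the coarse graph has the same cut and the same block weights as the corresponding partition of the finer graph.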
Local Search
- compute gain: $g(v) = d_{\mathrm{ext}}(v) - d_{\mathrm{int}}(v)$
- select blocks alternately
- move nodes greedily
- edge cut: 7
Local Search
- update gain $g(v)$ of neighbors
- move a node at most once
- edge cut: 7, 6
Local Search
- until stop criterion reached
- increase in cut possible
- edge cut: 7, 6, 5, 5, 6
Local Search
[Figure: plot of cut over #steps]
- increase in cut possible
- undo changes until best solution reached
Local Search
- final partition with edge cut 5
- linear time
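The local search walked through above (gain computation, alternating blocks, each node moved at most once, rollback to the best cut seen) can be sketched for two blocks as follows. This is a simplified illustration under stated assumptions: it recomputes gains from scratch, so it runs in quadratic rather than linear time; linear time needs gain buckets or priority queues, which are omitted here. Nodes are assumed to be integers, and the name `local_search` is hypothetical.

```python
def gain(adj, block, v):
    """g(v) = d_ext(v) - d_int(v): cut reduction if v changes its block."""
    ext = sum(w for u, w in adj[v].items() if block[u] != block[v])
    internal = sum(w for u, w in adj[v].items() if block[u] == block[v])
    return ext - internal

def cut(adj, block):
    return sum(w for v in adj for u, w in adj[v].items()
               if v < u and block[u] != block[v])

def local_search(adj, block):
    block = dict(block)
    current = best = cut(adj, block)
    moves, best_len = [], 0
    unmoved, side = set(adj), 0
    while True:
        cands = [v for v in unmoved if block[v] == side]
        if not cands:                       # stop criterion of this sketch
            break
        # highest gain first, ties broken by smallest node id
        v = max(cands, key=lambda x: (gain(adj, block, x), -x))
        current -= gain(adj, block, v)      # the cut may also increase
        block[v] = 1 - block[v]
        unmoved.remove(v)                   # move a node at most once
        moves.append(v)
        if current < best:
            best, best_len = current, len(moves)
        side = 1 - side                     # select blocks alternately
    for v in moves[best_len:]:              # undo changes after the best point
        block[v] = 1 - block[v]
    return block, best

# Hypothetical toy input: two triangles joined by the edge (2, 3).
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {v: {} for v in range(6)}
for u, v in edge_list:
    adj[u][v] = adj[v][u] = 1
block = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
new_block, best = local_search(adj, block)
```

Starting from a cut of 5, the sketch first moves node 3, then node 2, continues with moves that worsen the cut, and finally rolls back to the state after the second move.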
KaHIP: Karlsruhe High Quality Partitioning
[Figure: overview of KaHIP components arranged around the multilevel graph partitioning scheme (input graph, match, contract, initial partitioning, uncontract, local improvement, output partition). Labels in the figure: buffoon, social, separators, distr. evol. alg., highly balanced, cycles a la multigrid, flows etc., edge ratings, parallel. Citations in the figure: [ALENEX12], [SEA14, IPDPS15], [DIMACS12], [SEA13], [ESA11], [IPDPS10], [SEA12/14]]
KaHIP Benchmarks
1. Walshaw Benchmark:
   - runtime neglected
   - 816 instances ($\epsilon \in \{0\%, 1\%, 3\%, 5\%\}$)
   - focus on solution quality
   - almost all instances improved or reproduced
2. 10th DIMACS Implementation Challenge
   - best scores in categories: solution quality and runtime vs. solution quality
Partition Complex Networks
Matching-based Coarsening: Problem
- bad for networks that are highly irregular
- substantial reduction is hard using matchings
- may contract wrong edges!
Basic Idea
- aggressive contraction / simple and fast local search
- main idea: contract clusterings
- clustering paradigm: internally dense and externally sparse
Basic Idea: Contraction of Clusterings
[Figure: a cluster with node weights $a$, $b$, $c$ is contracted into one node of weight $a+b+c$; parallel edges of weight $A$, $B$, $C$ become one edge of weight $A+B+C$]
- contraction: respect balance and cut
- avoid large blocks: size constraint $U$
- recurse until graph is small
Label Propagation: Cut-based, Linear Time Clustering Algorithm [Raghavan et al.]
- cut-based clustering using size-constrained label propagation
- start with singletons
- traverse nodes in random order or smallest degree first
- move node to cluster having strongest eligible connection
- modification: eligible w.r.t. size constraint $U$
[Figure: scan over the nodes]
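The size-constrained label propagation described above can be sketched as follows. This is a minimal illustration, not the KaHIP implementation: the fixed number of rounds, the random traversal order, the rule that a node only moves on a strictly stronger connection, and the name `label_propagation` are all assumptions.

```python
import random
from collections import Counter

def label_propagation(adj, node_weight, U, rounds=3, seed=0):
    """Cluster nodes; a move is eligible only if the target stays <= U."""
    rng = random.Random(seed)
    cluster = {v: v for v in adj}          # start with singletons
    size = dict(node_weight)               # cluster id -> total node weight
    order = list(adj)
    for _ in range(rounds):
        rng.shuffle(order)                 # traverse nodes in random order
        for v in order:
            conn = {}                      # connection strength per cluster
            for u, w in adj[v].items():
                conn[cluster[u]] = conn.get(cluster[u], 0) + w
            own = cluster[v]
            best, best_w = own, conn.get(own, 0)
            for c, w in conn.items():
                eligible = size[c] + node_weight[v] <= U
                if c != own and eligible and w > best_w:
                    best, best_w = c, w
            if best != own:
                size[own] -= node_weight[v]
                size[best] += node_weight[v]
                cluster[v] = best
    return cluster

# Hypothetical toy input: two triangles joined by the edge (2, 3).
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {v: {} for v in range(6)}
for u, v in edge_list:
    adj[u][v] = adj[v][u] = 1
node_weight = {v: 1 for v in adj}

clusters = label_propagation(adj, node_weight, U=3)
sizes = Counter(clusters.values())
```

With `U=1` no move is ever eligible, so every node stays in its own singleton cluster; larger values of `U` allow more aggressive contraction.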
Label Propagation
Iteration   Cut [%]
0           100
1           8.96
2           6.15
3           5.66
4           5.44
5           5.28
6           5.25
7           5.21
8           5.18
...         5.09
Label Propagation: Simple Local Search
Greedy local search:
- start with partition from coarser level
- traverse nodes in random order
- move node to cluster having strongest eligible connection
- eligible: w.r.t. size constraint $U := (1 + \epsilon) \frac{|V|}{k}$
[Figure: scan over the nodes]
Parallelization
Graph Distribution over PEs
- graph distribution: a PE receives $n/p$ vertices and their edges
- ghost nodes: adjacent nodes on other processor (communication!)
- interface nodes: nodes adjacent to ghost nodes
[Figure: graph split between Processor I and Processor II, with communication across the boundary]
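The definitions of ghost and interface nodes translate directly into set computations. The sketch below simulates the distribution in a single process; the contiguous block assignment of vertex IDs to PEs and the name `ghost_and_interface` are assumptions for illustration.

```python
def block_distribution(n, p):
    """Assign vertex v to PE floor(v * p / n), i.e. about n/p vertices per PE."""
    return {v: v * p // n for v in range(n)}

def ghost_and_interface(adj, owner, pe):
    """Ghost nodes: remote neighbors. Interface nodes: local nodes next to them."""
    local = {v for v in adj if owner[v] == pe}
    ghost = {u for v in local for u in adj[v] if owner[u] != pe}
    interface = {v for v in local if any(owner[u] != pe for u in adj[v])}
    return ghost, interface

# Hypothetical toy input: the path 0 - 1 - 2 - 3 on two PEs.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
owner = block_distribution(n=4, p=2)
ghost0, interface0 = ghost_and_interface(adj, owner, pe=0)
ghost1, interface1 = ghost_and_interface(adj, owner, pe=1)
```

Only changes at interface nodes ever need to be sent to another PE, which is why communication volume depends on the boundary rather than on the size of the local subgraph.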
Label Propagation: Distributed Memory
- each PE has a static part of the graph, only block IDs can change
Overlap computation and communication (PE-centric view):
[Figure: scan over $V$ in phase $i-1$, phase $i$, phase $i+1$]
At the end of phase $i$:
- send block ID updates of phase $i$ to neighboring PEs
- receive block ID updates of phase $i-1$ from neighboring PEs
* while scanning in phase $i$, messages are routed through the network
* once the algorithm has converged, nothing is communicated
Contraction of Clusterings: The Parallel Case (High Level)
[Figure: contraction of a cluster, node weights $a$, $b$, $c$ become $a+b+c$ and edge weights $A$, $B$, $C$ become $A+B+C$]
- find mapping $C : 1..n \to 1..n'$ in parallel
- exchange subgraphs, compute contracted graph locally
- when the graph is small: parallel initial partitioning
Experiments
Parallel Solution Quality / Performance
Setup: $k = 2$ blocks, 32 PEs; instances: meshes and social networks/web graphs
ParMetis:
- has ineffective coarsening (matching-based)
- due to memory consumption of the coarsest graph (distributed among PEs), could not solve arabic-2005, sk-2005 and uk-2007
Solved instances (ParMetis):
- fast and eco yield 19.2% and 27.4% improvement
- fast and eco slower on average
Social networks/web graphs:
- fast: 38% fewer cut edges and > 2× faster
- eco: 45% fewer cut edges and slower
- best instance: 18× faster and 61.6% fewer cut edges
Improvement over Facebook [Ugander and Backstrom]: 45% fewer cut edges on LiveJournal and much faster ($k = 100$)
Strong Scaling: Social Networks
[Figure: total time [s] (log scale, 1 to 1000) over number of PEs $p$ (1 to 2K); curves: Fast sk-2007, Fast arabic-2005, Fast uk-2002, Fast uk-2007, Minimal uk-2007]
- uk-2007 can be partitioned in 15.2 seconds (sequential: 10.5 min)
- 72 seconds for random geometric graph with 22G edges
- more scaling results in the paper
Graph Drawing
joint work with: H. Meyerhenke and M. Nöllenburg
Problem
- $G = (V, E, d)$ with $d : E \to \mathbb{R}$
- $x : V \to \mathbb{R}^2$
[Figure: example drawing, annotated "d 1"]
Maximal Entropy Stress Model [Gansner et al. 13]
Entropy $H(x)$:
- physics: nodes evenly dispersed
- nodes as far away as possible
- some nodes have predefined distance!
Maximal Entropy Stress Model:
$$\max\; H(x) := \sum_{\{u,v\} \notin E} \ln \|x_u - x_v\| \quad \text{subject to} \quad \|x_u - x_v\| = d_{uv},\ \{u,v\} \in E$$
- not possible to satisfy all constraints!
- model may be infeasible
Maximal Entropy Stress Model [Gansner et al. 13]
Compromise: min error, max entropy
$$\min \sum_{\{u,v\} \in E} w_{uv} \left( \|x_u - x_v\| - d_{uv} \right)^2 - \alpha H(x)$$
- $\alpha$: trade-off parameter
- solve optimization problem by repeatedly solving Laplacian systems
- or iterative scheme ...
Maximal Entropy Stress Model [Gansner et al. 13]
... or iterative scheme:
$$x_u \leftarrow \frac{1}{\rho_u} \sum_{\{u,v\} \in E} w_{uv} \left( x_v + d_{uv} \frac{x_u - x_v}{\|x_u - x_v\|} \right) + \frac{\alpha}{\rho_u} \sum_{\{u,v\} \notin E} \frac{x_u - x_v}{\|x_u - x_v\|^2}$$
- overall update costs $O(n^2)$ per iteration
Our contributions: make this usable and fast in practice
- multilevel integration
- approximate long-range forces
- employ parallelism
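One sweep of the iterative scheme can be written out in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: it takes $\rho_u$ to be the sum of the weights $w_{uv}$ over the edges at $u$ (the slide does not define it), updates coordinates in place so that later nodes see already updated positions, and requires distinct positions and no isolated nodes. The name `maxent_sweep` is hypothetical.

```python
import math

def maxent_sweep(pos, adj, alpha=1.0):
    """One in-place sweep; adj[u][v] = (w_uv, d_uv), pos[u] = (x, y)."""
    pos = dict(pos)
    for u in pos:
        xu, yu = pos[u]
        rho = sum(w for w, _ in adj[u].values())   # assumed: sum of edge weights
        ax = ay = 0.0                              # sum over edges
        rx = ry = 0.0                              # sum over non-edges
        for v, (xv, yv) in pos.items():
            if v == u:
                continue
            dx, dy = xu - xv, yu - yv
            dist = math.hypot(dx, dy)
            if v in adj[u]:
                w, d = adj[u][v]
                ax += w * (xv + d * dx / dist)
                ay += w * (yv + d * dy / dist)
            else:
                rx += dx / dist ** 2
                ry += dy / dist ** 2
        pos[u] = ((ax + alpha * rx) / rho, (ay + alpha * ry) / rho)
    return pos

# Hypothetical toy input: one edge with target distance 1 and weight 1.
adj = {0: {1: (1, 1.0)}, 1: {0: (1, 1.0)}}
stretched = maxent_sweep({0: (0.0, 0.0), 1: (0.0, 2.0)}, adj)
fixed = maxent_sweep({0: (0.0, 0.0), 1: (0.0, 1.0)}, adj)
```

The inner loop visits every other node for each node, which is the $O(n^2)$ cost per iteration that the multilevel approximation is meant to avoid.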
Multilevel Graph Drawing [Hadany, Harel 99]
[Figure: multilevel scheme: input graph, contract, ..., initial drawing, uncontract, improve drawing, ..., output drawing]
- as before: contract using size-constrained clusterings
Multilevel Graph Drawing: Initial Drawing
- coarsen until only two nodes are left
- place them at optimal distance: $x_u = (0, 0)$, $x_v = (0, d_{uv})$
- define distances on coarse graphs (stay tuned)
Uncoarsening: Local Improvement
- minimize maxent-stress on each level of hierarchy
- assume disk with radius $\sqrt{c(u)}$ to draw $c(u)$ vertices
- define distance $d_{uv} := \frac{\sqrt{c(u)} + \sqrt{c(v)}}{2}$ on current level
Iterative scheme:
$$x_u \leftarrow \frac{1}{\rho_u} \sum_{\{u,v\} \in E} w_{uv} \left( x_v + d_{uv} \frac{x_u - x_v}{\|x_u - x_v\|} \right) + \frac{\alpha}{\rho_u} \sum_{\{u,v\} \notin E} \frac{x_u - x_v}{\|x_u - x_v\|^2}$$
Local Improvement: Approximation of the Iterative Scheme
$$x_u \leftarrow \ldots \sum_{\{u,v\} \notin E} \underbrace{\frac{x_u - x_v}{\|x_u - x_v\|^2}}_{=: r(u,v)}$$
Approximation:
$$\sum_{\{u,v\} \notin E} r(u,v) \approx \sum_{\substack{v \neq u \\ M(u) = M(v)}} r(u,v) + \sum_{\substack{v' \in V' \\ v' \neq M(u)}} \nu(v') \frac{x_u - x_{v'}}{\|x_u - x_{v'}\|^2} - \sum_{\{u,v\} \in E} r(u,v)$$
- $M(u)$: cluster of $u$
- $\nu(v')$: number of finer vertices of $v'$ on current level
- $V'$: vertex set of next coarser level
- $x_{v'}$: coordinate of $v'$
Local Improvement: Additional Enhancements
- after each iteration, update barycenter of coarse nodes
- vertex computations are independent: add parallelism
- use approximation multiple ($h$) levels beneath in hierarchy
[Figure: multilevel scheme: input graph, contract, ..., initial drawing, uncontract, improve drawing, ..., output drawing]
Proposition: Assume equal cluster sizes. The running time of one iteration of MulMent_h, $h \ge 0$, is $O(m + n^{\frac{h+2}{h+1}})$.
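To see what the proposition means in practice, it helps to plug in small values of $h$. The following lines only evaluate the exponent $\frac{h+2}{h+1}$ from the proposition; the choice $n = 2^{20}$ is taken as an example size and the rounded exponent is approximate.

```latex
\begin{align*}
h = 0 &: \quad O\!\left(m + n^{2}\right) \\
h = 1 &: \quad O\!\left(m + n^{3/2}\right) \\
h = 2 &: \quad O\!\left(m + n^{4/3}\right) \\
h \to \infty &: \quad \tfrac{h+2}{h+1} \to 1
\end{align*}
% Example for n = 2^{20}:
%   n^{2} = 2^{40}, \quad n^{3/2} = 2^{30}, \quad n^{4/3} \approx 2^{26.7}
```

For $h = 0$ the bound matches the $O(n^2)$ cost of the plain iterative scheme, and each additional level moves the exponent closer to 1.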
Experiments
Scalability: Running Time [Delaunay, $n = 2^{20}$]
[Figure: total time [s] (log scale, $10^2$ to $10^6$) over number of PEs $p$ (1 to 64); curves: MulMent_0, MulMent_1, MulMent_2, MulMent_4, MulMent_6, MulMent_8, MulMent_9, MulMent_10]
Running Times
graph        PMDS    MaxEnt   MulMent_0  MulMent_1  MulMent_10
btree        0.02    1.14     0.10       0.17       0.11
1138_bus     0.04    1.41     0.15       0.13       0.11
USpowerG     0.14    3.82     0.29       0.23       0.18
3elt         0.16    3.45     0.21       0.19       0.17
commanche    0.24    5.42     0.32       0.27       0.18
bcsstk31     3.44    48.48    5.63       3.97       1.82
fe_pwt       1.49    31.60    4.46       2.67       0.66
del16        2.86    61.42    13.63      7.75       1.10
luxembourg   3.10    96.10    40.94      22.87      1.39
nyc          9.03    233.94   216.27     119.33     3.70
auto         41.80   665.67   613.51     329.08     20.70
del20        53.80   1125.03  3303.82    1749.77    27.01
Table: Running times in seconds per graph. Smaller is better. PivotMDS (PMDS) and MaxEnt use one thread (sequential codes); the MulMent algorithms use 32 cores (64 threads). Running times of MaxEnt exclude the time of PMDS (which yields input coordinates to MaxEnt).
Experimental Results: Summary
Influence of $h$ / Comparison:
- increasing $h$: no large impact on solution quality, maxent-stress remains comparable
- maxent-stress of MaxEnt and MulMent more or less similar
Dynamic networks:
- model: remove x% random edges, insert x% edges (distance D)
- 4× faster ($h = 0$), save 50% time ($h = 7$)
- 9% worse full-stress, 1% better maxent-stress
Example Drawings
[Figures: drawings of fe_pwt, bcsstk31 and commanche, each produced by PMDS, MaxEnt and MulMent]
Independent Sets
joint work with: S. Lamm, P. Sanders, D. Strash and R. Werneck
Definitions
- Independent Set: subset $S \subseteq V$ such that there are no adjacent nodes in $S$
- Maximum Independent Set (MIS): maximum cardinality set $S$
- related to maximum clique and minimum vertex cover
- finding a MIS is NP-hard and hard to approximate
Common Approaches
- use heuristic algorithms
- gradually improve a single solution
- node deletions, insertions and swaps
- plateau search
- diversification & restart rules
Andrade et al. (2012): iterated local search using $(j, k)$-swaps
ARW Local Search
Node swaps:
- $(j, k)$-swaps: remove $j$ solution nodes and insert $k$ new ones
- $(1, 2)$-swaps: remove a single solution node and insert two new ones
Local search:
- search for $(1, 2)$-swaps in time $O(m)$
- use data structure that supports fast insertion and removal
[Figure: legend: solution nodes, free nodes, non-free non-solution nodes]
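A $(1, 2)$-swap can be illustrated with a naive search. The sketch below is not the ARW algorithm: it tries every solution node and every pair of candidate neighbors, so it does not reach the $O(m)$ bound, which relies on the specialized data structure mentioned above. It also assumes the current solution is a maximal independent set, so that only neighbors of the removed node can be inserted. The name `one_two_swap` is hypothetical.

```python
def is_independent(adj, S):
    """No two nodes of S are adjacent."""
    return all(u not in S for v in S for u in adj[v])

def one_two_swap(adj, S):
    """Return S with one node replaced by two new ones, or None if impossible."""
    for x in sorted(S):
        rest = S - {x}
        # neighbors of x whose only solution neighbor is x itself
        cands = [v for v in sorted(adj[x])
                 if all(u not in rest for u in adj[v])]
        for i, a in enumerate(cands):
            for b in cands[i + 1:]:
                if b not in adj[a]:        # a and b must not be adjacent
                    return rest | {a, b}
    return None

# Hypothetical toy input: the path 0 - 1 - 2.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
improved = one_two_swap(adj, {1})
```

On the path, removing the middle node and inserting both end nodes grows the solution from one node to two; afterwards no further swap exists.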
ε-balanced Graph Partitioning: Node Separators
Partition graph $G = (V, E, c : V \to \mathbb{R}_{>0}, \omega : E \to \mathbb{R}_{>0})$ into $k$ disjoint blocks + node separator s.t.
- total node weight of each block $\le \frac{1 + \epsilon}{k} \cdot$ total node weight
- total size of node separator as small as possible
Evolutionary Algorithms: General Structure
procedure steady-state-EA
    create initial population P
    while stopping criterion not fulfilled
        select parents P1, P2 from P
        combine P1 with P2 to create offspring o
        mutate offspring o
        evict individual in population using o
    return the fittest individual that occurred
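The steady-state structure above can be turned into a small generic skeleton. The sketch below is an illustration only: uniformly random parent selection, the rule of evicting the least fit individual when the offspring is at least as good, the fixed step budget, and the bitstring toy problem are all assumptions and not the operators used for graph partitioning or independent sets.

```python
import random

def steady_state_ea(init, fitness, combine, mutate,
                    pop_size=8, steps=200, seed=0):
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(steps):                       # stopping criterion: step budget
        p1, p2 = rng.sample(population, 2)       # select parents
        child = mutate(combine(p1, p2, rng), rng)
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) >= fitness(population[worst]):
            population[worst] = child            # evict an individual (assumed rule)
        if fitness(child) > fitness(best):
            best = child                         # fittest individual that occurred
    return best

# Hypothetical toy problem: maximize the number of ones in a bitstring.
def init(rng):
    return [rng.randint(0, 1) for _ in range(20)]

def fitness(bits):
    return sum(bits)

def combine(p1, p2, rng):
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def mutate(bits, rng):
    bits = list(bits)
    i = rng.randrange(len(bits))
    bits[i] ^= 1                                 # flip one random bit
    return bits

result = steady_state_ea(init, fitness, combine, mutate)
```

Only one offspring is created per step and the population size stays constant, which is what distinguishes the steady-state scheme from generational evolutionary algorithms.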
Combine Operations
Graph partitioning:
- exchange whole blocks of solutions
- operators for edge separators and node separators
- small cut-size vital for efficiency
[Figure: two solutions combined into one ("+", "=")]
Local search:
- resulting independent sets may not be maximal
- use maximization step and ARW to reach local optimum
Node Separator Combine
[Figure: graph split into $V_1$, $V_2$ and separator $S$]
- build node separator: $V = V_1 \cup V_2 \cup S$
- use node separator as crossover point
- combination takes linear time $O(n)$
- maximize with greedy + ARW local search
Kernelization [Akiba, Iwata 15]
Reductions: rules to decrease graph size while maintaining optimality
- solve problem on problem kernel (using EA)
- obtain solution on input graph
Example: remove degree 0 or 1 vertices $v$
- if the neighbor of $v$ is in the MIS, choose $v$ instead; otherwise add $v$
- many more reductions used in practice [Lamm et al. 16]
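The degree-0/1 rule from the example can be applied exhaustively with a simple worklist. This sketch covers only that single rule; the many other reductions used in practice are not shown, and the name `reduce_low_degree` is hypothetical.

```python
def reduce_low_degree(adj):
    """Apply the degree-0/1 rule until it no longer fires.

    Returns the vertices forced into the solution and the remaining kernel.
    """
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    chosen = set()
    stack = [v for v in adj if len(adj[v]) <= 1]
    while stack:
        v = stack.pop()
        if v not in adj or len(adj[v]) > 1:
            continue                              # stale worklist entry
        chosen.add(v)                             # v is in some maximum independent set
        removed = {v} | adj[v]                    # delete v and its neighbor, if any
        affected = set()
        for r in removed:
            for u in adj[r]:
                if u not in removed:
                    adj[u].discard(r)
                    affected.add(u)
        for r in removed:
            del adj[r]
        stack.extend(u for u in affected if len(adj[u]) <= 1)
    return chosen, adj

# Hypothetical toy inputs: a path on five vertices and a triangle.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path_solution, path_kernel = reduce_low_degree(path)
tri_solution, tri_kernel = reduce_low_degree(triangle)
```

The path is solved completely by the rule, leaving an empty kernel, while the triangle has no vertex of degree at most one and is returned unchanged for the search step.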
Guess likely candidates
- can we guess vertices that are in a MIS?
- idea: select small-degree vertices from fittest independent set
- apply more reductions and recurse!
Experiments
Near-optimal on difficult networks
- finds exact MIS faster when the exact algorithm is slow:
  Skitter: 48 min vs. 21 min
  Stanford: 13 hours vs. 5 min
  bcsstk30: 8.6 hours vs. 2.4 sec
  Skitter: 2 hours vs. 28 sec, ...
- finds exact MIS for large networks with known MIS size
- consistently finds larger solutions on social and road networks
[Figure: two plots of solution size over time [s] (log scale, $10^{-1}$ to $10^{5}$); curves: ARW, EvoMIS, ReduMIS]
- consistent, even as we scale to graphs on 10M to 100M nodes
Conclusions
Conclusion
- apply algorithm engineering to optimization problems
- obtain algorithms that scale to large inputs and machines
- outperform state-of-the-art open source implementations
KaHIP: algo2.iti.kit.edu/kahip
KaDraw: algo2.iti.kit.edu/kadraw
KaMIS: algo2.iti.kit.edu/kamis
[Figure: algorithm engineering cycle with the elements realistic models, design, analysis, implementation, experiments, falsifiable hypotheses (induction, deduction), performance guarantees, algorithm libraries, real inputs, applications, application engineering]