Big Data Graph Algorithms
|
|
- Marion Morgan
- 8 years ago
- Views:
Transcription
1 Christian Schulz CompSE seminar, RWTH Aachen, Karlsruhe 1 Christian Schulz: Institute for Theoretical Informatics
2 Algorithm Engineering design analyze Algorithms implement experiment 1 Christian Schulz:
3 Research Areas Scalable (parallel) sorting Scalable (parallel/external) graph partitioning Scalable (parallel) graph generation Scalable (parallel/external) matchings Scalable (shared-mem parallel) graph drawing Independent sets on large inputs 2 Christian Schulz:
4 Research Areas Scalable (parallel) sorting Scalable (parallel/external) graph partitioning Scalable (parallel) graph generation Scalable (parallel/external) matchings Scalable (shared-mem parallel) graph drawing Independent sets on large inputs 2 Christian Schulz:
5 Huge Complex Networks Rn n 3 Ax = b Rn 3 Christian Schulz:
6 Scalable Graph Partitioning joint work with: H. Meyerhenke and P. Sanders 3 Christian Schulz:
7 The Common Parallel Approach Mesh partitioned via dual graph 1. Each volume (data, calculation) represented by a vertex (+edges) 2. Interdependencies represented by edges All PE s get same amount of work Communication is expensive Graph Partitioning Problem: Partition a graph into (almost) equally sized blocks, such that the number of edges connecting vertices from different blocks is minimal. 4 Christian Schulz:
8 ɛ-balanced Graph Partitioning Partition graph G = (V, E, c : V R >0, ω : E R >0 ) into k disjoint blocks s.t. total node weight of each block 1 + ɛ total node weight k total weight of cut edges as small as possible Relevant Applications: graph processing frameworks, parallel sparse matrix vector mult Christian Schulz:
9 Multilevel Graph Partitioning input graph output partition... local improvement... match contract initial partitioning uncontract Successful in many systems (using matchings or AMG for coarsening) 6 Christian Schulz:
10 Matching-based Coarsening A B a b a+b A+B 1. compute matching 2. perform contraction 7 Christian Schulz:
11 Matching-based Coarsening A B a b a+b A+B 1. compute matching 2. perform contraction 7 Christian Schulz:
12 Local Search compute gain: g(v) = d ext (v) d int (v) select blocks alternately move nodes greedy edge cut: 7 8 Christian Schulz:
13 Local Search compute gain: g(v) = d ext (v) d int (v) select blocks alternately move nodes greedy edge cut: 7 8 Christian Schulz:
14 Local Search compute gain: g(v) = d ext (v) d int (v) select blocks alternately move nodes greedy edge cut: 7 8 Christian Schulz:
15 Local Search update gain g(v) of neighbors move a node at most once edge cut: 7, 6 8 Christian Schulz:
16 Local Search until stop criteria reached increase in cut possible edge cut: 7, 6, 5 8 Christian Schulz:
17 Local Search until stop criteria reached increase in cut possible edge cut: 7, 6, 5, 5 8 Christian Schulz:
18 Local Search until stop criteria reached increase in cut possible edge cut: 7, 6, 5, 5, 6 8 Christian Schulz:
19 Local Search cut #steps increase in cut possible undo changes until best solution reached 9 Christian Schulz:
20 Local Search final partition with edge cut 5 linear time 10 Christian Schulz:
21 KaHIP Karlsruhe High Quality Partitioning buffoon [ALENEX12] social [SEA14,IPDPS15] separators distr. evol. alg. [ALENEX12] [DIMACS12] highly balanced: [SEA13] A 0 B A 0 B A C C C B V F W [ESA11] cycles a la multigrid input graph Output Partition [IPDPS10]... flows etc. [ESA11]... edge local improvement ratings match + [SEA12/14] contract uncontract initial partitioning parallel [IPDPS10] Multilevel Graph Partitioning 11 Christian Schulz:
22 KaHIP Benchmarks 1. Walschaw Benchmark: runtime neglected 816 instances (ɛ {0%, 1%, 3%, 5%}) focus on solution quality almost all instances improved or reproduced 2. 10th DIMACS Implementation Challenge best scores in categories: solution quality and runtime vs. solution quality 12 Christian Schulz:
23 Partition Complex Networks 12 Christian Schulz:
24 Matching-based Coarsening Problem bad for networks that are highly irregular substantial reduction is hard using matchings may contract wrong edges! 13 Christian Schulz:
25 Matching-based Coarsening Problem bad for networks that are highly irregular substantial reduction is hard using matchings may contract wrong edges! 13 Christian Schulz:
26 Basic Idea aggressive contraction / simple and fast local search main idea: contract clusterings clustering paradigm: internally dense and externally sparse 14 Christian Schulz:
27 Basic Idea Contraction of Clusterings a b c A C B a+b+c A+B+C contraction: respect balance and cut avoid large blocks: size constraint U recurse until graph is small 15 Christian Schulz:
28 Basic Idea Contraction of Clusterings a b c A C B a+b+c A+B+C contraction: respect balance and cut avoid large blocks: size constraint U recurse until graph is small 15 Christian Schulz:
29 Label Propagation Cut-based, Linear Time Clustering Algorithm [Raghavan et. al] cut-based clustering using size-constraint label propagation start with singletons traverse nodes in random order or smallest degree first move node to cluster having strongest eligible connection modification eligible: w.r.t size constraint U Scan 16 Christian Schulz:
30 Label Propagation Iteration Cut [%] Christian Schulz:
31 Label Propagation Iteration Cut [%] Christian Schulz:
32 Label Propagation Iteration Cut [%] Christian Schulz:
33 Label Propagation Iteration Cut [%] Christian Schulz:
34 Label Propagation Iteration Cut [%] Christian Schulz:
35 Label Propagation Iteration Cut [%] Christian Schulz:
36 Label Propagation Iteration Cut [%] Christian Schulz:
37 Label Propagation Iteration Cut [%] Christian Schulz:
38 Label Propagation Iteration Cut [%] Christian Schulz:
39 Label Propagation Iteration Cut [%] Christian Schulz:
40 Label Propagation Simple Local Search Greedy Local Search: start with partition from coarser level traverse nodes in random order move node to cluster having strongest eligible connection eligible: w.r.t size constraint U := (1 + ɛ) V k Scan 18 Christian Schulz:
41 Parallelization 18 Christian Schulz:
42 Graph Distribution over PEs Graph Distribution: a PE receives n/p vertices and their edges Processor I Processor II Communication ghost nodes: adjacent nodes on other processor (communication!) interface nodes: nodes adjacent to ghost nodes 19 Christian Schulz:
43 Label Propagation Distributed Memory each PE has a static part of the graph, only block IDs can change Overlap Computation and Communication (PE centric view): V Phase i 1 Phase i Phase i+1 Scan At the end of phase i: send block ID updates of phase i to neighboring PEs receive block ID updates from neighboring PEs from phase i 1 * while scanning in phase i, messages are routed through the network * algorithm converged nothing will be communicated 20 Christian Schulz:
44 Contraction of Clusterings The Parallel Case High Level a b c A C B a+b+c A+B+C parallel find mapping C : n.. 1 n.. 1 exchange subgraphs, compute contracted graph locally when graph small parallel initial partitioning 21 Christian Schulz:
45 Experiments 21 Christian Schulz:
46 Parallel Solution Quality Performance k = 2 blocks, 32PEs instances: meshes and social networks/web graphs ParMetis has ineffective coarsening (matching-based) due to memory consumption of coarsest graph (dist. among PEs) could not solve arabic-2005, sk-2005 and uk-2007 solved instances (ParMetis): fast and eco yield 19.2% and 27.4% improvement fast and eco slower on average social networks/web graphs: fast: 38% less cut edges and > 2 faster eco: 45% less cut edges and slower best instance: 18 faster and 61.6% less cut edges improvement over Facebook [Ugander and Backstrom]: 45% less cut edges on LiveJournal and much faster (k = 100) 22 Christian Schulz:
47 Strong Scaling Social Networks 1000 Fast sk-2007 Fast arabic-2005 Fast uk-2002 Fast uk-2007 Minimal uk-2007 total time [s] K 2K number of PEs p uk-2007 can be partitioned in 15.2 seconds (seq. 10.5min) 72 seconds for random geometric graph with 22G edges more scaling results in the paper 23 Christian Schulz:
48 Graph Drawing joint work with: H. Meyerhenke and M. Nöllenburg 23 Christian Schulz:
49 Problem x : V R 2 G = (V, E, d) d : E R 24 Christian Schulz:
50 Problem x : V R 2 G = (V, E, d) d : E R d 1 24 Christian Schulz:
51 Maximal Entropy Stress Model [Gansner et al. 13] Entropy H(x): physics: nodes evenly dispersed nodes as far away as possible some nodes have predefined distance! Maximal Entropy Stress Model: max H(x) := ln x u x v {u,v} E subject to x u x v = d uv, {u, v} E 25 Christian Schulz:
52 Maximal Entropy Stress Model [Gansner et al. 13] Entropy H(x): physics: nodes evenly dispersed nodes as far away as possible some nodes have predefined distance! Maximal Entropy Stress Model: max H(x) := ln x u x v {u,v} E subject to x u x v = d uv, {u, v} E not possible to satisfy all constraints! model may be infeasible 25 Christian Schulz:
53 Maximal Entropy Stress Model [Gansner et al. 13] Compromise: min error, max entropy min w uv ( x u x v d uv ) 2 αh(x) u,v E α trade-off parameter Solve optimization problem by repeatedly solving Laplacian systems or iterative scheme Christian Schulz:
54 Maximal Entropy Stress Model [Gansner et al. 13] Compromise: min error, max entropy min w uv ( x u x v d uv ) 2 αh(x) u,v E α trade-off parameter Solve optimization problem by repeatedly solving Laplacian systems or iterative scheme Christian Schulz:
55 Maximal Entropy Stress Model [Gansner et al. 13]... or iterative scheme: x u 1 ( x u x v ρ u w uv x v + d uv x {u,v} E u x v overall update costs O(n 2 ) per iteration ) + α ρ u {u,v}/ E x u x v x u x v 2 Our contributions: make this usable and fast in practice multilevel integration approximate long-range forces employ parallelism 27 Christian Schulz:
56 Multilevel Graph Drawing [Hadany, Harel 99] input graph output drawing... improve drawing... contract initial drawing uncontract as before: contract using size-constrained clusterings 28 Christian Schulz:
57 Multilevel Graph Drawing Initial Drawing coarsen until only two nodes left place them at optimal distance define distances on coarse graphs, stay tuned x u = (0, 0) x v = (0, d uv ) 29 Christian Schulz:
58 Uncoarsening Local Improvement minimize maxent-stress on each level of hierarchy assume disk with radius c(u) to draw c(u) vertices c(u)+ c(v) define distance d uv := 2 on current level Iterative Scheme x u 1 ( x u x v ρ u w uv x v + d uv x {u,v} E u x v ) + α ρ u {u,v}/ E x u x v x u x v 2 30 Christian Schulz:
59 Uncoarsening Local Improvement minimize maxent-stress on each level of hierarchy assume disk with radius c(u) to draw c(u) vertices c(u)+ c(v) define distance d uv := 2 on current level Iterative Scheme x u 1 ( x u x v ρ u w uv x v + d uv x {u,v} E u x v ) + α ρ u {u,v}/ E x u x v x u x v 2 30 Christian Schulz:
60 Local Improvement Iterative Scheme Approximation x u... {u,v}/ E x u x v x u x v } {{ 2 } =:r(u,v) x u r(u, v) + u =v M(u)=M(v) v V v =M(u) ν(v x u x v ) x u x v 2 {u,v} E r(u, v) M(u) cluster of u ν(v ) number of finer vertices of v on current level V vertex set of next coarser level x v coordinate of v 31 Christian Schulz:
61 Local Improvement Approximation x u r(u, v) + u =v M(u)=M(v) v V v =M(u) ν(v x u x v ) x u x v 2 M(u) cluster of u ν(v ) number of finer vertices of v on current level V vertex set of next coarser level x v coordinate of v {u,v} E r(u, v) 32 Christian Schulz:
62 Local Improvement Additional Enhancements after each iteration update barycenter of coarse nodes vertex computations independent add parallelism use approximation multiple h levels beneath in hierarchy input graph output drawing... improve drawing... contract initial drawing uncontract Proposition: Assume equal cluster sizes. The running time of one iteration of MulMent h, h 0, is O(m + n h+2 h+1 ) 33 Christian Schulz:
63 Experiments 33 Christian Schulz:
64 Scalability Running Time [Delaunay n = 2 20 ] 10 6 MulMent 0 MulMent 1 MulMent 2 MulMent 4 MulMent 6 MulMent 8 MulMent 9 MulMent total time [s] number of PEs p 34 Christian Schulz:
65 Running Times graph PMDS MaxEnt MulMent 0 MulMent 1 MulMent 10 btree bus USpowerG elt commanche bcsstk fe pwt del luxembourg nyc auto del Table : Running times in seconds per graph. Smaller is better. PivotMDS and MaxEnt use one thread (sequential codes), the MulMent algorithms use 32 cores (64 threads). Running times of MaxEnt are without the time of PMDS (which yields input coordinates to MaxEnt) 35 Christian Schulz:
66 Experimental Results Summary Influence of h / Comparision increasing h not a large impact on solution quality maxent-stress remains comparable maxent-stress of MaxEnt and MulMent more or less similar Dynamic Networks model: remove x% random edges, insert x% edges (distance D) 4 faster (h = 0), save 50% time (h = 7) 9% worse full-stress, 1% better maxent-stress 36 Christian Schulz:
67 Example Drawings fe pwt PMDS MaxEnt MulMent 37 Christian Schulz:
68 Example Drawings bcsstk31 PMDS 37 Christian Schulz: MaxEnt MulMent
69 Example Drawings commanche PMDS MaxEnt MulMent 37 Christian Schulz:
70 Independent Sets joint work with: S. Lamm, P. Sanders, D. Strash and R. Werneck 37 Christian Schulz:
71 Definitions Independent Set: subset S V such that there are no adjacent nodes in S Maximum Independent Set: maximum cardinality set S related to maximum clique and minimum vertex cover finding a MIS is NP-hard and hard to approximate 38 Christian Schulz:
72 Common Approaches use heuristic algorithms gradually improve a single solution node deletions, insertions and swaps plateau search diversification & restart rules Andrade et al. (2012): iterated local search using (j, k)-swaps 39 Christian Schulz:
73 ARW Local Search Node Swaps: (j, k)-swaps: remove j solution nodes and insert k new ones (1, 2)-swaps: remove single solution node and insert two new ones Local Search: search for (1, 2)-swaps in time O(m) use data structure that supports fast insertion and removal Solution nodes Free nodes Non-free non-solution nodes 40 Christian Schulz:
74 ɛ-balanced Graph Partitioning Node Separators Partition graph G = (V, E, c : V R >0, ω : E R >0 ) into k disjoint blocks + node separator s.t. total node weight of each block 1 + ɛ total node weight k total size of node separator as small as possible 41 Christian Schulz:
75 Evolutionary Algorithms General Structure procedure steady-state-ea create initial population P while stopping criterion not fulfilled select parents P 1, P 2 from P combine P 1 with P 2 to create offspring o mutate offspring o evict individual in population using o return the fittest individual that occurred 42 Christian Schulz:
76 Combine Operations Graph Partitioning: exchange whole blocks of solutions operators for edge separators and node separators small cut-size vital for efficiency + = Local Search: resulting independent sets may not be maximal use maximization step and ARW to reach local optimum 43 Christian Schulz:
77 Node Separator Combine V S V 2 S 1 V1 V 2 S V V 2 1 build node separator V = V 1 V 2 S use node separator as crossover point combination takes linear time O(n) maximize with greedy + ARW local search 44 Christian Schulz:
78 Node Separator Combine V S V 2 S 1 V1 V 2 S V V 2 1 build node separator V = V 1 V 2 S use node separator as crossover point combination takes linear time O(n) maximize with Greedy + ARW local search 44 Christian Schulz:
79 Kernelization [Akiba,Iwata 15] Reductions: rules to decrease graph size, while maintaining optimality solve problem on problem kernel (using EA) obtain solution on input graph 45 Christian Schulz:
80 Kernelization [Akiba,Iwata 15] Reductions: rules to decrease graph size, while maintaining optimality Example: remove degree 0 or 1 vertices v neighbor of v in MIS choose v instead, else add v 45 Christian Schulz:
81 Kernelization [Akiba,Iwata 15] Reductions: rules to decrease graph size, while maintaining optimality Example: remove degree 0 or 1 vertices v neighbor of v in MIS choose v instead, else add v much more reductions used in practice [Lamm et al. 16] 45 Christian Schulz:
82 Guess likely candidates can we guess vertices that are in MIS? idea: select small-degree vertices from fittest independent set apply more reductions and recurse! 46 Christian Schulz:
83 Experiments 46 Christian Schulz:
84 Near-optimal on difficult networks finds exact MIS faster, when exact algorithm is slow: Skitter 48 min 21 min Stanford 13 hours 5 min bcsstk hours 2.4 sec Skitter 2 hours 28 sec,... finds exact MIS, for large networks with known MIS size consistently finds larger solutions on social and road networks Solution Size Time [s] ARW EvoMIS ReduMIS Solution Size Time [s] ARW EvoMIS ReduMIS consistent, even as we scale to graphs on 10M to 100M nodes 47 Christian Schulz:
85 Conclusions 47 Christian Schulz:
86 Conclusion apply algorithm engineering to optimization problems obtain algorithms that scale to large inputs and machines outperform state-of-the-art open source implementations KaHIP: algo2.iti.kit.edu/kahip KaDraw: algo2.iti.kit.edu/kadraw KaMIS: algo2.iti.kit.edu/kamis realistic algorithm models 3 engineering perf. guarantees 48 Christian Schulz: experiments appl. engineering implementation algorithm libraries 9 10 applications analysis deduction real Inputs design 4 falsifiable 5 hypotheses 7 induction 6 8
Distributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
More informationFAST LOCAL SEARCH FOR THE MAXIMUM INDEPENDENT SET PROBLEM
FAST LOCAL SEARCH FOR THE MAXIMUM INDEPENDENT SET PROBLEM DIOGO V. ANDRADE, MAURICIO G.C. RESENDE, AND RENATO F. WERNECK Abstract. Given a graph G = (V, E), the independent set problem is that of finding
More informationA FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS
SIAM J. SCI. COMPUT. Vol. 20, No., pp. 359 392 c 998 Society for Industrial and Applied Mathematics A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS GEORGE KARYPIS AND VIPIN
More informationSCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
More informationEngineering Route Planning Algorithms. Online Topological Ordering for Dense DAGs
Algorithm Engineering for Large Graphs Engineering Route Planning Algorithms Peter Sanders Dominik Schultes Universität Karlsruhe (TH) Online Topological Ordering for Dense DAGs Deepak Ajwani Tobias Friedrich
More informationPerformance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations
Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions
More informationHow To Cluster Of Complex Systems
Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationNetwork Algorithms for Homeland Security
Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially
More informationSmall Maximal Independent Sets and Faster Exact Graph Coloring
Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected
More informationLecture 12: Partitioning and Load Balancing
Lecture 12: Partitioning and Load Balancing G63.2011.002/G22.2945.001 November 16, 2010 thanks to Schloegel,Karypis and Kumar survey paper and Zoltan website for many of today s slides and pictures Partitioning
More informationApplied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationAsking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
More informationAn Empirical Study of Two MIS Algorithms
An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,
More informationGuessing Game: NP-Complete?
Guessing Game: NP-Complete? 1. LONGEST-PATH: Given a graph G = (V, E), does there exists a simple path of length at least k edges? YES 2. SHORTEST-PATH: Given a graph G = (V, E), does there exists a simple
More information! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.
Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three
More informationDistributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
More informationFRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl für Informatik 10 (Systemsimulation) Massively Parallel Multilevel Finite
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
More informationBig Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs
More informationChapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling
Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one
More informationSIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs
SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large
More informationA scalable multilevel algorithm for graph clustering and community structure detection
A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures
More informationBig graphs for big data: parallel matching and clustering on billion-vertex graphs
1 Big graphs for big data: parallel matching and clustering on billion-vertex graphs Rob H. Bisseling Mathematical Institute, Utrecht University Collaborators: Bas Fagginger Auer, Fredrik Manne, Albert-Jan
More informationMuACOsm A New Mutation-Based Ant Colony Optimization Algorithm for Learning Finite-State Machines
MuACOsm A New Mutation-Based Ant Colony Optimization Algorithm for Learning Finite-State Machines Daniil Chivilikhin and Vladimir Ulyantsev National Research University of IT, Mechanics and Optics St.
More informationComputer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li
Computer Algorithms NP-Complete Problems NP-completeness The quest for efficient algorithms is about finding clever ways to bypass the process of exhaustive search, using clues from the input in order
More informationGraph Analytics in Big Data. John Feo Pacific Northwest National Laboratory
Graph Analytics in Big Data John Feo Pacific Northwest National Laboratory 1 A changing World The breadth of problems requiring graph analytics is growing rapidly Large Network Systems Social Networks
More informationMapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12
MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:
More informationTowards real-time image processing with Hierarchical Hybrid Grids
Towards real-time image processing with Hierarchical Hybrid Grids International Doctorate Program - Summer School Björn Gmeiner Joint work with: Harald Köstler, Ulrich Rüde August, 2011 Contents The HHG
More informationThe Minimum Consistent Subset Cover Problem and its Applications in Data Mining
The Minimum Consistent Subset Cover Problem and its Applications in Data Mining Byron J Gao 1,2, Martin Ester 1, Jin-Yi Cai 2, Oliver Schulte 1, and Hui Xiong 3 1 School of Computing Science, Simon Fraser
More information5.1 Bipartite Matching
CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the Ford-Fulkerson
More informationBranch-and-Price Approach to the Vehicle Routing Problem with Time Windows
TECHNISCHE UNIVERSITEIT EINDHOVEN Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents
More informationAttacking Anonymized Social Network
Attacking Anonymized Social Network From: Wherefore Art Thou RX3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography Presented By: Machigar Ongtang (Ongtang@cse.psu.edu ) Social
More informationrecursion, O(n), linked lists 6/14
recursion, O(n), linked lists 6/14 recursion reducing the amount of data to process and processing a smaller amount of data example: process one item in a list, recursively process the rest of the list
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationComputing Many-to-Many Shortest Paths Using Highway Hierarchies
Computing Many-to-Many Shortest Paths Using Highway Hierarchies Sebastian Knopp Peter Sanders Dominik Schultes Frank Schulz Dorothea Wagner Abstract We present a fast algorithm for computing all shortest
More informationDynamic Load Balancing for Parallel Numerical Simulations based on Repartitioning with Disturbed Diffusion
Dynamic Load Balancing for Parallel Numerical Simulations based on Repartitioning with Disturbed Diffusion Henning Meyerhenke University of Paderborn Department of Computer Science Fürstenallee 11, 33102
More informationOptimized Software Component Allocation On Clustered Application Servers. by Hsiauh-Tsyr Clara Chang, B.B., M.S., M.S.
Optimized Software Component Allocation On Clustered Application Servers by Hsiauh-Tsyr Clara Chang, B.B., M.S., M.S. Submitted in partial fulfillment of the requirements for the degree of Doctor of Professional
More informationCIS 700: algorithms for Big Data
CIS 700: algorithms for Big Data Lecture 6: Graph Sketching Slides at http://grigory.us/big-data-class.html Grigory Yaroslavtsev http://grigory.us Sketching Graphs? We know how to sketch vectors: v Mv
More information! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.
Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of
More informationPerformance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute
More informationParallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri
Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph
More informationA Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader
A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward
More informationA Constraint Programming based Column Generation Approach to Nurse Rostering Problems
Abstract A Constraint Programming based Column Generation Approach to Nurse Rostering Problems Fang He and Rong Qu The Automated Scheduling, Optimisation and Planning (ASAP) Group School of Computer Science,
More informationShape Optimizing Load Balancing for Parallel Adaptive Numerical Simulations Using MPI
Shape Optimizing Load Balancing for Parallel Adaptive Numerical Simulations Using MPI Henning Meyerhenke Institute of Theoretical Informatics Karlsruhe Institute of Technology Am Fasanengarten 5, 76131
More informationProbabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationExternal Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13
External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;
More informationParallel Simulated Annealing Algorithm for Graph Coloring Problem
Parallel Simulated Annealing Algorithm for Graph Coloring Problem Szymon Łukasik 1, Zbigniew Kokosiński 2, and Grzegorz Świętoń 2 1 Systems Research Institute, Polish Academy of Sciences, ul. Newelska
More informationExponential time algorithms for graph coloring
Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].
More informationChapter 6: Episode discovery process
Chapter 6: Episode discovery process Algorithmic Methods of Data Mining, Fall 2005, Chapter 6: Episode discovery process 1 6. Episode discovery process The knowledge discovery process KDD process of analyzing
More informationParallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
More informationFast Multipole Method for particle interactions: an open source parallel library component
Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,
More informationSIMS 255 Foundations of Software Design. Complexity and NP-completeness
SIMS 255 Foundations of Software Design Complexity and NP-completeness Matt Welsh November 29, 2001 mdw@cs.berkeley.edu 1 Outline Complexity of algorithms Space and time complexity ``Big O'' notation Complexity
More informationDistributed communication-aware load balancing with TreeMatch in Charm++
Distributed communication-aware load balancing with TreeMatch in Charm++ The 9th Scheduling for Large Scale Systems Workshop, Lyon, France Emmanuel Jeannot Guillaume Mercier Francois Tessier In collaboration
More informationCSC2420 Spring 2015: Lecture 3
CSC2420 Spring 2015: Lecture 3 Allan Borodin January 22, 2015 1 / 1 Announcements and todays agenda Assignment 1 due next Thursday. I may add one or two additional questions today or tomorrow. Todays agenda
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationScheduling Shop Scheduling. Tim Nieberg
Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations
More informationLoad balancing in a heterogeneous computer system by self-organizing Kohonen network
Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.
More informationMachine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou fuhao@hust.edu.cn Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
More informationScalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
More information5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.
1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The
More informationInterconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)
Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems
More informationIntroduction to Scheduling Theory
Introduction to Scheduling Theory Arnaud Legrand Laboratoire Informatique et Distribution IMAG CNRS, France arnaud.legrand@imag.fr November 8, 2004 1/ 26 Outline 1 Task graphs from outer space 2 Scheduling
More informationSimilarity Search in a Very Large Scale Using Hadoop and HBase
Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France
More informationReductions & NP-completeness as part of Foundations of Computer Science undergraduate course
Reductions & NP-completeness as part of Foundations of Computer Science undergraduate course Alex Angelopoulos, NTUA January 22, 2015 Outline Alex Angelopoulos (NTUA) FoCS: Reductions & NP-completeness-
More informationData Warehousing und Data Mining
Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data
More informationParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008
ParFUM: A Parallel Framework for Unstructured Meshes Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008 What is ParFUM? A framework for writing parallel finite element
More information5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1
5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 General Integer Linear Program: (ILP) min c T x Ax b x 0 integer Assumption: A, b integer The integrality condition
More informationDesign and Analysis of ACO algorithms for edge matching problems
Design and Analysis of ACO algorithms for edge matching problems Carl Martin Dissing Söderlind Kgs. Lyngby 2010 DTU Informatics Department of Informatics and Mathematical Modelling Technical University
More informationA Comparison of General Approaches to Multiprocessor Scheduling
A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University
More informationDistributed Computing over Communication Networks: Topology. (with an excursion to P2P)
Distributed Computing over Communication Networks: Topology (with an excursion to P2P) Some administrative comments... There will be a Skript for this part of the lecture. (Same as slides, except for today...
More informationModel-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms
Symposium on Automotive/Avionics Avionics Systems Engineering (SAASE) 2009, UC San Diego Model-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms Dipl.-Inform. Malte Lochau
More informationUnsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning
Unsupervised Learning and Data Mining Unsupervised Learning and Data Mining Clustering Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...
More informationwalberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation
walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation SIAM Parallel Processing for Scientific Computing 2012 February 16, 2012 Florian Schornbaum,
More informationEnergy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
More informationMizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
/35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationComplexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar
Complexity Theory IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar Outline Goals Computation of Problems Concepts and Definitions Complexity Classes and Problems Polynomial Time Reductions Examples
More informationCommunity Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer
Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer 1 Content What is Community Detection? Motivation Defining a community Methods to find communities Overlapping communities
More informationA number of tasks executing serially or in parallel. Distribute tasks on processors so that minimal execution time is achieved. Optimal distribution
Scheduling MIMD parallel program A number of tasks executing serially or in parallel Lecture : Load Balancing The scheduling problem NP-complete problem (in general) Distribute tasks on processors so that
More informationarxiv:1412.2333v1 [cs.dc] 7 Dec 2014
Minimum-weight Spanning Tree Construction in O(log log log n) Rounds on the Congested Clique Sriram V. Pemmaraju Vivek B. Sardeshmukh Department of Computer Science, The University of Iowa, Iowa City,
More informationPerron vector Optimization applied to search engines
Perron vector Optimization applied to search engines Olivier Fercoq INRIA Saclay and CMAP Ecole Polytechnique May 18, 2011 Web page ranking The core of search engines Semantic rankings (keywords) Hyperlink
More informationDynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC
Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC Outline Dynamic Load Balancing framework in Charm++ Measurement Based Load Balancing Examples: Hybrid Load Balancers Topology-aware
More informationBinary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
More informationForschungskolleg Data Analytics Methods and Techniques
Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!
More informationGraph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
More informationAlgorithmic Methods for Complex Network Analysis: Graph Clustering
Algorithmic Methods for Complex Network Analysis: Graph Clustering Summer School on Algorithm Engineering Dorothea Wagner September 9, 204 K ARLSRUHE I NSTITUTE OF T ECHNOLOGY I NSTITUTE OF T HEORETICAL
More informationEvolutionary Detection of Rules for Text Categorization. Application to Spam Filtering
Advances in Intelligent Systems and Technologies Proceedings ECIT2004 - Third European Conference on Intelligent Systems and Technologies Iasi, Romania, July 21-23, 2004 Evolutionary Detection of Rules
More informationSocial Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
More informationExpanding the CASEsim Framework to Facilitate Load Balancing of Social Network Simulations
Expanding the CASEsim Framework to Facilitate Load Balancing of Social Network Simulations Amara Keller, Martin Kelly, Aaron Todd 4 June 2010 Abstract This research has two components, both involving the
More informationDistance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
More informationGraph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, steen@cs.vu.nl Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationDefinition 11.1. Given a graph G on n vertices, we define the following quantities:
Lecture 11 The Lovász ϑ Function 11.1 Perfect graphs We begin with some background on perfect graphs. graphs. First, we define some quantities on Definition 11.1. Given a graph G on n vertices, we define
More informationIntroduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
More informationIntroduction To Genetic Algorithms
1 Introduction To Genetic Algorithms Dr. Rajib Kumar Bhattacharjya Department of Civil Engineering IIT Guwahati Email: rkbc@iitg.ernet.in References 2 D. E. Goldberg, Genetic Algorithm In Search, Optimization
More informationR-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants
R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions
More information