Big graphs for big data: parallel matching and clustering on billion-vertex graphs

Size: px
Start display at page:

Download "Big graphs for big data: parallel matching and clustering on billion-vertex graphs"

Transcription

1 1 Big graphs for big data: parallel matching and clustering on billion-vertex graphs Rob H. Bisseling Mathematical Institute, Utrecht University Collaborators: Bas Fagginger Auer, Fredrik Manne, Albert-Jan Yzelman Asia-trip A-Eskwadraat, July 2014

2 2 Graph algorithm 1/2-approximation algorithm algorithm

3 3 can win you a Nobel prize Source: Slate magazine October 15, 2012

4 Motivation of graph matching 4 Graph matching is a pairing of neighbouring vertices. It has applications in medicine: finding suitable donors for organs social networks: finding partners scientific computing: finding pivot elements in matrix computations graph coarsening: making the graph smaller by merging similar vertices

5 5 Motivation of greedy/approximation graph matching Optimal solution is possible in polynomial time. Time for weighted matching in graph G = (V, E) is O(mn + n 2 log n) with n = V the number of vertices, and m = E the number of edges (Gabow 1990). The aim is a billion vertices, n = 10 9, with 100 edges per vertex, i.e. m = Thus, a time of O(10 20 ) = 100, 000 Petaflop units is far too long. Fastest supercomputer today, the Chinese Tianhe-2 (Milky-Way 2), performs 33.8 Petaflop/s. We need linear-time greedy or approximation algorithms.

6 6 Formal definition of graph matching A graph is a pair G = (V, E) with vertices V and edges E. All edges e E are of the form e = (v, w) for vertices v, w V. A matching is a collection M E of disjoint edges. Here, the graph is undirected, so (v, w) = (w, v).

7 7 Maximal matching A matching is maximal if we cannot enlarge it further by adding another edge to it.

8 8 Maximum matching A matching is maximum if it possesses the largest possible number of edges, compared to all other matchings.

9 9 Edge-weighted matching If the edges are provided with weights ω : E R>0, finding a matching M which maximises ω(m) = e M ω(e), is called edge-weighted matching. matching provides us with maximal matchings, but not necessarily with maximum possible weight.

10 greedy matching 10 In random order, vertices v V select and match neighbours one-by-one. Here, we can pick the first available neighbour w of v, greedy random matching the neighbour w with maximum ω(v, w), greedy weighted matching Or: we sort all the edges by weight, and successively match the vertices v and w of the heaviest available edge (v, w). This is commonly called greedy matching.

11 11 greedy random matching

12 11 greedy random matching

13 11 greedy random matching

14 11 greedy random matching

15 11 greedy random matching

16 11 greedy random matching

17 11 greedy random matching

18 11 greedy random matching

19 11 greedy random matching

20 11 greedy random matching

21 11 greedy random matching

22 11 greedy random matching

23 11 greedy random matching

24 11 greedy random matching

25 11 greedy random matching

26 11 greedy random matching

27 11 greedy random matching

28 12 matching is a 1/2-approximation algorithm Weight ω(m) ωoptimal /2 Cardinality M Mcard max /2, because M is maximal. Time complexity is O(m log m), because all edges must be sorted.

29 13 Parallel greedy matching: trouble Suppose we match vertices simultaneously.

30 13 Parallel greedy matching: trouble Two vertices each find an unmatched neighbour...

31 13 Parallel greedy matching: trouble but generate an invalid matching.

32 dominant-edge algorithm 14 while E do pick a dominant edge (v, w) E M := M {(v, w)} E := E \ {(x, y) E : x = v x = w} V := V \ {v, w} return M An edge (v, w) E is dominant if ω(v, w) = max{ω(x, y) : (x, y) E (x = v x = w)} v 6 w

33 15 approximation algorithm: initialisation function Seq(V, E) for all v V do pref (v) = null D := M := { Find dominant edges } for all v V do Adj v := {w V : (v, w) E} pref (v) := argmax{ω(v, w) : w Adj v } if pref (pref (v)) = v then D := D {v, pref (v)} M := M {(v, pref (v))}

34 Mutual preferences v 6 w

35 Non-mutual preferences v 6 w

36 18 approximation algorithm: main loop while D do pick v D D := D \ {v} for all x Adj v \ {pref (v)} : (x, pref (x)) / M do Adj x := Adj x \ {v} { Set new preference } pref (x) := argmax{ω(x, w) : w Adj x } if pref (pref (x)) = x then D := D {x, pref (x)} M := M {(x, pref (x))} return M

37 19 Properties of the dominant-edge algorithm Dominant-edge algorithm is a 1/2-approximation: ω(m) ω optimal /2 Dominance is a local property: easy to parallelise. Algorithm keeps going until set of dominant vertices D is empty and matching M is maximal. Assumption without loss of generality: weights are unique. Otherwise, use vertex numbering to break ties.

38 Time complexity 20 Linear time complexity O( E ) if edges of each vertex are sorted by weight. Sorting costs are deg(v) log deg(v) v v where is the maximum vertex degree. deg(v) log = 2 E log, This algorithm is based on a dominant-edge algorithm by Preis (1999), called LAM, which is linear-time O( E ), does not need sorting, and also is a 1/2-approximation, but is hard to parallelise.

39 21 Parallel computer: abstract model Communication network P P P P P M M M M M Bulk synchronous parallel (BSP) computer. Proposed by Leslie Valiant, 1989.

40 22 Parallel algorithm: supersteps P(0) P(1) P(2) P(3) P(4) sync sync sync comp comm comm comp sync sync comm

41 23 Composition with Red, Yellow, Blue and Black Piet Mondriaan 1921

42 24 Mondriaan data distribution for matrix prime60 Non-Cartesian block distribution of matrix prime60 with 462 nonzeros, for p = 4 aij 0 i j or j i (1 i, j 60)

43 25 Parallel algorithm (Manne & Bisseling, 2007) Processor P(s) has vertex set Vs, with and V s V t = if s t. p 1 s=0 V s = V This is a p-way partitioning of the vertex set.

44 Halo vertices 26 The adjacency set Adj v of a vertex v may contain vertices w from another processor. We define the set of halo vertices H s = v V s Adj v \ V s The weights ω(v, w) are stored with the edges, for all v V s and w V s H s. Es = {(v, w) E : v V s } is the subset of all the edges connected to V s.

45 27 Parallel algorithm for P(s): initialisation function Par(V s, H s, E s, distribution φ) for all v V s do pref (v) = null D s := M s := { Find dominant edges } for all v V s do Adj v := {w V s H s : (v, w) E s } SetNewPreference(v, Adj v, pref, V s, D s, M s, φ) Sync

46 28 Setting a vertex preference function SetNewPreference(v, Adj, V, D, M, φ) pref (v) := argmax{ω(v, w) : w Adj} if pref (v) V then if pref (pref (v)) = v then D := D {v, pref (v)} M := M {(v, pref (v))} else put proposal(v, pref (v)) in P(φ(pref (v)))

47 29 How to propose Source: proposal(v, w): v proposes to w

48 30 Parallel algorithm for P(s): main loop process received messages while D s do pick v D s D s := D s \ {v} for all x Adj v \ {pref (v)} : (x, pref (x)) / M s do if x V s then Adj x := Adj x \ {v} SetNewPreference(x, Adj x, pref, V s, D s, M s, φ) else {x H s } put unavailable(v, x) in P(φ(x)) Sync

49 31 Parallel algorithm for P(s): process received messages for all messages m received do if m = proposal(x, y) then if pref (y) = x then D s := D s {y} M s := M s {(x, y)} put accepted(x, y) in P(φ(x)) if m = accepted(x, y) then D s := D s {x} M s := M s {(x, y)} if m = unavailable(v, x) then if (x, pref (x)) / M s then Adj x := Adj x \ {v} SetNewPreference(x, Adj x, pref, V s, D s, M s, φ)

50 32 Termination The algorithm alternates supersteps of computation running the main loop and communication handling the received messages. The whole algorithm terminates when no messages have been received by processor P(s) and the local set D s is empty, for all s. This can be checked at every synchronisation point.

51 33 Load balance Processors can have different amounts of work, even if they have the same number of vertices or edges. Use can be made of a global clock based on ticks, the unit of time needed to handle a vertex x (in O(1)). Here, handling could mean setting a new preference. After every k ticks, everybody synchronises.

52 34 Synchronisation frequency Guidance for the choice of k is provided by the BSP parameter l, the cost of a global synchronisation. Choosing k l guarantees that at most 50% of the total time is spent in synchronisation. Choosing k sufficiently small will cause all processors to be busy during most supersteps. Good choice: k = 2l?

53 35 MulticoreBSP enables shared-memory BSP Albert-Jan Yzelman 2014,

54 GPU matching A different approach, tightly coupled to the GPU architecture. To prevent matching conflicts, we create two groups of vertices: Blue vertices propose. Red vertices respond. Proposals that were responded to, are matched. 7 9

55 37 GPU matching Colour Propose Respond Match π σ

56 37 GPU matching Colour Propose Respond Match π b r r b b r b b r σ

57 37 GPU matching Colour Propose Respond Match π b r r b b r b b r σ

58 37 GPU matching Colour Propose Respond Match π b r r b b r b b r σ

59 37 GPU matching Colour Propose Respond Match π b 2 3 b r σ

60 37 GPU matching Colour Propose Respond Match π r 2 3 r b σ

61 37 GPU matching Colour Propose Respond Match π r 2 3 r b σ d

62 37 GPU matching Colour Propose Respond Match π r 2 3 r b σ d

63 37 GPU matching Colour Propose Respond Match π r 2 3 r d σ d

64 37 GPU matching Colour Propose Respond Match π b 2 3 r d σ d

65 37 GPU matching Colour Propose Respond Match π b 2 3 r d σ

66 37 GPU matching Colour Propose Respond Match π b 2 3 r d σ

67 37 GPU matching Colour Propose Respond Match π d σ

68 Quality of the matching Matched vertices/total nr. of vertices (%) 100 Saturation of matching size 80 ecology2 (1,997,996) ecology1 (1,998,000) 60 G3_circuit (3,037,674) thermal2 (3,676,134) kkt_power (6,482,320) 40 af_shell9 (8,542,010) ldoor (22,785,136) af_shell10 (25,582,130) 20 audikw1 (38,354,076) nlpkkt120 (46,651,696) cage15 (47,022,346) Number of iterations Fraction of matched vertices as a function of the number of iterations. Number of edges between 2 and 47 million. 38

69 Random colouring of the vertices 39 At each iteration, we colour the vertices v V differently. For a fixed p [0, 1] colour(v) = { blue with probability p, red with probability 1 p. How to choose p? Maximise the number of matched vertices. For large random graphs, the expected fraction of matched vertices can be approximated by ) 2 (1 p) (1 e p 1 p. This is independent of the edge density.

70 of road network of the Netherlands (a) G 0 (b) G 11 (c) G 21 (d) G 26 (e) G 33 (f) Best clustering (G 21 ) Graph with 2,216,688 vertices and 2,441,238 edges yields 506 clusters with modularity

71 41 Formal definition of a clustering A clustering of an undirected graph G = (V, E) is a collection C of disjoint subsets of V satisfying V = C C C. Elements C C are called clusters. The number of clusters is not fixed beforehand.

72 Quality measure for clustering: modularity 42 The quality measure modularity was introduced by Newman and Girvan in 2004 for finding communities. Let G = (V, E, ω) be a weighted undirected graph without self-edges. We define ζ(v) = ω(e). (u,v) E ω(u, v), Ω = e E Then, the modularity of a clustering C of G is defined by ( ) ω(u, v) 2 ζ(v) mod(c) = C C (u,v) E u,v C 1 2 mod(c) 1. Ω C C v C 4Ω 2.

73 43 Merging clusters: change in modularity The set of all cut edges between clusters C and C is cut(c, C ) = {{u, v} E u C, v C } If we merge clusters C and C from C into one cluster C C, then we get a new clustering C with mod(c ) = mod(c)+ 1 ( ) 4 Ω 2 4 Ω ω(cut(c, C )) 2 ζ(c) ζ(c ), ζ(c C ) = ζ(c) + ζ(c ).

74 44 Agglomerative greedy clustering heuristic max G 0 = (V 0, E 0, ω 0, ζ 0 ) i 0 C 0 {{v} v V } while V i > 1 do if mod(g, C i ) max then max mod(g, C i ) C best C i µ weighted match clusters(g i ) (π i, G i+1 ) coarsen(g i, µ) C i+1 {{v V (π i π 0 )(v) = u} u V i+1 } i i + 1 return C best

75 : clustering time for DIMACS graphs time (s) *10-7 E CUDA TBB time Number of graph edges E DIMACS categories: clustering/, coauthor/, streets/, random/, delaunay/, matrix/, walshaw/, dyn-frames/, and redistrict/. CUDA implementation with the Thrust template library and Intel TBB implementation. Web link graph uk-2002 with 0.26 billion vertices clustered in 30 s using Intel TBB. 45

76 46 DIMACS road networks and coauthor graphs G V E mod t mod t CU CU TBB TBB luxembourg 114, , belgium 1,441,295 1,549, netherlands 2,216,688 2,441, italy 6,686,493 7,013, great-britain 7,733,822 8,156, germany 11,548,845 12,369, asia 11,950,757 12,711, europe 50,912,018 54,054, coauthorscite 227, , coauthorsdblp 299, , citationcite 268,495 1,156, copapersdblp 540,486 15,245, copaperscite 434,102 16,036, mod = modularity, t = time in s, CU = CUDA

77 47 s BSP is extremely suitable for parallel graph computations: no worries about communication because we buffer messages until the next synchronisation; no send-receive pairs, but one-sided put or get operations; BSP cost model gives synchronisation frequency; correctness proof of algorithm becomes simpler; no deadlock possible. can be the basis for clustering, as demonstrated for GPUs and multicore CPUs. We clustered Asia s road network with 12M vertices and 12.7M edges in 2.7 seconds on a GPU.

Big graphs for big data: parallel matching and clustering on billion-vertex graphs

Big graphs for big data: parallel matching and clustering on billion-vertex graphs 1 Big graphs for big data: parallel matching and clustering on billion-vertex graphs Rob H. Bisseling Mathematical Institute, Utrecht University Collaborators: Bas Fagginger Auer, Fredrik Manne, Mostofa

More information

Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

SCAN: A Structural Clustering Algorithm for Networks

SCAN: A Structural Clustering Algorithm for Networks SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected

More information

Parallel Computing for Data Science

Parallel Computing for Data Science Parallel Computing for Data Science With Examples in R, C++ and CUDA Norman Matloff University of California, Davis USA (g) CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;

More information

5.1 Bipartite Matching

5.1 Bipartite Matching CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the Ford-Fulkerson

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

More information

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Yong Zhang 1.2, Francis Y.L. Chin 2, and Hing-Fung Ting 2 1 College of Mathematics and Computer Science, Hebei University,

More information

Computer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li

Computer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li Computer Algorithms NP-Complete Problems NP-completeness The quest for efficient algorithms is about finding clever ways to bypass the process of exhaustive search, using clues from the input in order

More information

MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12

MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12 MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:

More information

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm. Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three

More information

Big Graph Processing: Some Background

Big Graph Processing: Some Background Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs

More information

graph analysis beyond linear algebra

graph analysis beyond linear algebra graph analysis beyond linear algebra E. Jason Riedy DMML, 24 October 2015 HPC Lab, School of Computational Science and Engineering Georgia Institute of Technology motivation and applications (insert prefix

More information

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms

More information

Course on Social Network Analysis Graphs and Networks

Course on Social Network Analysis Graphs and Networks Course on Social Network Analysis Graphs and Networks Vladimir Batagelj University of Ljubljana Slovenia V. Batagelj: Social Network Analysis / Graphs and Networks 1 Outline 1 Graph...............................

More information

Big Data Graph Algorithms

Big Data Graph Algorithms Christian Schulz CompSE seminar, RWTH Aachen, Karlsruhe 1 Christian Schulz: Institute for Theoretical www.kit.edu Informatics Algorithm Engineering design analyze Algorithms implement experiment 1 Christian

More information

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute

More information

Graph Security Testing

Graph Security Testing JOURNAL OF APPLIED COMPUTER SCIENCE Vol. 23 No. 1 (2015), pp. 29-45 Graph Security Testing Tomasz Gieniusz 1, Robert Lewoń 1, Michał Małafiejski 1 1 Gdańsk University of Technology, Poland Department of

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications Rupak Biswas MRJ Technology Solutions NASA Ames Research Center Moffett Field, CA 9435, USA rbiswas@nas.nasa.gov

More information

FAST LOCAL SEARCH FOR THE MAXIMUM INDEPENDENT SET PROBLEM

FAST LOCAL SEARCH FOR THE MAXIMUM INDEPENDENT SET PROBLEM FAST LOCAL SEARCH FOR THE MAXIMUM INDEPENDENT SET PROBLEM DIOGO V. ANDRADE, MAURICIO G.C. RESENDE, AND RENATO F. WERNECK Abstract. Given a graph G = (V, E), the independent set problem is that of finding

More information

Analysis of Algorithms, I

Analysis of Algorithms, I Analysis of Algorithms, I CSOR W4231.002 Eleni Drinea Computer Science Department Columbia University Thursday, February 26, 2015 Outline 1 Recap 2 Representing graphs 3 Breadth-first search (BFS) 4 Applications

More information

Network Algorithms for Homeland Security

Network Algorithms for Homeland Security Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially

More information

Chapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

Chapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one

More information

Graph Processing and Social Networks

Graph Processing and Social Networks Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

Apache Hama Design Document v0.6

Apache Hama Design Document v0.6 Apache Hama Design Document v0.6 Introduction Hama Architecture BSPMaster GroomServer Zookeeper BSP Task Execution Job Submission Job and Task Scheduling Task Execution Lifecycle Synchronization Fault

More information

Union-Find Algorithms. network connectivity quick find quick union improvements applications

Union-Find Algorithms. network connectivity quick find quick union improvements applications Union-Find Algorithms network connectivity quick find quick union improvements applications 1 Subtext of today s lecture (and this course) Steps to developing a usable algorithm. Define the problem. Find

More information

Scheduling Shop Scheduling. Tim Nieberg

Scheduling Shop Scheduling. Tim Nieberg Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction

More information

BSPCloud: A Hybrid Programming Library for Cloud Computing *

BSPCloud: A Hybrid Programming Library for Cloud Computing * BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,

More information

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm. Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

Load Balancing. Load Balancing 1 / 24

Load Balancing. Load Balancing 1 / 24 Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

More information

Social Media Mining. Graph Essentials

Social Media Mining. Graph Essentials Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

More information

Small Maximal Independent Sets and Faster Exact Graph Coloring

Small Maximal Independent Sets and Faster Exact Graph Coloring Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected

More information

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of

More information

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new GAP (2) Graph Accessibility Problem (GAP) (1)

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new GAP (2) Graph Accessibility Problem (GAP) (1) Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of

More information

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing /35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of

More information

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92. Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

More information

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving

More information

http://www.wordle.net/

http://www.wordle.net/ Hadoop & MapReduce http://www.wordle.net/ http://www.wordle.net/ Hadoop is an open-source software framework (or platform) for Reliable + Scalable + Distributed Storage/Computational unit Failures completely

More information

BOUNDARY EDGE DOMINATION IN GRAPHS

BOUNDARY EDGE DOMINATION IN GRAPHS BULLETIN OF THE INTERNATIONAL MATHEMATICAL VIRTUAL INSTITUTE ISSN (p) 0-4874, ISSN (o) 0-4955 www.imvibl.org /JOURNALS / BULLETIN Vol. 5(015), 197-04 Former BULLETIN OF THE SOCIETY OF MATHEMATICIANS BANJA

More information

Linear Programming I

Linear Programming I Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins

More information

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014 Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)

More information

STINGER: High Performance Data Structure for Streaming Graphs

STINGER: High Performance Data Structure for Streaming Graphs STINGER: High Performance Data Structure for Streaming Graphs David Ediger Rob McColl Jason Riedy David A. Bader Georgia Institute of Technology Atlanta, GA, USA Abstract The current research focus on

More information

A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS

A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS SIAM J. SCI. COMPUT. Vol. 20, No., pp. 359 392 c 998 Society for Industrial and Applied Mathematics A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS GEORGE KARYPIS AND VIPIN

More information

Topic: Greedy Approximations: Set Cover and Min Makespan Date: 1/30/06

Topic: Greedy Approximations: Set Cover and Min Makespan Date: 1/30/06 CS880: Approximations Algorithms Scribe: Matt Elder Lecturer: Shuchi Chawla Topic: Greedy Approximations: Set Cover and Min Makespan Date: 1/30/06 3.1 Set Cover The Set Cover problem is: Given a set of

More information

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Kousha Etessami U. of Edinburgh, UK Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 6) 1 / 13 Overview Graphs and Graph

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information

How To Cluster Of Complex Systems

How To Cluster Of Complex Systems Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving

More information

each college c i C has a capacity q i - the maximum number of students it will admit

each college c i C has a capacity q i - the maximum number of students it will admit n colleges in a set C, m applicants in a set A, where m is much larger than n. each college c i C has a capacity q i - the maximum number of students it will admit each college c i has a strict order i

More information

A scalable multilevel algorithm for graph clustering and community structure detection

A scalable multilevel algorithm for graph clustering and community structure detection A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures

More information

The Minimum Consistent Subset Cover Problem and its Applications in Data Mining

The Minimum Consistent Subset Cover Problem and its Applications in Data Mining The Minimum Consistent Subset Cover Problem and its Applications in Data Mining Byron J Gao 1,2, Martin Ester 1, Jin-Yi Cai 2, Oliver Schulte 1, and Hui Xiong 3 1 School of Computing Science, Simon Fraser

More information

Zachary Monaco Georgia College Olympic Coloring: Go For The Gold

Zachary Monaco Georgia College Olympic Coloring: Go For The Gold Zachary Monaco Georgia College Olympic Coloring: Go For The Gold Coloring the vertices or edges of a graph leads to a variety of interesting applications in graph theory These applications include various

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

Guessing Game: NP-Complete?

Guessing Game: NP-Complete? Guessing Game: NP-Complete? 1. LONGEST-PATH: Given a graph G = (V, E), does there exists a simple path of length at least k edges? YES 2. SHORTEST-PATH: Given a graph G = (V, E), does there exists a simple

More information

Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions

Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions Peter Richtárik School of Mathematics The University of Edinburgh Joint work with Martin Takáč (Edinburgh)

More information

Rethinking SIMD Vectorization for In-Memory Databases

Rethinking SIMD Vectorization for In-Memory Databases SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest

More information

Evaluating partitioning of big graphs

Evaluating partitioning of big graphs Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed

More information

Exponential time algorithms for graph coloring

Exponential time algorithms for graph coloring Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].

More information

GRAPH data are important data types in many scientific

GRAPH data are important data types in many scientific Limited Random Walk Algorithm for Big Graph Data Clustering Honglei Zhang, Member, IEEE, Jenni Raitoharju, Member, IEEE, Serkan Kiranyaz, Senior Member, IEEE, and Moncef Gabbouj, Fellow, IEEE arxiv:66.645v

More information

Institut für Informatik Lehrstuhl Theoretische Informatik I / Komplexitätstheorie. An Iterative Compression Algorithm for Vertex Cover

Institut für Informatik Lehrstuhl Theoretische Informatik I / Komplexitätstheorie. An Iterative Compression Algorithm for Vertex Cover Friedrich-Schiller-Universität Jena Institut für Informatik Lehrstuhl Theoretische Informatik I / Komplexitätstheorie Studienarbeit An Iterative Compression Algorithm for Vertex Cover von Thomas Peiselt

More information

Distributed Load Balancing for Machines Fully Heterogeneous

Distributed Load Balancing for Machines Fully Heterogeneous Internship Report 2 nd of June - 22 th of August 2014 Distributed Load Balancing for Machines Fully Heterogeneous Nathanaël Cheriere nathanael.cheriere@ens-rennes.fr ENS Rennes Academic Year 2013-2014

More information

Large induced subgraphs with all degrees odd

Large induced subgraphs with all degrees odd Large induced subgraphs with all degrees odd A.D. Scott Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, England Abstract: We prove that every connected graph of order

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

Permutation Betting Markets: Singleton Betting with Extra Information

Permutation Betting Markets: Singleton Betting with Extra Information Permutation Betting Markets: Singleton Betting with Extra Information Mohammad Ghodsi Sharif University of Technology ghodsi@sharif.edu Hamid Mahini Sharif University of Technology mahini@ce.sharif.edu

More information

Algorithmic Methods for Complex Network Analysis: Graph Clustering

Algorithmic Methods for Complex Network Analysis: Graph Clustering Algorithmic Methods for Complex Network Analysis: Graph Clustering Summer School on Algorithm Engineering Dorothea Wagner September 9, 204 K ARLSRUHE I NSTITUTE OF T ECHNOLOGY I NSTITUTE OF T HEORETICAL

More information

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes. 1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The

More information

Solutions to Homework 6

Solutions to Homework 6 Solutions to Homework 6 Debasish Das EECS Department, Northwestern University ddas@northwestern.edu 1 Problem 5.24 We want to find light spanning trees with certain special properties. Given is one example

More information

Triangle deletion. Ernie Croot. February 3, 2010

Triangle deletion. Ernie Croot. February 3, 2010 Triangle deletion Ernie Croot February 3, 2010 1 Introduction The purpose of this note is to give an intuitive outline of the triangle deletion theorem of Ruzsa and Szemerédi, which says that if G = (V,

More information

HPC enabling of OpenFOAM R for CFD applications

HPC enabling of OpenFOAM R for CFD applications HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,

More information

GTC 2014 San Jose, California

GTC 2014 San Jose, California GTC 2014 San Jose, California An Approach to Parallel Processing of Big Data in Finance for Alpha Generation and Risk Management Yigal Jhirad and Blay Tarnoff March 26, 2014 GTC 2014: Table of Contents

More information

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

On the independence number of graphs with maximum degree 3

On the independence number of graphs with maximum degree 3 On the independence number of graphs with maximum degree 3 Iyad A. Kanj Fenghui Zhang Abstract Let G be an undirected graph with maximum degree at most 3 such that G does not contain any of the three graphs

More information

System Modeling Introduction Rugby Meta-Model Finite State Machines Petri Nets Untimed Model of Computation Synchronous Model of Computation Timed Model of Computation Integration of Computational Models

More information

Adapting scientific computing problems to cloud computing frameworks Ph.D. Thesis. Pelle Jakovits

Adapting scientific computing problems to cloud computing frameworks Ph.D. Thesis. Pelle Jakovits Adapting scientific computing problems to cloud computing frameworks Ph.D. Thesis Pelle Jakovits Outline Problem statement State of the art Approach Solutions and contributions Current work Conclusions

More information

Graphical degree sequences and realizations

Graphical degree sequences and realizations swap Graphical and realizations Péter L. Erdös Alfréd Rényi Institute of Mathematics Hungarian Academy of Sciences MAPCON 12 MPIPKS - Dresden, May 15, 2012 swap Graphical and realizations Péter L. Erdös

More information

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge, The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge, cheapest first, we had to determine whether its two endpoints

More information

Class One: Degree Sequences

Class One: Degree Sequences Class One: Degree Sequences For our purposes a graph is a just a bunch of points, called vertices, together with lines or curves, called edges, joining certain pairs of vertices. Three small examples of

More information

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

More information

Fast Edge Splitting and Edmonds Arborescence Construction for Unweighted Graphs

Fast Edge Splitting and Edmonds Arborescence Construction for Unweighted Graphs Fast Edge Splitting and Edmonds Arborescence Construction for Unweighted Graphs Anand Bhalgat Ramesh Hariharan Telikepalli Kavitha Debmalya Panigrahi Abstract Given an unweighted undirected or directed

More information

NETZCOPE - a tool to analyze and display complex R&D collaboration networks

NETZCOPE - a tool to analyze and display complex R&D collaboration networks The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.

More information

Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret.

Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret. Programming project Task Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret. Obtaining data This project focuses upon cocktail ingredients. Data was

More information

The Open University s repository of research publications and other research outputs

The Open University s repository of research publications and other research outputs Open Research Online The Open University s repository of research publications and other research outputs Online survey for collective clustering of computer generated architectural floor plans Conference

More information

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which

More information

The tipping point is often referred to in popular literature. First introduced in Schelling 78 and Granovetter 78

The tipping point is often referred to in popular literature. First introduced in Schelling 78 and Granovetter 78 Paulo Shakarian and Damon Paulo Dept. Electrical Engineering and Computer Science and Network Science Center U.S. Military Academy West Point, NY ASONAM 2012 The tipping point is often referred to in popular

More information

High degree graphs contain large-star factors

High degree graphs contain large-star factors High degree graphs contain large-star factors Dedicated to László Lovász, for his 60th birthday Noga Alon Nicholas Wormald Abstract We show that any finite simple graph with minimum degree d contains a

More information

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan

More information

Accelerating In-Memory Graph Database traversal using GPGPUS

Accelerating In-Memory Graph Database traversal using GPGPUS Accelerating In-Memory Graph Database traversal using GPGPUS Ashwin Raghav Mohan Ganesh University of Virginia High Performance Computing Laboratory am2qa@virginia.edu Abstract The paper aims to provide

More information

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear

More information

Scheduling Algorithm for Delivery and Collection System

Scheduling Algorithm for Delivery and Collection System Scheduling Algorithm for Delivery and Collection System Kanwal Prakash Singh Data Scientist, DSL, Housing.com Abstract Extreme teams, large-scale agents teams operating in dynamic environments are quite

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information