Filtering: A Method for Solving Graph Problems in MapReduce

Size: px
Start display at page:

Download "Filtering: A Method for Solving Graph Problems in MapReduce"

Transcription

1 Filtering: A Method for Solving Graph Problems in MapReduce Benjamin Moseley Silvio Lattanzi Siddharth Suri Sergei Vassilvitski UIUC Google Yahoo! Yahoo!

2 Overview Introduction to MapReduce model Our settings Our results Open questions

3 Introduction to the MapReduce model

4 XXL Data Huge amount of data Main problem is to analyze information quickly

5 XXL Data Huge amount of data Main problem is to analyze information quickly New tools Suitable efficient algorithms

6 MapReduce MapReduce is the platform of choice for processing massive data INPUT MAP PHASE REDUCE PHASE

7 MapReduce MapReduce is the platform of choice for processing massive data Data are represented as tuple INPUT < key, value > MAP PHASE REDUCE PHASE

8 MapReduce MapReduce is the platform of choice for processing massive data Data are represented as tuple INPUT < key, value > Mapper decides how data is distributed MAP PHASE REDUCE PHASE

9 MapReduce MapReduce is the platform of choice for processing massive data Data are represented as tuple INPUT < key, value > Mapper decides how data is distributed Reducer performs non-trivial computation locally MAP PHASE REDUCE PHASE

10 How can we model MapReduce? PRAM No limit on the number of processors Memory is uniformly accessible from any processor No limit on the memory available

11 How can we model MapReduce? STREAMING Just one processor There is a limited amount of memory No parallelization

12 MapReduce model [Karloff, Suri and Vassilvitskii] N is the input size and > 0 is some fixed constant

13 MapReduce model [Karloff, Suri and Vassilvitskii] N > 0 is the input size and is some fixed constant Less than N 1 machines

14 MapReduce model [Karloff, Suri and Vassilvitskii] N > 0 is the input size and is some fixed constant Less than Less than N 1 N 1 machines memory on each machine

15 MapReduce model [Karloff, Suri and Vassilvitskii] N > 0 is the input size and is some fixed constant Less than Less than N 1 N 1 machines memory on each machine MRC i : problem that can be solved in O(log i N) rounds

16 Combining map and reduce phase Mapper and Reducer work only on a subgraph

17 Combining map and reduce phase Mapper and Reducer work only on a subgraph Keep time in each round polynomial

18 Combining map and reduce phase Mapper and Reducer work only on a subgraph Keep time in each round polynomial Time is constrained by the number of rounds

19 Algorithmic challenges No machine can see the entire input

20 Algorithmic challenges No machine can see the entire input No communication between machines during each phase

21 Algorithmic challenges No machine can see the entire input No communication between machines during each phase Total memory is N

22 MapReduce vs MUD algorithms In MUD framework each reducer operates on a stream of data. In MUD, each reducer is restricted to only using polylogarithmic space.

23 Our settings

24 Our settings We study the Karloff, Suri and Vassilvitskii model

25 Our settings We study the Karloff, Suri and Vassilvitskii model We focus on class MRC 0

26 Our settings m = n 1+c We assume to work with dense graph, for some constant c>0

27 Our settings m = n 1+c We assume to work with dense graph, for some constant c>0 Empirical evidences that social networks are dense graphs [Leskovec, Kleinberg and Faloutsos]

28 Dense graph motivation [Leskovec, Kleinberg and Faloutsos] They study 9 different social networks

29 Dense graph motivation [Leskovec, Kleinberg and Faloutsos] They study 9 different social networks They show that several graphs from different domains have n 1+c edges

30 Dense graph motivation [Leskovec, Kleinberg and Faloutsos] They study 9 different social networks They show that several graphs from different domains have n 1+c edges Lowest value of c founded.08 and four graphs have c>.5

31 Our results

32 Results Constant rounds algorithms for Maximal matching Minimum cut 8-approx for maximum weighted matching -approx for vertex cover 3/-approx for edge cover

33 Notation G =(V,E) : Input graph n : number of nodes m : number of edges η : memory available on each machine N : input size

34 Filtering Part of the input is dropped or filtered on the first stage in parallel

35 Filtering Part of the input is dropped or filtered on the first stage in parallel Next some computation is performed on the filtered input

36 Filtering Part of the input is dropped or filtered on the first stage in parallel Next some computation is performed on the filtered input Finally some patchwork is done to ensure a proper solution

37 Filtering

38 Filtering FILTER

39 Filtering FILTER COMPUTE SOLUTION

40 Filtering FILTER COMPUTE SOLUTION PATCHWORK ON REST OF THE GRAPH

41 Filtering FILTER COMPUTE SOLUTION PATCHWORK ON REST OF THE GRAPH

42 Warm-up: Compute the minimum spanning tree m = n 1+c

43 Warm-up: Compute the minimum spanning tree m = n 1+c PARTITION EDGES n c/

44 Warm-up: Compute the minimum spanning tree m = n 1+c PARTITION EDGES n c/ n 1+c/ edges

45 Warm-up: Compute the minimum spanning tree n 1+c/ edges

46 Warm-up: Compute the minimum spanning tree n 1+c/ edges COMPUTE MST n c/

47 Warm-up: Compute the minimum spanning tree n 1+c/ edges COMPUTE MST n c/ SECOND ROUND n c/ n 1+c/ edges

48 Show that the algorithm is in MRC 0 The algorithm is correct

49 Show that the algorithm is in MRC 0 The algorithm is correct No edge in the final solution is discarded in partial solution

50 Show that the algorithm is in MRC 0 The algorithm is correct No edge in the final solution is discarded in partial solution The algorithm runs in two rounds

51 Show that the algorithm is in MRC 0 The algorithm is correct No edge in the final solution is discarded in partial solution The algorithm runs in two rounds No more than O n c machines are used

52 Show that the algorithm is in MRC 0 The algorithm is correct No edge in the final solution is discarded in partial solution The algorithm runs in two rounds No more than O n c machines are used No machine uses memory greater than O(n 1+c/ )

53 Maximal matching Algorithmic difficulty is that each machine can only see edges assigned to the machine

54 Maximal matching Algorithmic difficulty is that each machine can only see edges assigned to the machine Is a partitioning based algorithm feasible?

55 Partitioning algorithm

56 Partitioning algorithm PARTITIONING

57 Partitioning algorithm PARTITIONING COMBINE MATCHINGS

58 Partitioning algorithm PARTITIONING COMBINE MATCHINGS It is impossible to build a maximal matching using only red edges!!

59 Algorithmic insight Consider any subset of the edges E

60 Algorithmic insight Consider any subset of the edges E Let M G[E ] be a maximal matching on

61 Algorithmic insight Consider any subset of the edges E Let M G[E ] be a maximal matching on The unmatched vertices form a independent set

62 Algorithmic insight We pick each edge with probability p

63 Algorithmic insight We pick each edge with probability p Find a matching on a sample and then find a matching on unmatched vertices

64 Algorithmic insight We pick each edge with probability p Find a matching on a sample and then find a matching on unmatched vertices For dense portions of the graph, many edges should be sampled

65 Algorithmic insight We pick each edge with probability p Find a matching on a sample and then find a matching on unmatched vertices For dense portions of the graph, many edges should be sampled Sparse portions of the graph are small and can be placed on a single machine

66 Algorithm Sample each edge independently with probability 10 log n p = n c/

67 Algorithm Sample each edge independently with probability 10 log n p = n c/ Find a matching on the sample

68 Algorithm Sample each edge independently with probability 10 log n p = n c/ Find a matching on the sample Consider the induced subgraph on unmatched vertices

69 Algorithm Sample each edge independently with probability 10 log n p = n c/ Find a matching on the sample Consider the induced subgraph on unmatched vertices Find a matching on this graph and output the union

70 Algorithm

71 Algorithm SAMPLE

72 Algorithm SAMPLE FIRST MATCHING

73 Algorithm SAMPLE FIRST MATCHING MATCHING ON UNMATCHED NODES

74 Algorithm SAMPLE FIRST MATCHING MATCHING ON UNMATCHED NODES UNION

75 Correctness Consider the last step of the algorithm

76 Correctness Consider the last step of the algorithm All unmatched vertices are placed on a machine

77 Correctness Consider the last step of the algorithm All unmatched vertices are placed on a machine All unmatched vertices or are matched at the last step or have only matched neighbors

78 Bounding the rounds Three rounds: Sampling the edges

79 Bounding the rounds Three rounds: Sampling the edges Find a matching on a single machine for the sampled edges

80 Bounding the rounds Three rounds: Sampling the edges Find a matching on a single machine for the sampled edges Find a matching for the unmatched vertices

81 Bounding the memory Each edge was sampled with probability p = 10 log n n c/

82 Bounding the memory Each edge was sampled with probability p = 10 log n n c/ Using Chernoff the sampled graph has size with high probability Õ(n 1+c/ )

83 Bounding the memory Each edge was sampled with probability p = 10 log n n c/ Using Chernoff the sampled graph has size with high probability Õ(n 1+c/ ) Can we bound the size of the induced subgraph on the unmatched vertices?

84 Bounding the memory For a fixed induced subgraph with at least edges the probability an edge is not sampled is: 1 10 log n n c/ n 1+c/ exp( 10n log n) n 1+c/

85 Bounding the memory For a fixed induced subgraph with at least edges the probability an edge is not sampled is: 1 10 log n n c/ n 1+c/ exp( 10n log n) n 1+c/ Union bounding over all n induced subgraphs shows that at least one edge is sampled from every dense subgraph with probability 1 exp( 10 log n) 1 1 n 10

86 Using less memory Can we run the algorithm with less then memory? Õ(n 1+c/ )

87 Using less memory Can we run the algorithm with less then memory? Õ(n 1+c/ ) We can amplify our sampling technique!

88 Using less memory Sample as many edges as possible that fits in memory

89 Using less memory Sample as many edges as possible that fits in memory Find a matching

90 Using less memory Sample as many edges as possible that fits in memory Find a matching If the edges between unmatched vertices fit onto a single machine, find a matching on those vertices

91 Using less memory Sample as many edges as possible that fits in memory Find a matching If the edges between unmatched vertices fit onto a single machine, find a matching on those vertices Otherwise, recurse on the unmatched nodes

92 Using less memory Sample as many edges as possible that fits in memory Find a matching If the edges between unmatched vertices fit onto a single machine, find a matching on those vertices Otherwise, recurse on the unmatched nodes n 1+ With memory each iteration a factor of edges are removed resulting in rounds O(c/) n

93 Parallel computation power Maximal matching algorithm does not use parallelization

94 Parallel computation power Maximal matching algorithm does not use parallelization We use a single machine in every step

95 Parallel computation power Maximal matching algorithm does not use parallelization We use a single machine in every step When parallelization is used?

96 Maximum weighted matching

97 Maximum weighted matching SPLIT THE GRAPH

98 Maximum weighted matching

99 Maximum weighted matching MATCHINGS 7 4 3

100 Maximum weighted matching 7 4 3

101 Maximum weighted matching SCAN THE EDGE SEQUENTIALLY 7 4 3

102 Bounding the rounds and memory Four rounds: Splitting the graph and running the maximal matching: 3 rounds and memory Õ(n 1+c/ ) Compute the final solution: 1 round and memory O(n log n)

103 Approximation guarantee (sketch) An edge in the solution can block at most edges in each subgraph of smaller weight

104 Approximation guarantee (sketch) An edge in the solution can block at most edges in each subgraph of smaller weight We loose a factor or because we do not consider the weight

105 Other algorithms Based on the same intuition: -approx for vertex cover 3/-approx for edge cover

106 Minimum cut Partition does not work, because we loose structural informations

107 Minimum cut Partition does not work, because we loose structural informations Sampling does not seem to work either

108 Minimum cut Partition does not work, because we loose structural informations Sampling does not seem to work either We can use the first steps of Karger algorithm as a filtering technique

109 Minimum cut Partition does not work, because we loose structural informations Sampling does not seem to work either We can use the first steps of Karger algorithm as a filtering technique The random choices made in the early rounds succeed with high probability, whereas the later rounds have a much lower probability of success

110 Minimum cut 1

111 Minimum cut RANDOM WEIGHT n δ

112 Minimum cut RANDOM WEIGHT n δ COMPRESS EDGES WITH SMALL WEIGTH

113 Minimum cut RANDOM WEIGHT n δ COMPRESS EDGES WITH SMALL WEIGTH From Karger at least one graph will contain a min cut w.h.p.

114 Minimum cut

115 Minimum cut COMPUTE MINIMUM CUT We will find the minimum cut w.h.p.

116 Empirical result (matching) %#!" '()(**+*",*-.)/01" %!!" 30)+(/4-""!"#$%&'(%)#"*&+,' $#!" $!!" #!"!"!&!!$"!&!!#"!&!$"!&!%"!&!#"!&$" $" -.%/0&'134.4)0)*5'(/,'

117 Open problems

118 Open problems 1 Maximum matching Shortest path Dynamic programming

119 Open problems Algorithms for sparse graph Does connected components require more than rounds? Lower bounds

120 Thank you!

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

A Model of Computation for MapReduce

A Model of Computation for MapReduce A Model of Computation for MapReduce Howard Karloff Siddharth Suri Sergei Vassilvitskii Abstract In recent years the MapReduce framework has emerged as one of the most widely used parallel computing platforms

More information

CIS 700: algorithms for Big Data

CIS 700: algorithms for Big Data CIS 700: algorithms for Big Data Lecture 6: Graph Sketching Slides at http://grigory.us/big-data-class.html Grigory Yaroslavtsev http://grigory.us Sketching Graphs? We know how to sketch vectors: v Mv

More information

MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12

MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12 MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:

More information

http://www.wordle.net/

http://www.wordle.net/ Hadoop & MapReduce http://www.wordle.net/ http://www.wordle.net/ Hadoop is an open-source software framework (or platform) for Reliable + Scalable + Distributed Storage/Computational unit Failures completely

More information

map/reduce connected components

map/reduce connected components 1, map/reduce connected components find connected components with analogous algorithm: map edges randomly to partitions (k subgraphs of n nodes) for each partition remove edges, so that only tree remains

More information

An Empirical Study of Two MIS Algorithms

An Empirical Study of Two MIS Algorithms An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,

More information

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Sergio De Agostino Computer Science Department Sapienza University of Rome Internet as a Distributed System Modern

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

arxiv:1412.2333v1 [cs.dc] 7 Dec 2014

arxiv:1412.2333v1 [cs.dc] 7 Dec 2014 Minimum-weight Spanning Tree Construction in O(log log log n) Rounds on the Congested Clique Sriram V. Pemmaraju Vivek B. Sardeshmukh Department of Computer Science, The University of Iowa, Iowa City,

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

Mining Large Datasets: Case of Mining Graph Data in the Cloud

Mining Large Datasets: Case of Mining Graph Data in the Cloud Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large

More information

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving

More information

Analyzing the Facebook graph?

Analyzing the Facebook graph? Logistics Big Data Algorithmic Introduction Prof. Yuval Shavitt Contact: shavitt@eng.tau.ac.il Final grade: 4 6 home assignments (will try to include programing assignments as well): 2% Exam 8% Big Data

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Small Maximal Independent Sets and Faster Exact Graph Coloring

Small Maximal Independent Sets and Faster Exact Graph Coloring Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou

BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou BIG DATA VISUALIZATION Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou Let s begin with a story Let s explore Yahoo s data! Dora the Data Explorer has a new

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

Exponential time algorithms for graph coloring

Exponential time algorithms for graph coloring Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].

More information

How To Cluster Of Complex Systems

How To Cluster Of Complex Systems Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving

More information

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92. Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

More information

Distance Degree Sequences for Network Analysis

Distance Degree Sequences for Network Analysis Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms

More information

Graph Processing and Social Networks

Graph Processing and Social Networks Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph

More information

Page 1. CSCE 310J Data Structures & Algorithms. CSCE 310J Data Structures & Algorithms. P, NP, and NP-Complete. Polynomial-Time Algorithms

Page 1. CSCE 310J Data Structures & Algorithms. CSCE 310J Data Structures & Algorithms. P, NP, and NP-Complete. Polynomial-Time Algorithms CSCE 310J Data Structures & Algorithms P, NP, and NP-Complete Dr. Steve Goddard goddard@cse.unl.edu CSCE 310J Data Structures & Algorithms Giving credit where credit is due:» Most of the lecture notes

More information

Social Media Mining. Graph Essentials

Social Media Mining. Graph Essentials Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures

More information

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. VLDB 2009 CS 422 Decision Trees: Main Components Find Best Split Choose split

More information

Statistical Learning Theory Meets Big Data

Statistical Learning Theory Meets Big Data Statistical Learning Theory Meets Big Data Randomized algorithms for frequent itemsets Eli Upfal Brown University Data, data, data In God we trust, all others (must) bring data Prof. W.E. Deming, Statistician,

More information

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns

More information

Minimum spanning tree based classification model for massive data with MapReduce implementation

Minimum spanning tree based classification model for massive data with MapReduce implementation 2010 IEEE International Conference on Data Mining Workshops Minimum spanning tree based classification model for massive data with MapReduce implementation Jin Chang, Jun Luo, Joshua Zhexue Huang, Shengzhong

More information

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014 Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R.0, steen@cs.vu.nl Chapter 0: Version: April 8, 0 / Contents Chapter Description 0: Introduction

More information

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven)

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven) Algorithmic Aspects of Big Data Nikhil Bansal (TU Eindhoven) Algorithm design Algorithm: Set of steps to solve a problem (by a computer) Studied since 1950 s. Given a problem: Find (i) best solution (ii)

More information

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique

More information

Network Design with Coverage Costs

Network Design with Coverage Costs Network Design with Coverage Costs Siddharth Barman 1 Shuchi Chawla 2 Seeun William Umboh 2 1 Caltech 2 University of Wisconsin-Madison APPROX-RANDOM 2014 Motivation Physical Flow vs Data Flow vs. Commodity

More information

Network Algorithms for Homeland Security

Network Algorithms for Homeland Security Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially

More information

5.1 Bipartite Matching

5.1 Bipartite Matching CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the Ford-Fulkerson

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, steen@cs.vu.nl Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter

More information

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team Software tools for Complex Networks Analysis Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team MOTIVATION Why do we need tools? Source : nature.com Visualization Properties extraction

More information

Scheduling Shop Scheduling. Tim Nieberg

Scheduling Shop Scheduling. Tim Nieberg Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations

More information

Big Data Systems CS 5965/6965 FALL 2015

Big Data Systems CS 5965/6965 FALL 2015 Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html

More information

Bicolored Shortest Paths in Graphs with Applications to Network Overlay Design

Bicolored Shortest Paths in Graphs with Applications to Network Overlay Design Bicolored Shortest Paths in Graphs with Applications to Network Overlay Design Hongsik Choi and Hyeong-Ah Choi Department of Electrical Engineering and Computer Science George Washington University Washington,

More information

Load Balancing. Load Balancing 1 / 24

Load Balancing. Load Balancing 1 / 24 Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

More information

Hadoop SNS. renren.com. Saturday, December 3, 11

Hadoop SNS. renren.com. Saturday, December 3, 11 Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December

More information

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge, The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge, cheapest first, we had to determine whether its two endpoints

More information

Introduction to DISC and Hadoop

Introduction to DISC and Hadoop Introduction to DISC and Hadoop Alice E. Fischer April 24, 2009 Alice E. Fischer DISC... 1/20 1 2 History Hadoop provides a three-layer paradigm Alice E. Fischer DISC... 2/20 Parallel Computing Past and

More information

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Yong Zhang 1.2, Francis Y.L. Chin 2, and Hing-Fung Ting 2 1 College of Mathematics and Computer Science, Hebei University,

More information

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of

More information

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new GAP (2) Graph Accessibility Problem (GAP) (1)

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new GAP (2) Graph Accessibility Problem (GAP) (1) Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of

More information

Every tree contains a large induced subgraph with all degrees odd

Every tree contains a large induced subgraph with all degrees odd Every tree contains a large induced subgraph with all degrees odd A.J. Radcliffe Carnegie Mellon University, Pittsburgh, PA A.D. Scott Department of Pure Mathematics and Mathematical Statistics University

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction

More information

Problem Set 7 Solutions

Problem Set 7 Solutions 8 8 Introduction to Algorithms May 7, 2004 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik Demaine and Shafi Goldwasser Handout 25 Problem Set 7 Solutions This problem set is due in

More information

Distributed Structured Prediction for Big Data

Distributed Structured Prediction for Big Data Distributed Structured Prediction for Big Data A. G. Schwing ETH Zurich aschwing@inf.ethz.ch T. Hazan TTI Chicago M. Pollefeys ETH Zurich R. Urtasun TTI Chicago Abstract The biggest limitations of learning

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)

More information

Privacy-Preserving Big Data Publishing

Privacy-Preserving Big Data Publishing Privacy-Preserving Big Data Publishing Hessam Zakerzadeh 1, Charu C. Aggarwal 2, Ken Barker 1 SSDBM 15 1 University of Calgary, Canada 2 IBM TJ Watson, USA Data Publishing OECD * declaration on access

More information

All trees contain a large induced subgraph having all degrees 1 (mod k)

All trees contain a large induced subgraph having all degrees 1 (mod k) All trees contain a large induced subgraph having all degrees 1 (mod k) David M. Berman, A.J. Radcliffe, A.D. Scott, Hong Wang, and Larry Wargo *Department of Mathematics University of New Orleans New

More information

Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R. 5. Link Analysis Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

More information

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

More information

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

A Novel Cloud Based Elastic Framework for Big Data Preprocessing School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview

More information

POMPDs Make Better Hackers: Accounting for Uncertainty in Penetration Testing. By: Chris Abbott

POMPDs Make Better Hackers: Accounting for Uncertainty in Penetration Testing. By: Chris Abbott POMPDs Make Better Hackers: Accounting for Uncertainty in Penetration Testing By: Chris Abbott Introduction What is penetration testing? Methodology for assessing network security, by generating and executing

More information

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph

More information

CSC2420 Spring 2015: Lecture 3

CSC2420 Spring 2015: Lecture 3 CSC2420 Spring 2015: Lecture 3 Allan Borodin January 22, 2015 1 / 1 Announcements and todays agenda Assignment 1 due next Thursday. I may add one or two additional questions today or tomorrow. Todays agenda

More information

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit Data Structures Page 1 of 24 A.1. Arrays (Vectors) n-element vector start address + ielementsize 0 +1 +2 +3 +4... +n-1 start address continuous memory block static, if size is known at compile time dynamic,

More information

THE PROBLEM WORMS (1) WORMS (2) THE PROBLEM OF WORM PROPAGATION/PREVENTION THE MINIMUM VERTEX COVER PROBLEM

THE PROBLEM WORMS (1) WORMS (2) THE PROBLEM OF WORM PROPAGATION/PREVENTION THE MINIMUM VERTEX COVER PROBLEM 1 THE PROBLEM OF WORM PROPAGATION/PREVENTION I.E. THE MINIMUM VERTEX COVER PROBLEM Prof. Tiziana Calamoneri Network Algorithms A.y. 2014/15 2 THE PROBLEM WORMS (1)! A computer worm is a standalone malware

More information

From GWS to MapReduce: Google s Cloud Technology in the Early Days

From GWS to MapReduce: Google s Cloud Technology in the Early Days Large-Scale Distributed Systems From GWS to MapReduce: Google s Cloud Technology in the Early Days Part II: MapReduce in a Datacenter COMP6511A Spring 2014 HKUST Lin Gu lingu@ieee.org MapReduce/Hadoop

More information

Healthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw

Healthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Healthcare data analytics Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information

B669 Sublinear Algorithms for Big Data

B669 Sublinear Algorithms for Big Data B669 Sublinear Algorithms for Big Data Qin Zhang 1-1 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index of over 19 billion web pages : over 40 billion of

More information

A Sublinear Bipartiteness Tester for Bounded Degree Graphs

A Sublinear Bipartiteness Tester for Bounded Degree Graphs A Sublinear Bipartiteness Tester for Bounded Degree Graphs Oded Goldreich Dana Ron February 5, 1998 Abstract We present a sublinear-time algorithm for testing whether a bounded degree graph is bipartite

More information

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear

More information

Analysis of Computer Algorithms. Algorithm. Algorithm, Data Structure, Program

Analysis of Computer Algorithms. Algorithm. Algorithm, Data Structure, Program Analysis of Computer Algorithms Hiroaki Kobayashi Input Algorithm Output 12/13/02 Algorithm Theory 1 Algorithm, Data Structure, Program Algorithm Well-defined, a finite step-by-step computational procedure

More information

The Classes P and NP. mohamed@elwakil.net

The Classes P and NP. mohamed@elwakil.net Intractable Problems The Classes P and NP Mohamed M. El Wakil mohamed@elwakil.net 1 Agenda 1. What is a problem? 2. Decidable or not? 3. The P class 4. The NP Class 5. TheNP Complete class 2 What is a

More information

Parallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data

Parallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data Case Study 2: Document Retrieval Parallel Programming Map-Reduce Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 31 st, 2013 Carlos Guestrin

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams

Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams André Ciré University of Toronto John Hooker Carnegie Mellon University INFORMS 2014 Home Health Care Home health care delivery

More information

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010 CS 598CSC: Combinatorial Optimization Lecture date: /4/010 Instructor: Chandra Chekuri Scribe: David Morrison Gomory-Hu Trees (The work in this section closely follows [3]) Let G = (V, E) be an undirected

More information

A number of tasks executing serially or in parallel. Distribute tasks on processors so that minimal execution time is achieved. Optimal distribution

A number of tasks executing serially or in parallel. Distribute tasks on processors so that minimal execution time is achieved. Optimal distribution Scheduling MIMD parallel program A number of tasks executing serially or in parallel Lecture : Load Balancing The scheduling problem NP-complete problem (in general) Distribute tasks on processors so that

More information

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case. Algorithms Efficiency of algorithms Computational resources: time and space Best, worst and average case performance How to compare algorithms: machine-independent measure of efficiency Growth rate Complexity

More information

Guessing Game: NP-Complete?

Guessing Game: NP-Complete? Guessing Game: NP-Complete? 1. LONGEST-PATH: Given a graph G = (V, E), does there exists a simple path of length at least k edges? YES 2. SHORTEST-PATH: Given a graph G = (V, E), does there exists a simple

More information

SIMS 255 Foundations of Software Design. Complexity and NP-completeness

SIMS 255 Foundations of Software Design. Complexity and NP-completeness SIMS 255 Foundations of Software Design Complexity and NP-completeness Matt Welsh November 29, 2001 mdw@cs.berkeley.edu 1 Outline Complexity of algorithms Space and time complexity ``Big O'' notation Complexity

More information

Cloud Computing. Lectures 10 and 11 Map Reduce: System Perspective 2014-2015

Cloud Computing. Lectures 10 and 11 Map Reduce: System Perspective 2014-2015 Cloud Computing Lectures 10 and 11 Map Reduce: System Perspective 2014-2015 1 MapReduce in More Detail 2 Master (i) Execution is controlled by the master process: Input data are split into 64MB blocks.

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

More information

An Approximation Algorithm for Bounded Degree Deletion

An Approximation Algorithm for Bounded Degree Deletion An Approximation Algorithm for Bounded Degree Deletion Tomáš Ebenlendr Petr Kolman Jiří Sgall Abstract Bounded Degree Deletion is the following generalization of Vertex Cover. Given an undirected graph

More information

Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

More information

Lecture 22: November 10

Lecture 22: November 10 CS271 Randomness & Computation Fall 2011 Lecture 22: November 10 Lecturer: Alistair Sinclair Based on scribe notes by Rafael Frongillo Disclaimer: These notes have not been subjected to the usual scrutiny

More information

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications Jie Gao Department of Computer Science Stanford University Joint work with Li Zhang Systems Research Center Hewlett-Packard

More information

Private Approximation of Clustering and Vertex Cover

Private Approximation of Clustering and Vertex Cover Private Approximation of Clustering and Vertex Cover Amos Beimel, Renen Hallak, and Kobbi Nissim Department of Computer Science, Ben-Gurion University of the Negev Abstract. Private approximation of search

More information

Big Data Begets Big Database Theory

Big Data Begets Big Database Theory Big Data Begets Big Database Theory Dan Suciu University of Washington 1 Motivation Industry analysts describe Big Data in terms of three V s: volume, velocity, variety. The data is too big to process

More information

8.1 Min Degree Spanning Tree

8.1 Min Degree Spanning Tree CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree

More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it Seminar Path planning using Voronoi diagrams and B-Splines Stefano Martina stefano.martina@stud.unifi.it 23 may 2016 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International

More information

Nan Kong, Andrew J. Schaefer. Department of Industrial Engineering, Univeristy of Pittsburgh, PA 15261, USA

Nan Kong, Andrew J. Schaefer. Department of Industrial Engineering, Univeristy of Pittsburgh, PA 15261, USA A Factor 1 2 Approximation Algorithm for Two-Stage Stochastic Matching Problems Nan Kong, Andrew J. Schaefer Department of Industrial Engineering, Univeristy of Pittsburgh, PA 15261, USA Abstract We introduce

More information

On the k-path cover problem for cacti

On the k-path cover problem for cacti On the k-path cover problem for cacti Zemin Jin and Xueliang Li Center for Combinatorics and LPMC Nankai University Tianjin 300071, P.R. China zeminjin@eyou.com, x.li@eyou.com Abstract In this paper we

More information

Steiner Tree Approximation via IRR. Randomized Rounding

Steiner Tree Approximation via IRR. Randomized Rounding Steiner Tree Approximation via Iterative Randomized Rounding Graduate Program in Logic, Algorithms and Computation μπλ Network Algorithms and Complexity June 18, 2013 Overview 1 Introduction Scope Related

More information

Rethinking SIMD Vectorization for In-Memory Databases

Rethinking SIMD Vectorization for In-Memory Databases SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest

More information

Analysis of MapReduce Algorithms

Analysis of MapReduce Algorithms Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model

More information

Discuss the size of the instance for the minimum spanning tree problem.

Discuss the size of the instance for the minimum spanning tree problem. 3.1 Algorithm complexity The algorithms A, B are given. The former has complexity O(n 2 ), the latter O(2 n ), where n is the size of the instance. Let n A 0 be the size of the largest instance that can

More information