Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

1/35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Zuhair Khayyat (1), Karim Awara (1), Amani Alonazi (1), Hani Jamjoom (2), Dan Williams (2), Panos Kalnis (1)
(1) King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
(2) IBM Watson Research Center, Yorktown Heights, NY

2/35 Importance of Graphs
Graphs abstract application-specific algorithms into generic problems represented as interactions among vertices and edges:
- Max flow in road networks
- Diameter of the WWW
- Ranking in social networks
- Simulating protein interactions
Different applications vary in their computation requirements.

3/35 How to Scale Graph Processing
Pregel was introduced by Google as a scalable abstraction for graph processing:
- Overcomes the limitations of processing graphs on MapReduce
- Based on vertex-centric computation
- Utilizes the bulk synchronous parallel (BSP) model

System    | Programming Abstraction           | Data Exchange      | Computations
MapReduce | map(), combine(), reduce()        | Key-based grouping | Disk-based
Pregel    | compute(), combine(), aggregate() | Message passing    | In memory

4/35 Pregel's BSP Graph Processing
[Figure: three workers executing supersteps 1-3, each superstep separated by a BSP barrier]
Balanced computation and communication is fundamental to Pregel's efficiency.
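To make the vertex-centric BSP model concrete, below is a minimal sketch of one worker's superstep loop. It is not Mizan's or Pregel's actual code; Vertex, Worker, exchangeMessages and globalBarrier are illustrative names.

#include <vector>

// Sketch of one worker's BSP superstep (illustrative types, not Mizan's API).
struct Vertex {
    bool active = true;
    void compute() { /* user-defined vertex program; may send messages */ }
};

struct Worker {
    std::vector<Vertex> vertices;

    void runSuperstep() {
        for (Vertex &v : vertices)
            if (v.active)
                v.compute();       // vertex-centric computation
        exchangeMessages();        // deliver this superstep's messages to their target workers
        globalBarrier();           // BSP barrier: wait until every worker has finished
    }

    void exchangeMessages() { /* e.g., MPI sends/receives */ }
    void globalBarrier()    { /* e.g., MPI_Barrier(MPI_COMM_WORLD) */ }
};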

5/35 Optimizing Graph Processing
Existing work focuses on optimizing for the graph structure (static optimization):
- Optimize graph partitioning:
  - Simple partitioning schemes (e.g., hash or range): Giraph
  - User-defined partitioning function: Pregel
  - Sophisticated partitioning techniques (e.g., min-cuts): GraphLab, PowerGraph, GoldenOrb and Surfer
- Optimize graph data access:
  - Distributed data stores and graph indexing: GoldenOrb, Hama and Trinity
What about algorithm behavior? Pregel provides coarse-grained load balancing; is it enough?

6/35 Types of Algorithms
Algorithms behave differently; we classify them into two categories depending on their behavior:

Algorithm Type | In/Out Messages | Vertices State | Examples
Stationary     | Predictable     | Fixed          | PageRank, Diameter estimation, WCC
Non-stationary | Variable        | Variable       | Distributed minimum spanning tree, Belief propagation, Graph queries, Ad propagation

7/35 Types of Algorithms
[Figure: example graph (V1-V8) at the first superstep, PageRank vs. DMST]

8/35 Types of Algorithms
[Figure: the same graph at superstep k, PageRank vs. DMST]

9/35 Types of Algorithms
[Figure: the same graph at superstep k + m, PageRank vs. DMST]

10/35 What Causes Computation Imbalance in Non-stationary Algorithms?
[Figure: three workers executing supersteps 1-3, separated by BSP barriers]
- Vertex response time
- Time to receive in-messages
- Time to send out-messages

11/35 Why Optimize for Algorithm Behavior?
Difference between stationary and non-stationary algorithms:
[Figure: incoming messages (millions, log scale) per superstep over 60 supersteps for PageRank and DMST, showing both the total and the maximum per worker (Max/W)]

12/35 What is Mizan?
- An open-source BSP-based graph processing system
- Provides runtime load balancing through fine-grained vertex migrations
- Follows the Pregel model

13/35 PageRank's compute() in Mizan

void compute(messageIterator *messages, userVertexObject *data,
             messageManager *comm) {
    double currVal = data->getVertexValue().getValue();
    double newVal = 0;
    double c = 0.85;
    if (data->getCurrentSS() > 1) {
        while (messages->hasNext()) {
            newVal = newVal + messages->getNext().getValue();
        }
        newVal = newVal * c + (1.0 - c) / ((double) vertexTotal);
        data->setVertexValue(mDouble(newVal));
    } else {
        newVal = currVal;
    }
    if (data->getCurrentSS() <= maxSuperstep) {
        mDouble outVal(newVal / ((double) data->getOutEdgeCount()));
        for (int i = 0; i < data->getOutEdgeCount(); i++) {
            comm->sendMessage(data->getOutEdgeID(i), outVal);
        }
    } else {
        data->voteToHalt();
    }
}

14/35 Migration Planning Objectives
- Decentralized
- Simple
- Transparent: no need to change Pregel's API
- Does not assume any a priori knowledge of the graph structure or the algorithm

15/35 Mizan's Migration Barrier
Mizan performs both planning and migration after all workers reach the BSP barrier.
[Figure: supersteps 1-3 across three workers; after each BSP barrier, the migration planner runs and a migration barrier follows]
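Extending the BSP loop sketched earlier, the following sketch shows where the migration work sits: after the BSP barrier of each superstep and before a separate migration barrier. The names are illustrative, not Mizan's actual API.

// Sketch of Mizan's per-superstep control flow (illustrative names).
struct MizanWorker {
    bool finished = false;

    void computeActiveVertices() { /* run compute() on active vertices, exchange messages */ }
    void bspBarrier()            { /* standard Pregel/BSP synchronization */ }
    void planMigrations()        { /* all workers agree on a migration plan */ }
    void migrateVertices()       { /* move the selected vertices */ }
    void migrationBarrier()      { /* continue only after all migrations complete */ }

    void run() {
        while (!finished) {
            computeActiveVertices();
            bspBarrier();          // end of the superstep proper
            planMigrations();      // planning happens after the BSP barrier ...
            migrateVertices();     // ... followed by the actual migrations
            migrationBarrier();    // the next superstep starts after this barrier
        }
    }
};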

16/35 Mizan's Migration Planning Steps
1. Identify the source of imbalance: by comparing each worker's execution time against a normal distribution and flagging outliers.
Mizan monitors, for each vertex:
- Remote outgoing messages
- All incoming messages (local and remote)
- Response time
High-level summaries are broadcast to every worker.
[Figure: two Mizan workers exchanging local and remote messages, with per-vertex response time and remote outgoing message counters]
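As a rough illustration of this monitoring step, the sketch below shows per-vertex counters and how a worker might fold them into the high-level summary it broadcasts. Field and function names are assumptions, not Mizan's actual data structures.

#include <cstdint>
#include <vector>

// Per-vertex statistics collected during a superstep (illustrative names).
struct VertexStats {
    uint64_t remoteOutgoingMsgs = 0;  // messages sent to vertices on other workers
    uint64_t incomingMsgs       = 0;  // all messages received (local + remote)
    double   responseTime       = 0;  // time spent inside compute() for this vertex
};

// High-level summary one worker broadcasts to all others after the superstep.
struct WorkerSummary {
    uint64_t remoteOutgoingMsgs = 0;
    uint64_t incomingMsgs       = 0;
    double   executionTime      = 0;  // sum of vertex response times
};

WorkerSummary summarize(const std::vector<VertexStats> &stats) {
    WorkerSummary s;
    for (const VertexStats &v : stats) {
        s.remoteOutgoingMsgs += v.remoteOutgoingMsgs;
        s.incomingMsgs       += v.incomingMsgs;
        s.executionTime      += v.responseTime;
    }
    return s;  // broadcast to every other worker, e.g., via MPI_Allgather
}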

17/35 Mizan's Migration Planning Steps
1. Identify the source of imbalance
2. Select the migration objective: Mizan finds the strongest cause of workload imbalance by comparing the statistics for outgoing and incoming messages of all workers with the workers' execution times. The migration objective is one of:
- Optimize for outgoing messages, or
- Optimize for incoming messages, or
- Optimize for response time
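One plausible way to pick the objective is to correlate each per-worker message statistic with the per-worker execution times and choose the strongest one, falling back to response time when neither correlation is strong. The sketch below illustrates that idea with Pearson correlation and an assumed threshold; it is not Mizan's exact procedure.

#include <cmath>
#include <vector>

enum class Objective { OutgoingMessages, IncomingMessages, ResponseTime };

// Pearson correlation between two per-worker series of equal length.
double correlation(const std::vector<double> &x, const std::vector<double> &y) {
    const size_t n = x.size();
    if (n == 0) return 0;
    double mx = 0, my = 0;
    for (size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0, sxx = 0, syy = 0;
    for (size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return (sxx == 0 || syy == 0) ? 0 : sxy / std::sqrt(sxx * syy);
}

// Pick the statistic that best explains the variation in execution time.
// `threshold` is an assumed cut-off below which neither message metric is blamed.
Objective selectObjective(const std::vector<double> &execTime,
                          const std::vector<double> &outMsgs,
                          const std::vector<double> &inMsgs,
                          double threshold = 0.5) {
    double co = correlation(execTime, outMsgs);
    double ci = correlation(execTime, inMsgs);
    if (co < threshold && ci < threshold) return Objective::ResponseTime;
    return (co >= ci) ? Objective::OutgoingMessages : Objective::IncomingMessages;
}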

18/35 Mizan's Migration Planning Steps
1. Identify the source of imbalance
2. Select the migration objective
3. Pair over-utilized workers with under-utilized ones: all workers create and execute the migration plan in parallel, without centralized coordination. Each worker is paired with at most one other worker (a pairing sketch follows below).
[Figure: workers W0-W9 matched in pairs along a load-ordered scale]
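A sketch of the pairing step under assumed details: every worker sorts the broadcast per-worker loads and matches the most loaded worker with the least loaded one, the second most with the second least, and so on. Because the input summaries and the rule are the same everywhere, all workers derive the same pairs without coordination.

#include <algorithm>
#include <utility>
#include <vector>

// Returns (over-utilized, under-utilized) worker pairs (illustrative scheme).
std::vector<std::pair<int, int>>
pairWorkers(const std::vector<double> &load /* indexed by worker id */) {
    std::vector<std::pair<int, int>> pairs;
    if (load.size() < 2) return pairs;

    std::vector<int> order(load.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return load[a] > load[b]; });  // heaviest first

    for (size_t i = 0, j = order.size() - 1; i < j; ++i, --j)
        pairs.push_back({order[i], order[j]});
    return pairs;  // every worker appears in at most one pair
}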

19/35 Mizan's Migration Planning Steps
1. Identify the source of imbalance
2. Select the migration objective
3. Pair over-utilized workers with under-utilized ones
4. Select vertices to migrate, depending on the migration objective (a selection sketch follows below)
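For step 4, a simple greedy policy is sketched below: the over-utilized worker ranks its vertices by the statistic matching the chosen objective and picks the top ones until roughly half of the load gap to its paired worker would be shed. This is an assumed policy for illustration, not Mizan's exact selection rule.

#include <algorithm>
#include <cstdint>
#include <vector>

// Pick vertices to migrate from an over-utilized worker (illustrative policy).
std::vector<uint64_t> selectVertices(const std::vector<uint64_t> &vertexIds,
                                     const std::vector<double> &stat,  // per-vertex statistic
                                     double myLoad, double peerLoad) {
    std::vector<size_t> idx(vertexIds.size());
    for (size_t i = 0; i < idx.size(); ++i) idx[i] = i;
    std::sort(idx.begin(), idx.end(),
              [&](size_t a, size_t b) { return stat[a] > stat[b]; });  // costliest first

    double target = (myLoad - peerLoad) / 2.0;   // load we would like to shed
    double shed = 0;
    std::vector<uint64_t> toMigrate;
    for (size_t i : idx) {
        if (shed >= target) break;
        shed += stat[i];
        toMigrate.push_back(vertexIds[i]);
    }
    return toMigrate;
}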

20/35 Mizan's Migration Planning Steps
1. Identify the source of imbalance
2. Select the migration objective
3. Pair over-utilized workers with under-utilized ones
4. Select vertices to migrate
5. Migrate vertices:
- How to migrate vertices freely across workers while maintaining vertex ownership and fast updates?
- How to minimize migration costs for large vertices?

21/35 Vertex Ownership
Mizan uses a distributed hash table (DHT) to implement a distributed lookup service:
- V's home worker ID = (hash(ID) mod N)
- V can execute at any worker
- Workers ask the home worker of V for its current location
- The home worker is notified of the new location when V migrates
[Figure: three workers; Worker 1 asks the DHT where V5 and V8 currently execute; the DHT maps V1-V9 to their home workers W1-W3]
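The lookup service can be pictured as below: a vertex's home worker is fixed by hashing its ID, and that home worker tracks where the vertex currently executes. The struct and method names are illustrative, and the identity hash is used only for brevity; Mizan's implementation differs.

#include <cstdint>
#include <unordered_map>

// Sketch of the DHT-based vertex location service (illustrative structure).
struct VertexLocationService {
    int numWorkers = 1;
    // Entries kept by THIS worker, for vertices whose home worker is this worker.
    std::unordered_map<uint64_t, int> currentLocation;

    int homeWorker(uint64_t vertexId) const {
        return static_cast<int>(vertexId % numWorkers);  // hash(ID) mod N
    }

    // Called on the home worker when a vertex migrates to a new worker.
    void updateLocation(uint64_t vertexId, int newWorker) {
        currentLocation[vertexId] = newWorker;
    }

    // Called on the home worker to answer "where does vertex V execute now?".
    int resolve(uint64_t vertexId) const {
        auto it = currentLocation.find(vertexId);
        return it != currentLocation.end() ? it->second : homeWorker(vertexId);
    }
};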

22/35 Migrating Vertices with Large Message Sizes
Mizan introduces delayed migration for very large vertices:
- At superstep t: only the ownership of the vertex is moved to worker_new
- At superstep t + 1: worker_new receives the vertex's messages, while worker_old still processes the vertex
- After superstep t + 1: worker_old moves the state of the vertex to worker_new
[Figure: vertices 1-7 split between worker_old and worker_new across supersteps t, t+1 and t+2; vertex 7 is the large vertex being migrated]
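The timing of delayed migration can be captured by a few predicates, sketched below with assumed names: ownership moves at superstep t, messages are redirected to the new worker from t + 1 on while the old worker still executes the vertex at t + 1, and the state is shipped once after t + 1.

#include <cstdint>

// Sketch of delayed-migration timing for one large vertex (assumed names).
struct DelayedMigration {
    uint64_t vertexId;
    int oldWorker;
    int newWorker;
    int t;  // superstep at which ownership was handed over

    // From superstep t on, the DHT already points to the new worker, so messages
    // sent during t are delivered to newWorker at superstep t + 1.
    bool messagesGoToNewWorker(int ss) const { return ss >= t; }

    // The old worker keeps executing the vertex through superstep t + 1.
    bool computeOnOldWorker(int ss) const { return ss <= t + 1; }

    // The (large) vertex state is shipped once, after superstep t + 1 completes.
    bool shipStateAfter(int ss) const { return ss == t + 1; }
};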

23/35 Experimental Setup
Mizan is implemented in C++ with MPI, with three variations:
- Static Mizan: emulates Giraph, disables dynamic migration
- Work stealing (WS): emulates Pregel's coarse-grained dynamic load balancing
- Mizan: activates dynamic migration
Hardware:
- Local cluster of 21 machines: a mix of i5 and i7 processors, each with 16GB RAM
- IBM Blue Gene/P supercomputer with 1024 compute nodes: each node is a 4-core PowerPC 450 CPU at 850MHz with 4GB RAM

24/35 Datasets
Experiments on public datasets from the Stanford Network Analysis Project (SNAP), the Laboratory for Web Algorithmics (LAW), and a Kronecker generator:

Name                | Nodes      | Edges
kg1 (synthetic)     | 1,048,576  | 5,360,368
kg4m68m (synthetic) | 4,194,304  | 68,671,566
web-google          | 875,713    | 5,105,039
LiveJournal1        | 4,847,571  | 68,993,773
hollywood-2011      | 2,180,759  | 228,985,632
arabic-2005         | 22,744,080 | 639,999,458

25/35 Giraph vs. Static Mizan
[Figure: PageRank runtime (minutes) for Giraph and Static Mizan on kg1, web-google, kg4m68m and LiveJournal1 (social network and random graphs)]
[Figure: PageRank runtime (minutes) for Giraph and Static Mizan on regular random graphs of 1 to 16 million vertices, each with around 17M edges]

26/35 Effectiveness of Dynamic Vertex Migration
PageRank on a social graph (LiveJournal1); the shaded columns show the algorithm runtime, the unshaded columns show the initial partitioning cost.
[Figure: runtime (minutes) of Static, WS and Mizan under hash, range and METIS partitioning]

27/35 Effectiveness of Dynamic Vertex Migration
Performance on algorithms with highly variable messaging patterns (DMST and an ad-propagation simulation) on a METIS-partitioned social graph (LiveJournal1).
[Figure: runtime (minutes) of Static, Work Stealing and Mizan on DMST and advertisement propagation]

28/35 Mizan's Overhead with Scaling
[Figure: Mizan's speedup on hollywood-2011 on the Linux cluster, 2 to 16 compute nodes]
[Figure: Mizan's speedup on arabic-2005 on the IBM Blue Gene/P supercomputer, 64 to 1024 compute nodes]

29/35 Future Work
- Work with skewed graphs
- Improve Pregel's fault tolerance

30/35 Conclusion
- Mizan is a Pregel system that uses fine-grained vertex migration to load balance computation and communication across supersteps
- Mizan is an open-source project developed within the InfoCloud group at KAUST, in collaboration with IBM, and is programmed in C++ with MPI
- Mizan scales up to thousands of workers
- Mizan improves the overall computation cost by 40% up to two orders of magnitude, with less than 10% migration overhead
- Download it at http://cloud.kaust.edu.sa or try it on EC2 (ami-52ed743b)

31/35 Extra: Migration Details and Costs
[Figure: per-superstep runtime, average worker runtime and migration cost (minutes) over 30 supersteps, annotated with the objective chosen at each superstep: balance outgoing messages, balance incoming messages, or balance response time]

32/35 Extra: Migrated Vertices per Superstep
[Figure: total migrated vertices and maximum vertices migrated by a single worker (thousands), together with the migration cost (minutes), over 30 supersteps]

33/35 Extra: Mizan's Migration Planning
[Flowchart: after compute() and the BSP barrier, each worker checks whether the superstep is imbalanced; if not, it proceeds to the migration barrier; if so, it identifies the source of imbalance and, if it is over-utilized, pairs with an under-utilized worker, selects vertices and migrates them before the migration barrier]

34/35 Extra: Architecture of Mizan
[Diagram: a Mizan worker consists of the vertex compute() layer, the migration planner, the BSP processor, the communicator with its DHT, and the storage manager performing IO against HDFS or local disks]