Coflow! A Networking Abstraction For Cluster Applications! Mosharaf Chowdhury Ion Stoica! UC#Berkeley#



Similar documents
Cof low. Mending the Application-Network Gap in Big Data Analytics. Mosharaf Chowdhury. UC Berkeley

Advanced Computer Networks. Scheduling

Spark. Fast, Interactive, Language- Integrated Cluster Computing

From GWS to MapReduce: Google s Cloud Technology in the Early Days

MapReduce Online. Tyson Condie, Neil Conway, Peter Alvaro, Joseph Hellerstein, Khaled Elmeleegy, Russell Sears. Neeraj Ganapathy

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

ITG Software Engineering

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

Developing MapReduce Programs

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. project.org. University of California, Berkeley UC BERKELEY

Big Data and Scripting map/reduce in Hadoop

Data Structures and Performance for Scientific Computing with Hadoop and Dumbo

Software Defined Network Support for Real Distributed Systems

This exam contains 13 pages (including this cover page) and 18 questions. Check to see if any pages are missing.

Apache Spark and Distributed Programming

A very short Intro to Hadoop

Scalable NetFlow Analysis with Hadoop Yeonhee Lee and Youngseok Lee

Chapter 7. Using Hadoop Cluster and MapReduce

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

Networking in the Hadoop Cluster

Hadoop Architecture. Part 1

YARN and how MapReduce works in Hadoop By Alex Holmes

# Not a part of 1Z0-061 or 1Z0-144 Certification test, but very important technology in BIG DATA Analysis

Big Data Analytics Hadoop and Spark

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

Predictable Data Centers

Survey on Scheduling Algorithm in MapReduce Framework

Apache Hadoop. Alexandru Costan

H2O on Hadoop. September 30,

Big Data Processing with Google s MapReduce. Alexandru Costan

T. S. Eugene Ng Rice University

CSE-E5430 Scalable Cloud Computing Lecture 11

Fault Tolerance in Hadoop for Work Migration

Computer Networks. Introduc)on to Naming, Addressing, and Rou)ng. Week 09. College of Information Science and Engineering Ritsumeikan University

Open source Google-style large scale data analysis with Hadoop

How To Build A Policy Aware Switching Layer For Data Center Data Center Servers

Definition. A Historical Example

Hadoop Design and k-means Clustering

Hadoop/MapReduce. Object-oriented framework presentation CSCI 5448 Casey McTaggart

Big Data and Scripting Systems beyond Hadoop

Hadoop Scheduler w i t h Deadline Constraint

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February ISSN

As much fun as spectacle is, no one is impressed anymore. No one flinches. It is easy to get caught up in spectacle, but story is timeless.

Istanbul Şehir University Big Data Camp 14. Hadoop Map Reduce. Aslan Bakirov Kevser Nur Çoğalmış

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Big Data Analysis and Its Scheduling Policy Hadoop

A computational model for MapReduce job flow

Accelerating Cast Traffic Delivery in Data Centers Leveraging Physical Layer Optics and SDN

Hadoop Technology for Flow Analysis of the Internet Traffic

What s next for the Berkeley Data Analytics Stack?

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Titolo del paragrafo. Titolo del documento - Sottotitolo documento The Benefits of Pushing Real-Time Market Data via a Web Infrastructure

Developing a MapReduce Application

Keywords: Big Data, HDFS, Map Reduce, Hadoop

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Distributed Computing and Big Data: Hadoop and MapReduce

Load Distribution in Large Scale Network Monitoring Infrastructures

IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Improving MapReduce Performance in Heterogeneous Environments

How MapReduce Works 資碩一 戴睿宸

@Scalding. Based on talk by Oscar Boykin / Twitter

A Review on Quality of Service Architectures for Internet Network Service Provider (INSP)

How To Use Hadoop

Monitoring Load-Balancing Services

OpenFlow and Onix. OpenFlow: Enabling Innovation in Campus Networks. The Problem. We also want. How to run experiments in campus networks?


Managing large clusters resources

Data Intensive Computing Handout 6 Hadoop

Chapter 2 Addendum (More on Virtualization)

Facilitating Consistency Check between Specification and Implementation with MapReduce Framework

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Expedia

Massive Cloud Auditing using Data Mining on Hadoop

Introduction to MapReduce and Hadoop

Data Center Networking with Multipath TCP

Stream Processing on GPUs Using Distributed Multimedia Middleware

Complete Java Classes Hadoop Syllabus Contact No:

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Processing Large Amounts of Images on Hadoop with OpenCV

A Hybrid Electrical and Optical Networking Topology of Data Center for Big Data Network

Internals of Hadoop Application Framework and Distributed File System

Load Balancing in Data Center Networks

Internetworking and Internet-1. Global Addresses

Management & Analysis of Big Data in Zenith Team

A Tutorial Introduc/on to Big Data. Hands On Data Analy/cs over EMR. Robert Grossman University of Chicago Open Data Group

Lambda Architecture. CSCI 5828: Foundations of Software Engineering Lecture 29 12/09/2014

Big Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage

Big Data Too Big To Ignore

MapReduce. Tushar B. Kute,

Transcription:

Coflow A Networking Abstraction For Cluster Applications Mosharaf Chowdhury Ion Stoica UC#Berkeley#

Cluster Applications Multi-Stage Data Flows» Computation interleaved with communication Computation» Distributed» Runs on many machines Driver Communication» Structured» Between machine groups 2

Communication Abstraction A Flow» Sequence of packets» Independent» Often the unit for network scheduling, traffic engineering, load balancing etc. Multiple Parallel Flows» Independent» Yet, semantically bound» Shared objective Driver Minimize Completion Time 3

Coflow A collection of flows between two groups of machines that are bound together by applicationspecific semantics Captures 1. Structure 2. Shared Objective 3. Semantics 4

We Want To Better schedule the network» Intra-coflow» Inter-coflow Write the communication layer of a new application» Without reinventing the wheel Add unsupported coflows to an application, or Replace an existing coflow implementation» Independent of applications 5

Cluster Applications Coflow API The Network (Physically or Logically Centralized Controller) 6

Coflow API Goals 1. Separate intent from mechanisms 2. Convey application-specific semantics to the network 7

Coflow API terminate(handle) get(handle, id) content put(handle, id, content) Job finishes Shuffle finishes Driver create(shuffle) handle MapReduce 8

Coflow Flexibility shuffle reducers mappers Choice of algorithms» Default» WSS 1 Choice of mechanism» App vs. Network layer» Pull vs. Push 1. Orchestra, SIGCOMM 2011 9

Coflow Flexibility shuffle reducers @driver b " create(bcast) put(b, id, content) terminate(b) broadcast mappers driver (JobTracker) @mapper get(b, id) 10

Coflow Flexibility shuffle reducers @driver b " create(bcast) s " create(shuffle, ord=[b ~> s]) put(b, id, content) terminate(b) terminate(s) broadcast mappers driver (JobTracker) @mapper get(b, id) put(s, id s1 ) 11

Throughput-Sensitive Applications After 2 seconds Minimize Completion Time 12

Throughput-Sensitive Applications After 4 seconds After 7 seconds After 2 seconds Minimize Completion Time 13

Throughput-Sensitive Applications Free up resources without hurting application-perceived communication time After 7 seconds After 2 seconds Minimize Completion Time 14

Latency-Sensitive Applications HotNets 2012 Top-level Aggregator Mid-level Aggregators Workers 15

Latency-Sensitive Applications HotNets 2012 Top-level Aggregator Meet Deadline 1,2 Mid-level Aggregators Meet Deadline 1,2 Workers 1. D3, SIGCOMM 2011 2. PDQ, SIGCOMM 2012 16

Latency-Sensitive Applications HotNets 2012 HotNets-XI: Home Page conferences.sigcomm.org/hotnets/2012/ The Eleventh ACM Workshop on Hot Topics in Networks (HotNets-XI) will bring together people with interest in computer networks to engage in a lively debate... Top-level Aggregator HotNets Workshop acm sigcomm www.sigcomm.org/events/hotnets-workshop The Workshop on Hot Topics in Networks (HotNets) was created in 2002 to discuss early-stage, creative... HotNets-XI, Seattle, WA area, October 29-30, 2012. Mid-level Aggregators HotNets-XI: Call for Papers conferences.sigcomm.org/hotnets/2012/cfp.shtml The Eleventh ACM Workshop on Hot Topics in Networks (HotNets-XI) will bring together researchers in computer networks and systems to engage in a lively... Coflow accepted at HotNets'2012 www.mosharaf.com/blog/2012/09/.../coflow-accepted-at-hotnets201... Sep Workers 13, 2012 Update: Coflow camera-ready is available online Tell us what you think Our position paper to address the lack of a networking abstraction for... Meet Deadline 1,2 Meet Deadline 1,2 1. D3, SIGCOMM 2011 2. PDQ, SIGCOMM 2012 Limit impact to as few requests as possible 17

One More Thing 1. Critical Path Scheduling 2. OpenTCP 3. Structured Streams 4. 18

Coflow A semantically-bound collection of flows Conveys application intent to the network» Allows better management of network resources» Provides greater flexibility in designing applications Mosharaf Chowdhury http://www.mosharaf.com/ UC#Berkeley#