Decentralized Task-Aware Scheduling for Data Center Networks



Decentralized Task-Aware Scheduling for Data Center Networks Fahad R. Dogar, Thomas Karagiannis, Hitesh Ballani, Ant Rowstron Presented by Eric Dong (yd2dong) October 30, 2015

Tasks in data centers Applications execute rich and complicated tasks: replying to search queries, gathering information for a news feed, etc. Each task can involve dozens of flows, all of which have to complete for the task to finish.

Tasks in data centers Two important metrics: Task size: the sum of the sizes of the network flows involved. Task sizes follow all sorts of statistical distributions (e.g., search vs. data analytics): uniform, heavy-tailed, etc. Flows per task: varies wildly, from dozens to thousands. A scheduling algorithm must work across this whole range.

Traditional metrics Per-flow fair sharing (TCP, DCTCP): poor average performance when multiple tasks occur at the same time. Flow-level scheduling policies (shortest flow first, etc.) consider flows in isolation. Example: SFF schedules the shorter flows of different tasks first, leaving the longer flows of all the tasks to the end, thus delaying the completion of every task. We need something better. Unfortunately, optimal task-aware scheduling is NP-hard :(. But we can use some heuristics!
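The SFF pathology above is easy to reproduce with a toy model: one bottleneck link serving one flow at a time at unit rate (the task and flow sizes here are made up for illustration).

```python
# Toy comparison of SFF vs FIFO task serialization on a single link
# that serves one flow at a time at unit rate. Sizes are illustrative.

def task_completions(flow_order):
    """flow_order: list of (size, task). Returns {task: completion_time}."""
    t, done = 0, {}
    for size, task in flow_order:
        t += size
        done[task] = t  # a task finishes only when its last flow finishes
    return done

tasks = {"A": [2, 5], "B": [3, 4]}  # two tasks, two flows each
flows = [(s, t) for t, sizes in tasks.items() for s in sizes]

# SFF: sort flows by size across tasks -> every task's long flow goes last.
sff = task_completions(sorted(flows))
# FIFO task serialization: all of A's flows, then all of B's.
fifo = task_completions([(s, t) for t in ("A", "B") for s in tasks[t]])

avg = lambda d: sum(d.values()) / len(d)
print("SFF :", sff, "avg =", avg(sff))    # A: 14, B: 9 -> avg 11.5
print("FIFO:", fifo, "avg =", avg(fifo))  # A: 7, B: 14 -> avg 10.5
```

Even in this tiny example, serializing tasks lowers the average task completion time without hurting the worst case.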

Task serialization The set of policies where an entire task is scheduled before the next one. This improves upon fair sharing because it eliminates contention. One good task serialization algorithm is actually simple: first-in-first-out (FIFO). Another example is shortest-task-first (STF), which improves average completion time but leads to high tail latency, or even starvation if short tasks keep arriving and preempting long tasks.

Task serialization FIFO is great for light-tailed distributions; in fact, it's provably optimal for minimizing the tail completion time. But it isn't that great for heavy-tailed distributions: elephant tasks that happen to arrive first end up blocking small ones, increasing latency.

FIFO-LM The paper proposes FIFO-LM: first-in-first-out with limited multiplexing. Just like FIFO, but it serves a limited number of tasks (the degree of multiplexing) at once. A hybrid between FIFO (degree = 1) and fair sharing (degree = ∞).
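To make the degree knob concrete, here is a minimal fluid-model simulation of FIFO-LM on a unit-capacity link (my own toy sketch, not the paper's implementation; the task sizes are made up):

```python
# Fluid-model sketch of FIFO-LM: up to `degree` of the earliest-arrived
# unfinished tasks share a unit-capacity link equally. degree=1 is pure
# FIFO; a very large degree is per-task fair sharing.

def fifo_lm(sizes, degree, dt=0.001):
    remaining = list(sizes)          # tasks indexed by arrival order
    done = [None] * len(sizes)
    t = 0.0
    while any(r > 1e-9 for r in remaining):
        # serve the first `degree` unfinished tasks, in arrival order
        active = [i for i, r in enumerate(remaining) if r > 1e-9][:degree]
        rate = 1.0 / len(active)     # equal shares of the link
        for i in active:
            remaining[i] -= rate * dt
            if remaining[i] <= 1e-9 and done[i] is None:
                done[i] = t + dt
        t += dt
    return done

sizes = [4.0, 1.0, 1.0]              # one elephant task ahead of two mice
print(fifo_lm(sizes, degree=1))      # ~[4, 5, 6]: mice blocked by elephant
print(fifo_lm(sizes, degree=2))      # ~[6, 2, 4]: mice slip past it
```

With degree = 2 the two small tasks finish far earlier while the elephant still finishes by time 6, which is exactly the trade-off FIFO-LM is after.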

Baraat The authors' distributed implementation of FIFO-LM. No explicit coordination; it is based on globally unique task ids. A lower id means higher priority. Flows inherit the id of their task. Ids come from an incrementing counter at every point where tasks arrive.
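One standard way to mint such ids without coordination (the exact encoding here is my assumption, not lifted from the paper) is to interleave each arrival point's local counter with its index:

```python
# Decentralized, globally unique, roughly arrival-ordered task ids:
# interleave each arrival point's local counter with the point's index.
# (Illustrative scheme; the paper's exact encoding may differ.)

NUM_ARRIVAL_POINTS = 4  # e.g., number of front-end servers (assumed)

def make_task_id(local_counter, point_index):
    """Lower id == earlier arrival == higher priority."""
    return local_counter * NUM_ARRIVAL_POINTS + point_index

ids = [make_task_id(c, p) for c in range(2) for p in range(3)]
print(ids)  # [0, 1, 2, 4, 5, 6] -- unique across points, roughly FIFO
```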

Prioritization mechanism We have task priorities now. We still need an algorithm that uses them to efficiently schedule tasks. We could theoretically use one of the zillions of existing flow-prioritization algorithms, but they don't have the properties we need. We need a new algorithm.

Smart Priority Class Similar to traditional priority queues: high-priority flows preempt low-priority flows, and flows with the same priority share bandwidth fairly. Two differences: On-switch classifier: a one-to-one mapping between tasks and priority classes. It detects heavy tasks on the fly and bumps their priority down to that of the next class; this is the "LM" part of FIFO-LM! Explicit rate control: switches tell senders how quickly to send. This moves more work to the end hosts and reduces bookkeeping overhead in the switches.
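The classifier's demotion step can be sketched in a few lines (the byte threshold is a made-up knob, not a value from the paper):

```python
# Sketch of the on-switch classifier's heavy-task demotion. A task starts
# in its own class (one class per task, ordered by task id); once it has
# moved more than THRESHOLD bytes it is demoted to the next class, so it
# shares bandwidth with the task behind it. THRESHOLD is hypothetical.

THRESHOLD = 10 * 1024 * 1024  # assumed 10 MB heavy-task cutoff

def priority_class(task_id, bytes_seen):
    base = task_id                     # FIFO order by task id
    return base + 1 if bytes_seen > THRESHOLD else base

print(priority_class(5, 1_000))        # 5: light task keeps its own class
print(priority_class(5, 20_000_000))   # 6: heavy task shares the next class
```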

Explicit rate protocol Every RTT, the sender transmits a scheduling request (SRQ) message demanding a certain rate. The switch replies with two numbers: Actual rate (AR): how much the sender should transmit in the next RTT. Nominal rate (NR): the maximum possible rate given the flow's priority.
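A simplified switch-side computation of AR and NR might look like this (my own sketch: it serves demands in strict priority order and omits fair sharing among equal-priority flows and any real packet format):

```python
# Simplified switch-side AR/NR computation for one RTT. Lower task id =
# higher priority. This only shows how AR and NR relate: NR is the rate
# available at a flow's priority level; AR additionally caps it by demand.

LINK = 1.0  # normalized link capacity

def rates(demands):
    """demands: list of (task_id, demanded_rate).
    Returns {task_id: (actual_rate, nominal_rate)}."""
    out, leftover = {}, LINK
    for tid, demand in sorted(demands):   # strict priority order
        nominal = leftover                # max rate at this priority
        actual = min(demand, leftover)    # what the sender may use now
        out[tid] = (actual, nominal)
        leftover -= actual
    return out

# Task 7 outranks task 9; task 9 is capped at whatever task 7 leaves.
print(rates([(9, 0.75), (7, 0.5)]))  # {7: (0.5, 1.0), 9: (0.5, 0.5)}
```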

Explicit rate protocol We end up implementing FIFO-LM in a distributed way, with no global communication or central controller needed. But is it actually a lot better than existing schedulers? Experiments!

Evaluation The paper evaluates Baraat on three platforms Small scale testbed Huge datacenter simulation Micro-benchmarks All show significant improvements compared to other techniques

Small-scale tests Storage retrieval scenario: clients read data from storage servers in parallel. One rack of four nodes running Memcached acts as the client; four more racks act as the backend, with one switch connecting everything. Very significant improvements in task completion time. (Nitty-gritty details of the setup are in the paper.)

Large datacenter simulation Three-level tree topology: racks of 40 machines with 1 Gbps links, connected to a top-of-rack switch and then to an aggregation switch. Three different workloads: Search engine (Bing) Data analytics (Facebook) Homogeneous application: uniformly distributed flow sizes from 2 KB to 50 KB

Bing-like workloads Policies are comparable up to the 70th percentile. Beyond that point, size-based policies begin starving heavy tasks. Baraat's limited multiplexing fixes this problem well.

Other two workloads Data-analytics workloads are heavy-tailed, and FIFO suffers from head-of-line blocking. Size-based policies reduce completion time relative to fair sharing here, but they still cause starvation issues at the very end of the tail. Baraat is still much faster: 60% faster than fair sharing and 36% faster than size-based policies. Uniform workloads show benefits too: Baraat is 48% faster than fair sharing, while size-based policies have serious starvation issues and end up 50% slower than fair sharing.

Very small tasks The ns-2 network simulator was used to microbenchmark small tasks with tiny flows. Baraat still provides significant benefits over fair sharing thanks to its minimal setup overhead. We can improve performance even more by relaxing the one-task-per-priority-class invariant and aggregating multiple tiny tasks into a single class. But only up to a point! Otherwise it degenerates into fair sharing.
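The aggregation idea fits in one line (the batch size is an illustrative knob, not a value from the paper):

```python
# Aggregating tiny tasks: map a fixed-size batch of consecutive task ids
# to one priority class, so they fair-share within the class. BATCH is an
# illustrative knob; make it too large and this degenerates into plain
# fair sharing across all tasks.

BATCH = 4

def class_of(task_id):
    return task_id // BATCH  # tasks 0-3 share class 0, 4-7 class 1, ...

print([class_of(t) for t in range(6)])  # [0, 0, 0, 0, 1, 1]
```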

Discussion and further work Multi-pathing: Data centers often have multi-root topologies for path diversity. Existing mechanisms for spreading traffic among paths maintain flow-to-path affinity, so Baraat can be used even in multi-root topologies. Senders can load-balance by spreading SRQ packets among different paths. Non-network resources: Baraat doesn't try to schedule non-network resources like CPU. This is generally not an issue: Baraat will saturate either the CPU or the network link, depending on which is the bottleneck. Future work: improve performance even more by coordinating multiple resources.