Friends, not Foes: Synthesizing Existing Transport Strategies for Data Center Networks




Friends, not Foes: Synthesizing Existing Transport Strategies for Data Center Networks. Ali Munir (Michigan State University), Ghufran Baig, Syed M. Irteza, Ihsan A. Qazi, Alex X. Liu, Fahad R. Dogar.

Data Center (DC) Applications. Distributed applications (search, HPC, mail, MapReduce, monitoring): components interact via the network; e.g., a Bing search query touches more than 100 machines. The network impacts performance: 10% of search responses observe 1 to 14 ms of network queuing delay [DCTCP, SIGCOMM '10]. Image source: http://cdn.slashgear.com/wp-content/uploads/2012/10/google-datacenter-tech-13.jpg

DC Network Resource Allocation. Fair sharing: equal bandwidth sharing among jobs [TCP, DCTCP]; increases completion time for everyone; traditional fairness metrics are less relevant. QoS-aware: prioritize some jobs over others (priority scheduling); minimize flow completion times [pFabric, L2DCT]; meet flow deadlines [D3, D2TCP].

DC Transports: DCTCP (SIGCOMM '10), D2TCP (SIGCOMM '12), L2DCT (INFOCOM '13), D3 (SIGCOMM '11), pFabric (SIGCOMM '13), PDQ (SIGCOMM '12).

DC Transports: DCTCP, D2TCP, and L2DCT are deployment friendly but suboptimal; D3, pFabric, and PDQ are near-optimal but not deployment friendly (they require changes in the data plane).

DC Transports: step back and ask how we can design a deployment-friendly and near-optimal data center transport while leveraging the insights offered by existing proposals. Our answer: PASE.

Rest of the Talk: DC Transport Strategies, PASE Design, Evaluation.

DC Transport Strategies. Self-adjusting endpoints (e.g., TCP, DCTCP, L2DCT): senders make independent decisions and adjust their rates by themselves. Arbitration (e.g., D3, PDQ): a common network entity (e.g., a switch) allocates a rate to each flow. In-network prioritization (e.g., pFabric): switches schedule and drop packets based on packet priority. Existing DC transport proposals use only one of these strategies.

Transport Strategies in Isolation (strategy, examples, pros, cons):
Self-Adjusting Endpoints (DCTCP, D2TCP, L2DCT): pro: ease of deployment; con: no strict priority scheduling.
Arbitration (PDQ, D3): pro: strict priority scheduling; cons: high flow switching overhead, hard to compute precise rates.
In-network Prioritization (pFabric): pro: low flow switching overhead; cons: switch-local decisions, limited number of priority queues.

Transport Strategies in Unison. In-network prioritization alone: a limited number of queues with a larger number of flows (priorities) forces flow multiplexing and yields limited performance gains; any static mapping mechanism degrades performance. [Figure: flows 1-4 sharing high- and low-priority queues.]

Transport Strategies in Unison. In-network prioritization + arbitration: the arbitrator dynamically maps flows to queues. Idea: as a flow's turn comes, map it to the highest-priority queue. [Figure: at time t1, flow 1 occupies the high-priority queue; at time t2, flow 2 is promoted to the high-priority queue while the remaining flows wait in the low-priority queue.] Similarly for arbitration + self-adjusting endpoints and arbitration + in-network prioritization; PASE leverages these insights in its design.
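To make the dynamic-mapping idea concrete, the short Python sketch below shows one way an arbitrator could reassign flows to a small set of priority queues as their turns come; the queue count, the shortest-remaining-first ranking, and all names are illustrative assumptions rather than the talk's exact mechanism.

```python
# Minimal sketch (not PASE's exact algorithm): an arbitrator re-maps the
# currently active flows onto a small set of switch priority queues each
# round, so whichever flow's turn has come lands in the top queue.

from dataclasses import dataclass

NUM_QUEUES = 8  # commodity switches typically expose only a handful of queues

@dataclass
class Flow:
    flow_id: int
    remaining_bytes: int  # ranking criterion assumed here: shortest remaining

def assign_queues(flows):
    """Return {flow_id: queue_index}; queue 0 is the highest priority."""
    ranked = sorted(flows, key=lambda f: f.remaining_bytes)
    mapping = {}
    for rank, flow in enumerate(ranked):
        # The head-of-line flow gets the highest-priority queue; later flows
        # share lower queues until their turn comes and the mapping is redone.
        mapping[flow.flow_id] = min(rank, NUM_QUEUES - 1)
    return mapping

# Example: at time t1 flow 1 may occupy queue 0; once it finishes, rerunning
# assign_queues() at time t2 promotes flow 2 into queue 0.
flows = [Flow(1, 10_000), Flow(2, 50_000), Flow(3, 400_000)]
print(assign_queues(flows))  # {1: 0, 2: 1, 3: 2}
```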

Rest of the Talk: DC Transport Strategies, PASE Design, Evaluation.

PASE Design Principle: each transport strategy should focus on what it is best at doing. Arbitrators: perform inter-flow prioritization at coarse time-scales. Endpoints: probe for any spare link capacity. In-network prioritization: perform per-packet prioritization at sub-RTT timescales.

PASE Overview (sender, arbitrator, receiver). Key components: Arbitration (control plane): calculate a reference rate and a priority queue for each flow. Self-adjusting endpoints (guided rate control): use the arbitrator's feedback as a pivot. In-network prioritization: use the existing priority queues in switches.
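For concreteness, here is a minimal sketch of the per-flow feedback the control plane might return, assuming it carries only a reference rate and a priority-queue index; the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ArbitrationFeedback:
    """Per-flow feedback from the arbitration control plane (illustrative)."""
    flow_id: int
    reference_rate_gbps: float  # used by the endpoint as a rate-control pivot
    priority_queue: int         # index into the switch's existing queues

# The endpoint paces around reference_rate_gbps and tags its packets with
# priority_queue, so the data plane needs nothing beyond its existing queues.
```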

PASE Arbitration. Distributed arbitration: per-link arbitration is done in the control plane (existing protocols implement it in the data plane). Arbitrator location: at the end hosts (e.g., for their own links to the switch) or on dedicated hosts inside the DC. The sender sends data with the minimum priority while receiving feedback from the per-link arbitrators.

PASE Arbitration Challenges: arbitration latency, processing overhead, and network overhead. Solution: leverage the tree-like structure of typical DC topologies.

Bottom-Up Arbitration. Leverage the tree structure: arbitrate from the leaves up to the root. [Figure: for an inter-rack flow, the arbitration message travels from the sender's ToR arbitrator through the aggregation and core arbitrators, and the response returns to the sender; for an intra-rack flow, no external arbitrators are required.] This facilitates inter-rack optimizations (early pruning and delegation) to reduce arbitration overhead.
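The sketch below illustrates, under simplifying assumptions, how an arbitration message might climb the tree from the ToR toward the core and return the bottleneck allocation; the arbitrate() interface, the min-of-allocations rule, and the max-of-queues rule are assumptions, and for an intra-rack flow the path would contain only the end hosts' own link arbitrators.

```python
# Sketch (assumed interface): the arbitration request visits each arbitrator
# on the sender-to-receiver path from the leaves up; every hop reports what it
# can allocate, and the response carries the bottleneck rate plus the most
# conservative priority queue back to the sender.

def bottom_up_arbitration(path_arbitrators, flow_id, demand_gbps):
    """path_arbitrators: arbitrators ordered ToR -> Agg -> Core, each exposing
    arbitrate(flow_id, demand_gbps) -> (rate_gbps, priority_queue)."""
    granted_rate = demand_gbps
    priority_queue = 0
    for arbitrator in path_arbitrators:              # leaves up to the root
        rate, queue = arbitrator.arbitrate(flow_id, granted_rate)
        granted_rate = min(granted_rate, rate)       # bottleneck allocation
        priority_queue = max(priority_queue, queue)  # never over-prioritize
    return granted_rate, priority_queue              # response to the sender
```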

Early Pruning. Arbitration involves sorting flows and picking the top k for immediate scheduling; flows that won't make it into the top k queues should be pruned at lower levels. This reduces network and processing overhead: fewer flows contact the higher-level arbitrators.
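A minimal sketch of the pruning step follows, assuming each lower-level arbitrator forwards only its local top k flows to its parent; k and the shortest-remaining-size ranking are illustrative choices.

```python
import heapq

def prune_top_k(local_flows, k):
    """local_flows: {flow_id: remaining_bytes}. Keep the k flows most likely
    to be scheduled soon (smallest remaining size here) and forward only those
    to the parent arbitrator; the rest are pruned at this level."""
    top = heapq.nsmallest(k, local_flows.items(), key=lambda item: item[1])
    return dict(top)

# With k on the order of the number of priority queues, only a handful of
# flows per child ever reach the aggregation and core arbitrators, which is
# what cuts the network and processing overhead.
```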

Delegation. Key idea: divide a link into virtual links and delegate responsibility to child arbitrators. Algorithm: the link capacity C is split into N virtual links (a1, a2, ..., aN); the parent arbitrator delegates a virtual link to each child arbitrator; each child arbitrator performs arbitration for its virtual link; virtual link capacities are periodically updated based on the top k flows of all child arbitrators. This reduces arbitration latency: arbitration decisions are made close to the sources.
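Here is a rough sketch of the periodic virtual-link update, under the assumption that the parent re-divides its capacity C in proportion to the demand of each child's current top k flows; the proportional rule and all names are assumptions for illustration.

```python
# Sketch (assumed policy): the parent arbitrator splits its link capacity C
# into one virtual link per child arbitrator and periodically re-sizes the
# virtual links based on the demand of each child's reported top-k flows.

def update_virtual_links(capacity_gbps, topk_demand_per_child):
    """topk_demand_per_child: {child_id: total demand (Gbps) of its top k
    flows}. Returns {child_id: virtual-link capacity in Gbps}."""
    if not topk_demand_per_child:
        return {}
    total = sum(topk_demand_per_child.values())
    if total == 0:
        # No reported demand: split the link evenly among the children.
        share = capacity_gbps / len(topk_demand_per_child)
        return {child: share for child in topk_demand_per_child}
    return {child: capacity_gbps * demand / total
            for child, demand in topk_demand_per_child.items()}

# Each child then arbitrates its own virtual link locally, which keeps the
# decision close to the sources and reduces arbitration latency.
```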

PASE Overview (recap): arbitration in the control plane, guided rate control at the endpoints, and existing priority queues in switches.

PASE Endhost Transport. Rate control: use the reference rate and priority feedback from the arbitrators; treat the reference rate as a pivot and follow DCTCP control laws around it. Loss recovery: packets in lower-priority queues can be delayed for several RTTs, so use a large RTO or small probe packets to avoid spurious retransmissions.
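As a very rough sketch of the guided rate control, the snippet below pivots the congestion window around the arbitrator's reference rate and applies a DCTCP-style, ECN-driven decrease above it; the constants and the exact pivoting rule are assumptions, not the published control law.

```python
# Sketch (assumed constants and pivot rule): convert the reference rate into a
# target window, additively probe above it for spare capacity, and back off
# DCTCP-style when ECN marks arrive, but never drop below the arbitrated rate.

def next_cwnd(cwnd_pkts, ref_rate_gbps, rtt_s, marked_fraction,
              alpha, mss_bytes=1500, g=1.0 / 16):
    """Return (new_cwnd_pkts, new_alpha); alpha is the DCTCP-style moving
    average of the ECN-marked fraction, marked_fraction is this window's."""
    # Window (in packets) that corresponds to the reference rate: the pivot.
    target = max(1.0, ref_rate_gbps * 1e9 / 8 * rtt_s / mss_bytes)
    alpha = (1 - g) * alpha + g * marked_fraction
    if marked_fraction > 0:
        cwnd_pkts = max(target, cwnd_pkts * (1 - alpha / 2))  # DCTCP decrease
    else:
        cwnd_pkts = max(target, cwnd_pkts + 1)                # probe upward
    return cwnd_pkts, alpha

# Loss recovery note from the talk: packets queued at low priority can sit for
# several RTTs, so the endpoint pairs this with a large RTO or small probes.
```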

PASE, Putting It Together: an efficient arbitration control plane, a simple TCP-like transport, and existing priority queues inside switches.

Rest of the Talk: DC Transport Strategies, PASE Design, Evaluation.

Evaluation. Platforms: a small-scale testbed and ns-2. Workloads: web search (DCTCP) and data mining (VL2). Comparison with deployment-friendly protocols: DCTCP, D2TCP, L2DCT. Comparison with the state of the art: pFabric.

Simulation Setup: queue size 250KB (per queue), RTT 300 usec, RTO 1 msec.

Comparison with Deployment-Friendly Protocols. Settings similar to D2TCP: flow sizes 100-500KB, deadlines 5-25 msec. PASE is deployment friendly yet performs better than existing protocols.

Comparison with the State of the Art. Settings: flow sizes 2-98KB, left-to-right traffic, 99th-percentile results. PASE performs comparably to pFabric and does not require changes to the data plane.

Summary. Key strategies in existing DC transports: arbitration, in-network prioritization, and self-adjusting endpoints; they are complementary rather than substitutes. PASE combines the three strategies: an efficient arbitration control plane, a simple TCP-like transport, and existing priority queues inside switches. Performance: comparable to or better than earlier proposals, even those that require changes to the network fabric.

Thank you!