Towards Predictable Datacenter Networks

Similar documents
Predictable Data Centers

Towards Predictable Datacenter Networks

Chatty Tenants and the Cloud Network Sharing Problem

The Price Is Right: Towards Location-Independent Costs in Datacenters

Beyond the Stars: Revisiting Virtual Cluster Embeddings

Decentralized Task-Aware Scheduling for Data Center Networks

Chatty Tenants and the Cloud Network Sharing Problem

Gatekeeper: Supporting Bandwidth Guarantees for Multi-tenant Datacenter Networks

Silo: Predictable Message Completion Time in the Cloud

Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load

Towards an understanding of oversubscription in cloud

Cloud Computing Trends

STRATEGIC WHITE PAPER. The next step in server virtualization: How containers are changing the cloud and application landscape

CLOUD NETWORKING THE NEXT CHAPTER FLORIN BALUS

Towards Rack-scale Computing Challenges and Opportunities

Microsoft Private Cloud Fast Track

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

Lecture 7: Data Center Networks"

Core and Pod Data Center Design

Falloc: Fair Network Bandwidth Allocation in IaaS Datacenters via a Bargaining Game Approach

Impact of Virtualization on Cloud Networking Arista Networks Whitepaper

The Software Defined Hybrid Packet Optical Datacenter Network SDN AT LIGHT SPEED TM CALIENT Technologies

MS 20246C Monitoring and Operating a Private Cloud

Managing Network Reservation for Tenants in Oversubscribed Clouds

Data Centers and Cloud Computing

Comparison of Windows IaaS Environments

Data Centers and Cloud Computing. Data Centers

On Tackling Virtual Data Center Embedding Problem

Paolo Costa

Outline. VL2: A Scalable and Flexible Data Center Network. Problem. Introduction 11/26/2012

Data Centers and Cloud Computing. Data Centers

Optimizing Data Center Networks for Cloud Computing

Improving MapReduce Performance in Heterogeneous Environments

Scaling IP Mul-cast on Datacenter Topologies. Xiaozhou Li Mike Freedman

Data Centers and Cloud Computing. Data Centers. MGHPCC Data Center. Inside a Data Center

OpenFlow/SDN for IaaS Providers

Security and Billing for Azure Pack. Presented by 5nine Software and Cloud Cruiser

Emerging Technology for the Next Decade

Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

Zadara Storage Cloud A

Data Center Virtualization and Cloud QA Expertise

Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center Presented by: Payman Khani

Panel : Future Data Center Networks

Cloud Optimize Your IT

Networking Issues For Big Data

Datacenters and Cloud Computing. Jia Rao Assistant Professor in CS

IPOP-TinCan: User-defined IP-over-P2P Virtual Private Networks

20247D: Configuring and Deploying a Private Cloud

Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks

Flexible Building Blocks for Software Defined Network Function Virtualization (Tenant-Programmable Virtual Networks)

Configuring and Deploying a Private Cloud. Day(s): 5. Overview

STRATEGIC WHITE PAPER. Securing cloud environments with Nuage Networks VSP: Policy-based security automation and microsegmentation overview

Software-Defined Networks Powered by VellOS

Network Virtualization for the Enterprise Data Center. Guido Appenzeller Open Networking Summit October 2011

Bringing the Public Cloud to Your Data Center

MS 20247C Configuring and Deploying a Private Cloud

MS-20246: Monitoring and Operating a Private Cloud

Network Virtualization and Application Delivery Using Software Defined Networking

Building Scalable Multi-Tenant Cloud Networks with OpenFlow and OpenStack

Cloud computing. Examples

Webpage: Volume 3, Issue XI, Nov ISSN

Testing Software Defined Network (SDN) For Data Center and Cloud VERYX TECHNOLOGIES

Multi-Datacenter Replication

Enabling Database-as-a-Service (DBaaS) within Enterprises or Cloud Offerings

Cost-effective Resource Provisioning for MapReduce in a Cloud

All-Flash Arrays Weren t Built for Dynamic Environments. Here s Why... This whitepaper is based on content originally posted at

Sistemi Operativi e Reti. Cloud Computing

What is SDN all about?

SDN: A NEW PARADIGM. Kireeti Kompella CTO, JDI

Installing Intercloud Fabric Firewall

Variations in Performance and Scalability when Migrating n-tier Applications to Different Clouds

Data Center Content Delivery Network

2013 ovh.com. All rights reserved

Virtualization of the MS Exchange Server Environment

CS 695 Topics in Virtualization and Cloud Computing. Introduction

Data Center Migration Lift and Shift Use Case Scenario

Virtual Data Center Allocation with Dynamic Clustering in Clouds

Network Virtualization

CS 695 Topics in Virtualization and Cloud Computing and Storage Systems. Introduction

Figure 1. The cloud scales: Amazon EC2 growth [2].

Cloud Computing Architecture: A Survey

Big Data Trends and HDFS Evolution

Cloud Computing using MapReduce, Hadoop, Spark

Transcription:

Towards Predictable Datacenter Networks Hitesh Ballani, Paolo Costa, Thomas Karagiannis and Ant Rowstron Microsoft Research, Cambridge

This talk is about Guaranteeing network performance for tenants in multi-tenant datacenters Multi-tenant datacenters Datacenters with multiple (possibly competing) tenants Private datacenters Run by organizations like Facebook, Intel, etc. Tenants: Product groups and applications Cloud datacenters Amazon EC2, Microsoft Azure, Rackspace, etc. Tenants: Users renting virtual machines

Tenant Cloud datacenters 101 Simple interface: Tenants ask for a set of VMs Amazon EC2 Interface Request VMs Charging is per-vm, per-hour Amazon EC2 small instances: $0.085/hour No (intra-cloud) network cost Network performance is not guaranteed Bandwidth between a tenant s VMs depends on their placement, network load, protocols used, etc.

Performance variability in Up the to 5x wild variability Study Study Provider Duration A [Giurgui 10] Amazon EC2 n/a B [Schad 10] Amazon EC2 31 days C/D/E [Li 10] (Azure, EC2, Rackspace) 1 day F/G [Yu 10] Amazon EC2 1 day H [Mangot 09] Amazon EC2 1 day

Network performance can vary... so what? Tenant Tenant Map Reduce Job Data analytics on an isolated cluster Enterprise Results Data analytics in a multi-tenant datacenter Completion Time 4 hours Unpredictability of application performance and tenant costs is a key hindrance to cloud adoption Key Map Contributor: Reduce Network Results performance variation Job Datacenter Completion Time 10-16 hours Variable tenant costs Expected cost (based on 4 hour completion time) = $100 Actual cost = $250-400

Predictable datacenter networks Extend the tenant-provider interface to account for the network Tenant Request Request Virtual Network # of VMs and network demands # of VMs and network demands Contributions- VM 1 VM 2 VM N Virtual network abstractions To capture tenant network demands Oktopus: Proof of concept system Implements virtual networks in multi-tenant datacenters Can be incrementally deployed today! Key Idea: Tenants are offered a virtual network with bandwidth guarantees This decouples tenant performance from provider infrastructure

Introduction Talk Outline Virtual network abstractions Oktopus Allocating virtual networks Enforcing virtual networks Evaluation

Abstraction 1: Virtual Cluster (VC) Motivation: In enterprises, tenants run applications on dedicated Ethernet clusters Virtual Switch Total bandwidth = N * B VM 1 VM 2 VM N Request <N, B> N VMs. Each VM can send and receive at B Mbps Tenants get a network with no oversubscription Suitable for data-intensive apps. (MapReduce, BLAST) Moderate provider flexibility

Abstraction 2: Virtual Oversubscribed Cluster (VOC) VMs can send traffic to group members at B Mbps Group Virtual Switch Root Virtual Switch Total bandwidth at root = N * B / O Total bandwidth at VMs = N * B VM 1 Group 1 VM N S VM 1 Group 2 VM S. VM 1 VM S Group N/S Motivation: Many Request applications <N, B, moving S, O> to the cloud have N VOC VMs capitalizes in groups of size S. Oversubscription factor O. localized on communication tenant communication patterns patterns Applications No oversubscription Suitable are composed for typical for of intra-group applications groups with (though communication more traffic not all) within (captures Intra-group the Improved sparseness groups communication than of provider across inter-group is the groups flexibility common communication) case! Oversubscription factor O for inter-group communication

Introduction Talk Outline Virtual network abstractions Oktopus Allocating virtual networks Enforcing virtual networks Evaluation

Oktopus Offers virtual networks to tenants in datacenters Two main components Management plane: Allocation of tenant requests Allocates tenant requests to physical infrastructure Accounts for tenant network bandwidth requirements Data plane: Enforcement of virtual networks Enforces tenant bandwidth requirements Achieved through rate limiting at end hosts

Allocating Virtual Clusters Request : <3 VMs, 100 Mbps> Max Sending Rate = 2*100 = 200 VM What for an bandwidth existing Max Receive tenant Rate = needs to be 1*100 = 100 reserved for the B/W tenant needed on this on link? = Min (200, 100) = 100Mbps For a virtual cluster <N,B>, bandwidth needed on a link that How Datacenter to Tenant find Physical a valid Request allocation? Topology connects m VMs to the remaining (N-m) VMs is = Min (m, N-m) * B An allocation Link divides of tenant virtual VMs tree to into physical two parts machines Allocate 4 physical a tenant machines, asking for 2 3 VM VMs slots arranged per machine a virtual cluster Tenant Consider with traffic all traffic 100 Mbps traverses from each, the the i.e. highlighted left to right <3 VMs, 100Mbps> links part Bandwidth needed <= Link s Residual Bandwidth For a valid allocation:

Allocation Algorithm 1000 1000 Request : <3 VMs, 100 Mbps> 1000 1000 200 200 How Solution many VMs can At most be allocated 1 VM for to this tenant machine? can be 2 VMs allocated here 2 VMs 2 VMs 1 VM 1 VM Constraints for Allocation # of VMs (m) is that fast can and be allocated efficient to the machine- 3 VMs Greedy Key allocation intuition Packing algorithm Traverse Validity up the conditions VMs together hierarchy can motivated and be determine used to by determine the fact that the lowest the level at number datacenter of VMs networks which that all can are 3 VMs be can allocated typically oversubscribed be allocated to any level of the Allocation datacenter; can be extended machines, for goals racks like failure and so resiliency, on etc. 1. VMs can only be allocated to empty slots m <= 1 2. 3 VMs are requested m <= 3 3. Enough b/w on outbound link min (m, 3-m)*100 <= 200

Introduction Talk Outline Virtual network abstractions Oktopus Allocating virtual networks Enforcing virtual networks Evaluation

Enforcement in Oktopus: Key highlights Oktopus enforces virtual networks at end hosts Use egress rate limiters at end hosts Implement on hypervisor/vmm Oktopus can be deployed today No changes to tenant applications No network support Tenants without virtual networks can be supported Good for incremental roll out

Introduction Talk Outline Virtual network abstractions Oktopus Allocating virtual networks Enforcing virtual networks Evaluation

Datacenter Simulator Flow-based simulator 16,000 servers and 4 VMs/server 64,000 VMs Three-tier network topology (10:1 oversubscription) Tenants submit requests for VMs and execute jobs Job: VMs process and shuffle data between each other Baseline: representative of today s setup Tenants simply ask for VMs VMs are allocated in a locality-aware fashion Virtual network request Tenants ask for Virtual Cluster (VC) or Virtual Oversubscribed Cluster (VOC)

Worse Private datacenters VC is Virtual Cluster VOC-10 is Virtual Oversubscribed Cluster with oversubscription=10 Execute a batch of 10,000 tenant jobs Jobs vary in network intensiveness (bandwidth at which a job can generate data) Better Virtual networks improve completion time VC: 50% of Baseline VOC-10: 31% of Baseline Jobs become more network intensive

Cloud Datacenters Amazon EC2 s reported target utilization Worse Tenant job requests arrive over time Jobs are rejected if they cannot be accommodated on arrival (representative of cloud datacenters) Better Job requests arrive faster Rejected Requests Baseline: 31% VC: 15% VOC-10: 5%

Tenant Costs What should tenants pay to ensure provider revenue neutrality, i.e. provider revenue remains the same with all approaches Based on today s EC2 prices, i.e. $0.085/hour for each VM Provider revenue increases while tenants pay less At 70% target utilization, provider revenue increases by 20% and median tenant cost reduces by 42%

Oktopus Deployment Implementation scales well and imposes low overhead Allocation of virtual networks is fast In a datacenter with 10 5 machines, median allocation time is 0.35ms Enforcement of virtual networks is cheap Use Traffic Control API to enforce rate limits at end hosts Deployment on testbed with 25 end hosts End hosts arranged in five racks

Oktopus Deployment Cross-validation of simulation results Completion time for jobs in the simulator matches that on the testbed

Summary Proposal: Offer virtual networks to tenants Virtual network abstractions Resemble physical networks in enterprises Make transition easier for tenants Proof of concept: Oktopus Tenants get guaranteed network performance Sufficient multiplexing for providers Win-win: tenants pay less, providers earn more! How to determine tenant network demands? Ongoing work: Map high-level goals (like desired completion time) to Oktopus abstractions

Thank you