Predictable Data Centers

Similar documents
Chatty Tenants and the Cloud Network Sharing Problem

Towards Predictable Datacenter Networks

Decentralized Task-Aware Scheduling for Data Center Networks

Chatty Tenants and the Cloud Network Sharing Problem

Friends, not Foes Synthesizing Existing Transport Strategies for Data Center Networks

Beyond the Stars: Revisiting Virtual Cluster Embeddings

The Price Is Right: Towards Location-Independent Costs in Datacenters

Small is Better: Avoiding Latency Traps in Virtualized DataCenters

CLOUD NETWORKING THE NEXT CHAPTER FLORIN BALUS

Cloud Computing Is In Your Future

Cloud Computing Trends

PACE Your Network: Fair and Controllable Multi- Tenant Data Center Networks

Cloud SLAs: Present and Future. Salman Baset

Towards Predictable Datacenter Networks

Paolo Costa

Cloud Design and Implementation. Cheng Li MPI-SWS Nov 9 th, 2010

Security & Trust in the Cloud

Architectural Implications of Cloud Computing

QoS & Traffic Management

Accelerate the Performance of Virtualized Databases Using PernixData FVP Software

Software-Defined Networks Powered by VellOS

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

MANAGED SERVICE PROVIDERS SOLUTION BRIEF

Azure Media Service Cloud Video Delivery KILROY HUGHES MICROSOFT AZURE MEDIA

Towards an understanding of oversubscription in cloud

MS 20246C Monitoring and Operating a Private Cloud

Cloud Computing with Microsoft Azure

Scheduling and Monitoring of Internally Structured Services in Cloud Federations

OpenStack & AWS: The Fastest Path to Hybrid IT

Mobile and Cloud computing and SE

OpenFlow/SDN for IaaS Providers

Data Centers and Cloud Computing. Data Centers

Testing Network Virtualization For Data Center and Cloud VERYX TECHNOLOGIES

CLOUD COMPUTING. When It's smarter to rent than to buy

Network Infrastructure Services CS848 Project

One click Hadoop clusters - anywhere

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

Cloud Federations in Contrail

CS 695 Topics in Virtualization and Cloud Computing and Storage Systems. Introduction

Today: Data Centers & Cloud Computing" Data Centers"

Cloud Scale Resource Management: Challenges and Techniques

Service Performance Aspects for Cloud Service Level Agreements

Monitoring Cloud Applications. Amit Pathak

Mobile Cloud Networking FP7 European Project: Radio Access Network as a Service

Integrating Network. Management for Cloud

On Tackling Virtual Data Center Embedding Problem

Cloud Computing. Lecture 24 Cloud Platform Comparison

CSE543 Computer and Network Security Module: Cloud Computing

A Cooperative Game Based Allocation for Sharing Data Center Networks

DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING. Carlos de Alfonso Andrés García Vicente Hernández

Data Centers and Cloud Computing

Cloud Platforms, Challenges & Hadoop. Aditee Rele Karpagam Venkataraman Janani Ravi

Better Never than Late: Meeting Deadlines in Datacenter Networks

Decentralized Task-Aware Scheduling for Data Center Networks

A Gentle Introduction to Cloud Computing

Paying to Save: Reducing Cost of Colocation Data Center via Rewards

Performance Management for Cloud-based Applications STC 2012

Dimension Data Enabling the Journey to the Cloud

Security and Billing for Azure Pack. Presented by 5nine Software and Cloud Cruiser

Optimizing Data Center Networks for Cloud Computing

Overview Motivating Examples Interleaving Model Semantics of Correctness Testing, Debugging, and Verification

SLA-based Admission Control for a Software-as-a-Service Provider in Cloud Computing Environments

Cloud Optimize Your IT

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS

Cost-effective Resource Provisioning for MapReduce in a Cloud

Designing a Cloud Storage System

Infrastructure as a Service (IaaS)

Cloud Computing Submitted By : Fahim Ilyas ( ) Submitted To : Martin Johnson Submitted On: 31 st May, 2009


Cloud SLAs: Present and Future

Cloud Computing and Amazon Web Services. CJUG March, 2009 Tom Malaher

PERFORMANCE ANALYSIS OF PaaS CLOUD COMPUTING SYSTEM

Towards Rack-scale Computing Challenges and Opportunities

Cloud Deployment Models

The Cisco Powered Network Cloud: An Exciting Managed Services Opportunity

CS 695 Topics in Virtualization and Cloud Computing. Introduction

NCTA Cloud Architecture

Resource Efficient Computing for Warehouse-scale Datacenters

Falloc: Fair Network Bandwidth Allocation in IaaS Datacenters via a Bargaining Game Approach

Transcription:

Predictable Data Centers Thomas Karagiannis Hitesh Ballani, Paolo Costa, Fahad Dogar, Keon Jang, Greg O Shea, Eno Thereska, and Ant Rowstron Systems & Networking Microsoft Research, Cambridge http://research.microsoft.com/datacenters/

Predictable Data Centers Goal: Enable predictable application performance in multi-tenant data centers Multi-tenant data center is a data center with multiple (possibly competing) tenants Private data centers Run by organizations like Facebook, Microsoft, Google, etc Tenants: Product groups and applications Cloud data centers Unpredictability: a key hindrance to cloud adoption Amazon EC2, Microsoft Azure, Rackspace, etc. Tenants: Users renting virtual machines

Dimensions of unpredictability Performance No throughput or latency guarantees Private data centers: SLA violations, starvation Public data centers: Impossible to provide SLAs Costs Absence of performance guarantees implies unpredictable costs Location-dependent costs Fairness Payment not always translates to performance differentiation

Dimensions of unpredictability Performance No throughput or latency guarantees Private data centers: SLA violations, starvation Public data centers: Impossible to provide SLAs Costs Root-cause: Shared resources Absence of performance guarantees implies unpredictable costs (network & storage) Location-dependent costs Fairness Payment not always translates to performance differentiation

Performance Unpredictability - Public cloud Ken Ken Map Reduce Data analytics on an isolated cluster Job Enterprise Results Data analytics in the public cloud Map Reduce Results Job Azure data center Completion Time 4 hours Completion Time 4-8 hours

Performance Unpredictability - Public cloud Ken Ken Map Reduce Data analytics on an isolated cluster Job Enterprise Results Data analytics in the public cloud Map Reduce Results Job Azure data center Completion Time 4 hours Completion Time 4-8 hours Variable tenant costs Expected cost (based on 4 hour completion time) = $100 Actual cost = $100-200

VM cost / unit time Cost unpredictability: Public cloud Today s Price CPU-bound jobs Job Cost = $ k N T (e.g.., k= $0.085 /hour) k Bandwidth (B) Network-bound jobs Job Cost = $ k N T but T = L, hence.. B Job Cost = $ k N L B Simple and intuitive Cost depends on B (high variability) No incentive for the provider

Towards a predictable cloud Performance Guarantee network throughput Virtual Network Abstractions [Oktopus, SIGCOMM 11] Request <N, B> N VMs. Each VM can send and receive at B Mbps Pricing Dominant resource pricing [HotNets 11] Change the cloud model! Job-based pricing [Bazaar, SOCC 12]

Bazaar Enables predictable performance and cost Tenant Job Request Perf/Cost constraints Bazaar Resources Required VMs and network Provider Job Cost Resource Utilization Today s pricing: Resource-based Bazaar enables job-based pricing!

Dimensions of unpredictability Performance No throughput or latency guarantees Private data centers: SLA violations, starvation Public data centers: Impossible to provide SLAs Costs Absence of performance guarantees implies unpredictable costs Location-dependent costs Fairness Payment not always translates to performance differentiation

SLA violations: User-facing online services Request Response Application SLAs 200 ms 100 ms 45 ms Aggregator 5ms 5ms 5ms 25ms 35ms 25ms Agg. Agg. Agg. 4ms 4ms 30ms 4ms 4ms 22ms 15ms 25ms Worker Worker Worker Worker Cascading SLAs SLAs for components at each level of the hierarchy Network SLAs Deadlines on communications between components Flow Deadlines A flow is useful if and only if it satisfies its deadline Today s transport protocols: Deadline agnostic and strive for fairness

Flows Flows Case for unfair sharing: Flow f1, 20ms f1 Limitations of Fair Sharing Status Quo Deadline aware X f1 Flow f2, 40ms f2 20 40 Time Flows Flows get f1 bandwidth f2 get in a accordance fair share of to bandwidth their deadlines Flow Deadline f1 misses awareness its deadline ensures (incomplete both flows response satisfy deadlines to user) f2 20 40 Time

Flows Flows Limitations of Fair Sharing Case for flow quenching: 6 flows with 30ms deadline 30 Time X X X X X X 30 Time With Insufficient deadline awareness, bandwidth one to satisfy flow can all deadlines be quenched With All fair other share, flows all make flows their miss the deadline deadline (partial (empty response)

Predictability in private data centers Deadline-driven flow scheduling D 3 [SIGCOMM 11] Prioritize flows based on deadlines Expose flow deadlines to the network Explicit rate control Task aware Data Centers From flow level metrics to application level benefits

How to allocate network resources? Fair Sharing Assign equal amount of bandwidth to all flows QoS aware allocation Two Broad Approaches Meet flow deadlines [D3, Sigcomm 11] Minimize flow completion times [PDQ, Sigcomm 12] Common Theme: Focus on flow level metrics Do they translate into app level benefits?

Task Oriented Applications Typical DC apps perform tasks Unit of work that can be linked to a waiting user Examples Answering a user s search query Generating a user s wall From the network s perspective, tasks generate rich workflows

Flow vs. Task aware optimizations Goal: Minimize Task Completion Times Task 1 Task 2 Shortest Flow First (SFF) Link A 3 6 Task Aware Link A 3 6 Link B 3 6 Link B 6 3 time time

Dimensions of unpredictability Performance No throughput or latency guarantees Private data centers: SLA violations, poor performance Public data centers: Impossible to provide SLAs Costs Absence of performance guarantees implies unpredictable costs Location-dependent costs Fairness Payment not always translates to performance differentiation

Two types of cloud traffic Intra-tenant Among a tenant s VMs Inter-tenant Tenant-tenant traffic Tenant provided cloud services storage, DB, identity management, etc Significant fraction: 10-35% of DC traffic Q: How to share DC network?

State of the art P and Q are paying same amount p 1000Mbps r1 r2 Allocation P (Mbps) Q (Mbps) Per flow 250 750 Seawall 250 750 FairCloud 333 666 q r3 r4 Q gets larger network share despite same payment! Q: Whose payment should dictate the flow bandwidth?

Hadrian: Network sharing in the cloud [NSDI 13] What semantics should we provide to tenants? Virtual network abstraction How to allocate bandwidth? Ensure upper-bound proportionality: Network share proportional to payment Based on minimum of payment of communicating tenants How to place virtual machines? Greedy heuristic that guarantees min bandwidth

Summary Unpredictability Key hindrance to cloud adoption Root cause: Shared resources Several challenges: performance, cost, fairness http://research.microsoft.com/datacenters/