Resource Efficient Computing for Warehouse-scale Datacenters

Similar documents
Performance Management for Cloudbased STC 2012

Datacenters and Cloud Computing. Jia Rao Assistant Professor in CS

ICRI-CI Retreat Architecture track

Last time. Data Center as a Computer. Today. Data Center Construction (and management)

Building a More Efficient Data Center from Servers to Software. Aman Kansal

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

QPS,KW- hr,mtbf, T,PUE,IOPS,DB/RH: A Day in a Life of a Datacenter Architect

Datacenter Operating Systems

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

Data Center and Cloud Computing Market Landscape and Challenges

ioscale: The Holy Grail for Hyperscale

Energy Efficient MapReduce

Next Generation Operating Systems

Reduce Latency and Increase Application Performance Up to 44x with Adaptec maxcache 3.0 SSD Read and Write Caching Solutions

Above the clouds: A Berkeley View of Cloud Computing

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

The Flash Transformed Data Center & the Unlimited Future of Flash John Scaramuzzo Sr. Vice President & General Manager, Enterprise Storage Solutions

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

Getting the Most Out of Flash Storage

Next-generation data center infrastructure

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Virtualization of the MS Exchange Server Environment

All-Flash Arrays Weren t Built for Dynamic Environments. Here s Why... This whitepaper is based on content originally posted at

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts

Performance Management for Cloud-based Applications STC 2012

The Flash-Transformed Financial Data Center. Jean S. Bozman Enterprise Solutions Manager, Enterprise Storage Solutions Corporation August 6, 2014

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Best Practices for Monitoring Databases on VMware. Dean Richards Senior DBA, Confio Software

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

A Close Look at PCI Express SSDs. Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011

Microsoft Private Cloud

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality

Concepts Introduced in Chapter 6. Warehouse-Scale Computers. Important Design Factors for WSCs. Programming Models for WSCs

Express5800 Scalable Enterprise Server Reference Architecture. For NEC PCIe SSD Appliance for Microsoft SQL Server

SQL Server Consolidation Using Cisco Unified Computing System and Microsoft Hyper-V

View Point. Performance Monitoring in Cloud. Abstract. - Vineetha V

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers

Private cloud computing advances

Cloud Servers in the Datacenter: The Evolution of Density-Optimized

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS

MySQL performance in a cloud. Mark Callaghan

Delivering Quality in Software Performance and Scalability Testing

I D C T E C H N O L O G Y S P O T L I G H T. I m p r o ve I T E f ficiency, S t o p S e r ve r S p r aw l

Beyond Voice. Data Only Cloud & Data Centers Mostly

Scaling from Datacenter to Client

SCALABILITY IN THE CLOUD

Lecture 24: WSC, Datacenters. Topics: network-on-chip wrap-up, warehouse-scale computing and datacenters (Sections )

Informix on IBM PowerLinux Comparing Power7 and Power8

Flash Research Assignment: Virtualization and Cloud Computing

Switching Architectures for Cloud Network Designs

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010

Web Technologies: RAMCloud and Fiz. John Ousterhout Stanford University

MS Exchange Server Acceleration

FUSION iocontrol HYBRID STORAGE ARCHITECTURE 1

Energy Constrained Resource Scheduling for Cloud Environment

Improve Power saving and efficiency in virtualized environment of datacenter by right choice of memory. Whitepaper

Virtual SAN Design and Deployment Guide

The best platform for building cloud infrastructures. Ralf von Gunten Sr. Systems Engineer VMware

Software-defined Storage Architecture for Analytics Computing

COMLINK Cloud Technical Specification Guide DEDICATED SERVER

Microsoft Windows Server Hyper-V in a Flash

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

You re not alone if you re feeling pressure

Comparison of Windows IaaS Environments

Accelerating Server Storage Performance on Lenovo ThinkServer

White Paper. Innovate Telecom Services with NFV and SDN

Preparing Your IT for the Holidays. A quick start guide to take your e-commerce to the Cloud

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Introduction to AWS Economics

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

25.2. Cloud computing, Sakari Luukkainen

What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

IMPROVING RESOURCE EFFICIENCY IN CLOUD COMPUTING

How To Connect Virtual Fibre Channel To A Virtual Box On A Hyperv Virtual Machine

EMC VFCACHE ACCELERATES ORACLE

OPTIMIZING SERVER VIRTUALIZATION

Cloud Data Center Acceleration 2015

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Transcription:

Resource Efficient Computing for Warehouse-scale Datacenters Christos Kozyrakis Stanford University http://csl.stanford.edu/~christos DATE Conference March 21 st 2013

Computing is the Innovation Catalyst Science Government Commerce Healthcare Education Entertainment Faster, cheaper, greener 2

The Datacenter as a Computer Microsoft s Chicago Data Center Oct$3,$2010$ Kushagra$Vaid,$HotPower'10$ 10$ [K. Vaid, Microsoft Global Foundation Services, 2010] 3

Advantages of Large-scale Datacenters Scalable capabilities for demanding services Websearch, social nets, machine translation, cloud computing Compute, storage, networking Cost effective Low capital & operational expenses Low total cost of ownership (TCO) 4

Datacenter Scaling Cost reduction Switch to commodity servers Improved power delivery & cooling one time trick PUE < 1.15 Capability scaling More datacenters More servers per datacenter Multicore servers Scalable network fabrics >$300M per DC @60MW per DC End of voltage scaling 5

Datacenter Scaling through Resource Efficiency Are we using our current resources efficiently? Are we building the right systems to begin with? 6

Our Focus: Server Utilization Total Cost of Ownership Server utilization 14%$ 6%$ 3%$ Servers& Energy& 16%$ 61%$ Cooling& Networking& Other& [J. Hamilton, http://mvdirona.com] [U. Hoelzle and L. Barosso, 2009] Servers dominate datacenter cost CapEx and OpEx Server resources are poorly utilized CPUs cores, memory, storage 7

Low Utilization Primary reasons Diurnal user traffic & unexpected spikes Planning for future traffic growth Difficulty of designing balanced servers Higher utilization through workload co-scheduling Analytics run on front-end servers when traffic is low Spiking services overflow on servers for other services Servers with unused resources export them to other servers E.g., storage, Flash, memory So, why hasn t co-scheduling solved the problem yet? 8

Interference Poor Performance & QoS Interference on shared resources Cores, caches, memory, storage, network Large performance losses E.g. 40% for Google apps [Tang 11] QoS issue for latency-critical applications Optimized for for low 99 th percentile latency in addition to throughput Assume 1% chance of >1sec server latency, 100 servers used per request Then 63% chance of user request latency >1sec Common cures lead to poor utilization Limited resource sharing Exaggerated reservations 9

Higher Resource Efficiency wo/ QoS Loss Research agenda Workload analysis Understand resource needs, impact of interference Mechanisms for interference reduction HW & SW isolation mechanisms (e.g., cache partitioning) Interference-aware datacenter management Scheduling for min interference and max resource use Resource efficient hardware design Energy efficient, optimized for sharing Potential for >5x improvement in TCO 10

Datacenter Scheduling Apps Scheduler System State Metrics Loss Two obstacles to good performance Interference: sharing resources with other apps Heterogeneity: running on suboptimal server configuration 11

Paragon: interference-aware Scheduling [ASPLOS 13] Apps Quickly classify incoming apps For heterogeneity and interference caused/tolerated Heterogeneity & interference aware scheduling Send apps to best possible server configuration Co-schedule apps that don t interfere much Monitor & adapt App Classification Heterogeneity Interference Scheduler System State Metrics Learning Deviation from expected behavior signals error or phase change 12

Fast & Accurate Classification resources applications SVD PQ SGD SVD Interference scores Initial decomposition Reconstructed utility matrix Final decomposition Cannot afford to exhaustively analyze workloads High churn rates of evolving and/or unknown apps Classification using collaborative filtering Similar to recommendations for movies and other products Leverage knowledge from previously scheduled apps Within 1min of sparse profiling we can estimate How much interference an app causes/tolerates on each resource How well it will perform on each server type 13

Paragon Evaluation 5K apps on 1K EC2 instances (14 server types) 14

Paragon Evaluation Better performance with same resources Most workloads within 10% of ideal performance 15

Paragon Evaluation Gain Better performance with same resources Most workloads within 10% of ideal performance Can serve additional apps without the need for more HW 16

High Utilization & Latency-critical Apps Memcached latency (us) 1000 900 800 700 600 500 400 300 200 100 0 95th-% Latency % of base IPC % server util. 25% QPS 50% QPS 75% QPS 100% QPS 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 0% Total number of background processes Example: scheduling work on underutilized memcached servers Reporting QPS at cutoff of 500usec for 95 th % latency High potential for utilization improvement All the way to 100% CPU utilization impact QoS impact Several open issues System configuration, OS scheduling, management of hardware resources 17

Datacenter Scaling through Resource Efficiency Are we using our current resources efficiently? Are we building the right systems to begin with? 18

Main Memory in Datacenters Server power main energy bottleneck in datacenters PUE of ~1.1 the rest of the system is energy efficient Significant main memory (DRAM) power 25-40% of server power across all utilization points [U. Hoelzle and L. Barosso, 2009] Low dynamic range no energy proportionality 19

DDR3 Energy Characteristics DDR3 optimized for high bandwidth (1.5V, 800MHz) On chip DLLs & on-die-termination lead to high static power 70pJ/bit @ 100% utilization, 260pJ/bit at low data rates LVDDR3 alternative (1.35V, 400MHz) Lower Vdd higher on-die-termination Still disproportional at 190pJ/bit Need memory systems that consume lower energy and are proportional What metric can we trade for efficiency? 20

Memory Use in Datacenters Resource Utilization for Microsoft Services under Stress Testing [Micro 11] CPU Utilization Memory BW Utilization Disk BW Utilization Large-scale analytics 88% 1.6% 8% Search 97% 5.8% 36% Online apps rely on memory capacity, density, reliability But not on memory bandwidth Web-search and map-reduce CPU or DRAM latency bound, <6% peak DRAM bandwidth used Memory caching, DRAM-based storage, social media Overall bandwidth by network (<10% of DRAM bandwidth) We can trade off bandwidth for energy efficiency 21

Mobile DRAMs for Datacenter Servers [ISCA 12] 5x Same core, capacity, and latency as DDR3 Interface optimized for lower power & lower bandwidth ( 1 / 2 ) No termination, lower frequency, faster powerdown modes Energy proportional & energy efficient 22

Mobile DRAMs for Datacenter Servers [ISCA 12] Memory Power Search Memcached-a, b SPECPower SPECWeb SPECJbb LPDDR2 module: die stacking + buffered module design High capacity + good signal integrity 5x reduction in memory power, no performance loss Save power or increase capability in TCO neutral manner Unintended consequences Energy efficient DRAM L3 cache power now dominates 23

Summary Resource efficiency A promising approach for scalability & cost efficiency Potential for large benefits in TCO Key questions Are we using our current resources efficiently? Research on understanding, reducing, and managing interference Hardware & software Are we building the right systems to begin with? Research on new compute, memory, and storage structures 24