Towards Rack-scale Computing Challenges and Opportunities



Similar documents
Paolo Costa

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

SMB Direct for SQL Server and Private Cloud

Deploying Ceph with High Performance Networks, Architectures and benchmarks for Block Storage Solutions

Network Virtualization and Data Center Networks Data Center Virtualization - Basics. Qin Yin Fall Semester 2013

Hadoop: Embracing future hardware

How To Build A Cisco Ukcsob420 M3 Blade Server

AMD SEAMICRO OPENSTACK BLUEPRINTS CLOUD- IN- A- BOX OCTOBER 2013

UCS M-Series Modular Servers

Mellanox Academy Online Training (E-learning)

SeaMicro SM Server

Enabling the Use of Data

Achieving a High-Performance Virtual Network Infrastructure with PLUMgrid IO Visor & Mellanox ConnectX -3 Pro

Optimizing Data Center Networks for Cloud Computing

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Mellanox Accelerated Storage Solutions

Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions

Hyperscale Use Cases for Scaling Out with Flash. David Olszewski

Mit Soft- & Hardware zum Erfolg. Giuseppe Paletta

Panel: Cloud/SDN/NFV 黃 仁 竑 教 授 國 立 中 正 大 學 資 工 系 2015/12/26

High Performance MySQL Cluster Cloud Reference Architecture using 16 Gbps Fibre Channel and Solid State Storage Technology

Platfora Big Data Analytics

Walmart s Data Center. Amadeus Data Center. Google s Data Center. Data Center Evolution 1.0. Data Center Evolution 2.0

Scalable. Reliable. Flexible. High Performance Architecture. Fault Tolerant System Design. Expansion Options for Unique Business Needs

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

Router Architectures

Scalable. Reliable. Flexible. High Performance Architecture. Fault Tolerant System Design. Expansion Options for Unique Business Needs

VxRACK : L HYPER-CONVERGENCE AVEC L EXPERIENCE VCE JEUDI 19 NOVEMBRE Jean-Baptiste ROBERJOT - VCE - Software Defined Specialist

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

State of the Art Cloud Infrastructure

The Future of Computing Cisco Unified Computing System. Markus Kunstmann Channels Systems Engineer

The Software Defined Hybrid Packet Optical Datacenter Network SDN AT LIGHT SPEED TM CALIENT Technologies

Getting More Performance and Efficiency in the Application Delivery Network

All-Flash Arrays Weren t Built for Dynamic Environments. Here s Why... This whitepaper is based on content originally posted at

Unified Computing Systems

Boost Database Performance with the Cisco UCS Storage Accelerator

Towards Predictable Datacenter Networks

Fabrics that Fit Matching the Network to Today s Data Center Traffic Conditions

The Future of Cloud Networking. Idris T. Vasi

Michael Kagan.

How To Build A Cloud Server For A Large Company

Introduction to Cloud Design Four Design Principals For IaaS

Scaling from Datacenter to Client

Hyper-V over SMB Remote File Storage support in Windows Server 8 Hyper-V. Jose Barreto Principal Program Manager Microsoft Corporation

Optimizing Web Infrastructure on Intel Architecture

Implementing a unified Datacenter architecture for service providers

Switching Solution Creating the foundation for the next-generation data center

Doubling the I/O Performance of VMware vsphere 4.1

Evaluation Report: HP Blade Server and HP MSA 16GFC Storage Evaluation

VMware Virtual SAN 6.2 Network Design Guide

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Data Center and Cloud Computing Market Landscape and Challenges

Improving IT Operational Efficiency with a VMware vsphere Private Cloud on Lenovo Servers and Lenovo Storage SAN S3200

PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation

Cisco s Massively Scalable Data Center. Network Fabric for Warehouse Scale Computer

Diablo and VMware TM powering SQL Server TM in Virtual SAN TM. A Diablo Technologies Whitepaper. May 2015

FlashSoft Software from SanDisk : Accelerating Virtual Infrastructures

Business Case for BTI Intelligent Cloud Connect for Content, Co-lo and Network Providers

Converged storage architecture for Oracle RAC based on NVMe SSDs and standard x86 servers

Broadcom Ethernet Network Controller Enhanced Virtualization Functionality

A1 and FARM scalable graph database on top of a transactional memory layer

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Block based, file-based, combination. Component based, solution based

Juniper Networks QFabric: Scaling for the Modern Data Center

Storing Big Data The Rise of the Storage Cloud

Netflix Open Connect. 2 Terabits from 6 racks

Pelican: A Building Block for Exascale Cold Data Storage

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

Cisco Nexus 5000 Series Switches: Decrease Data Center Costs with Consolidated I/O

Accelerating Real Time Big Data Applications. PRESENTATION TITLE GOES HERE Bob Hansen

Vblock Infrastructure Platforms 2010 Vblock Platforms Architecture Overview

White Paper Solarflare High-Performance Computing (HPC) Applications

Brocade and EMC Solution for Microsoft Hyper-V and SharePoint Clusters

Next Gen Data Center. KwaiSeng Consulting Systems Engineer

Zadara Storage Cloud A

A Close Look at PCI Express SSDs. Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011

Radware ADC-VX Solution. The Agility of Virtual; The Predictability of Physical

Traditional v/s CONVRGD

Cisco UCS B440 M2 High-Performance Blade Server

Advanced Computer Networks. Datacenter Network Fabric

Blade Switches Don t Cut It in a 10 Gig Data Center

EMC XTREMIO EXECUTIVE OVERVIEW

SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation

Introduction to VMware EVO: RAIL. White Paper

The HP IT Transformation Story

Designing and Experimenting with Data Center Architectures. Aditya Akella UW-Madison

Software-defined Storage Architecture for Analytics Computing

Cloud Data Center Acceleration 2015

Running Oracle s PeopleSoft Human Capital Management on Oracle SuperCluster T5-8 O R A C L E W H I T E P A P E R L A S T U P D A T E D J U N E

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

I3: Maximizing Packet Capture Performance. Andrew Brown

From Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller

Transcription:

Towards Rack-scale Computing Challenges and Opportunities Paolo Costa paolo.costa@microsoft.com joint work with Raja Appuswamy, Hitesh Ballani, Sergey Legtchenko, Dushyanth Narayanan, Ant Rowstron

Hardware Evolution in Data Centers 2004 2008 2010 2000 2012 2014 Goal Increase work done per dollar (CapEx + OpEx) Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 2

Hardware Evolution in Data Centers 2004 2008 2010 2000 2012 2014 Scale out vs. scale up Many commodity servers rather than few expensive servers Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 3

Hardware Evolution in Data Centers 2004 2008 2010 2000 2012 2014 Custom layout Remove unnecessary components (e.g., GPGPUs, USB ports) Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 4

Hardware Evolution in Data Centers 2004 2008 2010 2000 2012 2014 Integrated fabrics Higher density and lower power consumption Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 5

Hardware Evolution in Data Centers 2004 2008 2010 2000 2012 2014 System-on-Chip (SoC) CPU, IO controllers, NIC/fabric switch on the same die Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 6

Hardware Evolution in Data Centers 2004 2008 2010 2000 2012 2014 Silicon Photonics High-bandwidth / low-latency interconnect (resource disaggregation) Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 7

Hardware Evolution in Data Centers 2004 2008 2010 2000 2012 2014 1 rack unit (RU) 2 Ru 4-10 Ru Rack-scale Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 8

Hardware Evolution in Data Centers Rack-scale Computers The rack is the new unit of deployment in data centers Sweet spot between single-server and cluster deployments 2000 2004 2008 2010 2012 2014 1 rack unit (RU) 2 Ru 4-10 Ru Rack-scale Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 9

Rack-scale Computer in 2020? Today s traditional rack 2020 Rack-scale Computer #Cores (#servers) ~100s (20-40) ~100,000s (1,000s) Memory ~1 TB ~100s TB Storage ~100 TB (flash + spinning disk) ~100s PB (NVM) Bandwidth / server 10 Gbps 1 Tbps Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 10

How far are we from rack-scale computing? Today s rack 2020 Rack-scale Computer 2014 Rack-scale Computer #Cores (#servers) ~100s (20-40) ~100,000s (1,000s) ~1,000s (100s-1000s) Memory ~1 TB ~100s TB ~10 TBs Storage ~100 TB (SSD/HDD) ~100s PB (NVM) ~10s PBs (SSD/HDD) Network 10 Gbps / server 1 Tbps / server ~10s -100s Gbps / server Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 11

How far are we from rack-scale computing? #Cores (#servers) Today s rack ~100s (20-40) 2020 Rack-scale Computer ~100,000s (1,000s) 2014 Rack-scale Computer ~1,000s (100s-1000s) Memory ~1 TB ~100s TB ~10 TBs Storage ~100 TB (SSD/HDD) Network 10 Gbps / server ~100s PB (NVM) 1 Tbps / server ~10s PBs (SSD/HDD) ~10s -100s Gbps / server Core count AMD SeaMicro SM15000-64 512 cores (64 servers) in 10 RU 2,048 cores (256 servers) at rack scale HP Moonshot Redstone 1,152 cores (288 servers) in 4U 11,520 cores (3,200 servers) at rack scale Boston Viridis 192 cores (48 servers) in 2 RU 7,680 cores (1,920 servers) at rack scale Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 12

How far are we from rack-scale computing? #Cores (#servers) Today s rack ~100s (20-40) 2020 Rack-scale Computer ~100,000s (1,000s) 2014 Rack-scale Computer ~1,000s (100s-1000s) Memory ~1 TB ~100s TB ~10 TBs Storage ~100 TB (SSD/HDD) Network 10 Gbps / server ~100s PB (NVM) 1 Tbps / server ~1 PB (SSD/HDD) ~10s -100s Gbps / server Memory AMD SeaMicro SM15000-XE 2 TB RAM (32 GB/server) in 10 RU 8TB RAM at rack scale HP Moonshot Redstone 1.12 TB (4 GB/server) in 4U 11.25 TB at rack scale Storage AMD SeaMicro FS-5084-L 336 TB storage in 5 RU 2.5 PB at rack scale Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 13

How far are we from rack-scale computing? #Cores (#servers) Today s rack ~100s (20-40) 2020 Rack-scale Computer ~100,000s (1,000s) 2014 Rack-scale Computer ~1,000s (100s-1000s) Memory ~1 TB ~100s TB ~10 TBs Storage ~100 TB (SSD/HDD) ~100s PB (NVM) ~1 PB (SSD/HDD) Network AMD SeaMicro SM15000-XE 1.28 Tbps fabric (20 Gbps / server) Mellanox ConnectX-3 Pro 2x 40-Gbps NICs Intel MXC Connector (expected Q3 14) Up to 32 fibers (25 Gbps / fiber) Up to 800 Gbps / server Network 10 Gbps / server 1 Tbps / server ~10s -100s Gbps / server Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 14

How far are we from rack-scale computing? #Cores (#servers) Today s rack ~100s (20-40) 2020 Rack-scale Computer ~100,000s (1,000s) 2014 Rack-scale Computer ~1,000s (100s-1000s) Memory ~1 TB ~100s TB ~10 TBs Storage ~100 TB (SSD/HDD) ~100s PB (NVM) ~1 PB (SSD/HDD) Not just quantity 3D stacking Cache-like performance for RAM? NVRAM Fast byte-addressable storage Silicon photonics Low latency (10s-100s ns at rack-scale) Network 10 Gbps / server 1 Tbps / server ~10s -100s Gbps / server Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 15

New Hardware, Old Software 2004 2008 2010 2000 2012 2014 Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 16

New Hardware, Old Software 2004 2008 2010 2000 2012 2014 MapReduce Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 17

New Hardware, Old Software 2004 2008 2010 2000 2012 2014 MapReduce Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 18

New Hardware, Old Software 2004 2008 2010 2000 2012 2014 MapReduce Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 19

New Hardware, Old Software Achieving many of the benefits of Rack-scale Computers requires adequate software support Great opportunity for system researchers to rethink the software stack and hw/sw co-design 2000 2004 2008 2010 2012 2014 MapReduce Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 20

Research Questions: Architecture What s the best usage of the silicon area? Homogenous vs. heterogeneous cores General-purpose cores vs. accelerators e.g., FPGAs, neural accelerators (NPUs) On-chip vs. off-chip functionality Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 21

Research Questions: Networking What is the correct topology? Centralized vs. distributed switch Application-agnostic vs. application specific Where to put memory/storage servers? Converged fabric How to handle memory, storage, IP traffic? Inter-rack connectivity How to extend beyond rack-scale? o over-subscription and protocol bridging Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 22

Research Questions: Networking What is the correct topology? Centralized vs. distributed switch Application-agnostic vs. application specific Where to put memory/storage servers? Converged fabric How to handle memory, storage, IP traffic? Inter-rack connectivity How to extend beyond rack-scale? o over-subscription and protocol bridging Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 23

Research Questions: Networking What is the correct topology? Centralized vs. distributed switch Application-agnostic vs. application specific Where to put memory/storage servers? Converged fabric How to handle memory, storage, IP traffic? Inter-rack connectivity How to extend beyond rack-scale? o over-subscription and protocol bridging Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 24

Research Questions: Networking What is the correct topology? Centralized vs. distributed switch Application-agnostic vs. application specific Where to put memory/storage servers? Converged fabric How to handle memory, storage, IP traffic? Inter-rack connectivity How to extend beyond rack-scale? o over-subscription and protocol bridging Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 25

Research Questions: Networking What is the correct topology? Centralized vs. distributed switch Application-agnostic vs. application specific Where to put memory/storage servers? Converged fabric How to handle memory, storage, IP traffic? Inter-rack connectivity How to extend beyond rack-scale? o over-subscription and protocol bridging Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 26

Research Questions: Networking What is the correct topology? Centralized vs. distributed switch Application-agnostic vs. application specific Where to put memory/storage servers? Converged fabric How to handle memory, storage, IP traffic? Inter-rack connectivity How to extend beyond rack-scale? o over-subscription and protocol bridging Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 27

Research Questions: OS / Storage Rethink the cache-hierarchy High-performance (3D stacking) vs. high-capacity tier (NVRAM) What s the correct ratio? Are SSDs / HDDs to be used only for cold data? Impact on existing (and new!) applications? Cache-like RAM and byte-addressable fast storage How to schedule application tasks? Joint scheduling (CPU, memory, network, storage) Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 28

Research Questions: Distributed Systems RaSCs are different from many-core setups Separate failure domains, no cache coherency Rack-scale computers are distributed systems (albeit not traditional) How to handle remote resources? Consistency and fault-tolerance What are the right programming abstractions? Shared memory, message passing, MapReduce, Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 29

Rack-scale Computing @ MSR Cambridge CamCube Vision [WREN 09] Custom Routing [SIGCOMM 10] In-network Aggregation [NSDI 12] Network Abstractions [HPDC 13] Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 30

Rack-scale Computing @ MSR Cambridge Programming abstractions Storage Rack-scale design Network http://research.microsoft.com/rackscale/ Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 31

Rack-scale Computing @ MSR Cambridge Programming abstractions Storage Rack-scale design Network FaRM [NSDI 14] RDMA-based distributed platform Transaction support Lock-free reads Support for object colocation Hardware alone is not enough Software stack customization is needed High performance 167 M key lookups (31 us latency) on a 20-server testbed http://research.microsoft.com/rackscale/ Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, Miguel Castro Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 32

Rack-scale Computing @ MSR Cambridge Programming abstractions Storage Rack-scale design Network Pelican Rack-scale storage appliance for cold data Hardware and software co-design High storage density Low cost Low power consumption Fault tolerant http://research.microsoft.com/rackscale/ Austin Donnelly, Richard Black, Sergey Legtchenko, Ant Rowstron, Dave Harper, Shobana Balakrishnan, Eric Peterson, Adam Glass Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 33

Rack-scale Computing @ MSR Cambridge Programming abstractions Storage Network Rack-scale design RaSC-Net [HotCloud 14] How to design a network stack for Rack-scale computers? Routing and congestion control Support for: Multiple paths Low latency Consolidated workloads http://research.microsoft.com/rackscale/ Paolo Costa, Hitesh Ballani, Dushyanth Narayan Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 34

Rack-scale Computing @ MSR Cambridge Programming abstractions Storage Network Rack-scale design DRackAr How to master the design space? Topology, resources provisioning, Input: Hardware components Constraints (e.g., max power budget) Target workload Utility function Output: Rack configuration http://research.microsoft.com/rackscale/ Sergey Legtchenko, Ant Rowstron Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 35

Summary Rack-scale computing: 1,000s of cores TBs of RAM and PBs of storage Intra-rack high bandwidth / low latency connectivity This can improve the performance of existing apps graph processing, machine learning jobs, in-memory DBs, but also enable new ones! Call to action Hardware has been changing a lot now it s up to us to change the software too! Paolo Costa Towards Rack-scale Computing: Challenges and Opportunities 36