Delivering Scale Out Data Center Networking with Optics: Why and How. Amin Vahdat, Google / UC San Diego, vahdat@google.com




Vignette 1: Explosion in Data Center Buildout

Motivation: blueprints for a 200k sq. ft. data center in Oregon

San Antonio Data Center

Chicago Data Center

Dublin Data Center

All Filled with Commodity Computation and Storage

Why the Cloud? Compute/storage is ~100x cheaper to operate at scale, equipment costs are 2-20x cheaper at scale, and energy is 10-20x cheaper at scale. Unless ms-level latency is critical, it is likely more efficient to perform computation and store data in the cloud. Inter-cluster bandwidth requirements are >>100x wide-area Internet bandwidth.

Network Design Goals. Scalable interconnection bandwidth: full bisection bandwidth between all pairs of hosts, i.e. aggregate bandwidth = # hosts x host NIC capacity. Economies of scale: price/port constant with the number of hosts; must leverage commodity merchant silicon. Anything anywhere: don't let the network limit the benefits of virtualization. Management: modular design; avoid actively managing 100s-1000s of network elements.

Scale Out Networking. We have seen advances toward scale-out computing and storage: aggregate computing and storage grows linearly with the number of commodity processors and disks, with a small matter of software to enable functionality. The alternative is scale-up, where weaker processors and smaller disks are replaced with more powerful parts. Today there is no technology for scale-out networking: modules to expand the number of ports or aggregate bandwidth, with no management of individual switches, VLANs, or subnets.

The Future Internet. Applications and data will be partitioned and replicated across multiple data centers; 99% of compute, storage, and communication will be inside the data center, and data center bandwidth will exceed that of the access network. Data sizes will continue to explode, from click streams to scientific data to user audio, photo, and video collections. Individual user requests and queries will run in parallel on thousands of machines, and back-end analytics and data processing will dominate.

Implications of Cloud Computing on Internet Architecture. How do we move from delivering switches to delivering fabrics? (Cisco, Juniper, HP, Brocade, ...) Should storage consist of SANs and centralized servers? (EMC, Network Appliance, ...) What role will optics play in future data centers? (your company name here)

Emerging Rack Architecture. (Diagram: cores with caches and a shared L3 cache; DDR3-1600 DRAM at 100 Gb/s, 5 ns, 10s of GB; PCIe 3.0 x16 at 128 Gb/s, ~250 ns; 2x 10 GigE NIC; TBs of storage; 24-port 40 GigE switch with 150 ns latency.) Can we leverage emerging merchant switch silicon and newly proposed optical transceivers and switches to treat the entire data center as a single logical computer?

Amdahl's (Lesser Known) Law: balanced systems for parallel computing. For every 1 MHz of processing power you must have 1 MB of memory and 1 Mbit/s of I/O (stated in the late 1960s). Fast forward to 2012: 4x 2.5 GHz processors, 8 cores, i.e. 30-60 GHz of processing power (not that simple!), 24-64 GB of memory, but 1 Gb/s of network bandwidth?? Deliver 40 Gb/s of bandwidth to 100k servers? That is 4 Pb/s of bandwidth required today.
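To make the balance gap concrete, here is a minimal Python sketch of the 1 Mbit/s-per-MHz rule applied to the 2012-era server on the slide; the helper function is an illustrative assumption, not something from the talk.

```python
# Rough check of Amdahl's balanced-system rule of thumb from the slide
# (1 Mbit/s of I/O per 1 MHz of processing), applied to a 2012-era
# server with 30-60 GHz of aggregate processing power but a 1 Gb/s NIC.

def balanced_io_gbps(aggregate_ghz: float) -> float:
    """I/O implied by the 1 Mbit/s-per-MHz rule, i.e. 1 Gb/s per GHz."""
    return aggregate_ghz

if __name__ == "__main__":
    nic_gbps = 1.0                       # the slide's "but 1 Gb/s?"
    for ghz in (30.0, 60.0):             # processing-power range quoted on the slide
        needed = balanced_io_gbps(ghz)
        print(f"{ghz:.0f} GHz server: balanced I/O ~{needed:.0f} Gb/s, "
              f"actual {nic_gbps:.0f} Gb/s -> {needed / nic_gbps:.0f}x short")
```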

Targets: a 100k-server data center with ~1M cores, 1 PB of DRAM, and 1 EB of storage. Required: a 4 Pb/s network switch to make the data center appear as a uniform computer, with <500 ns latency to any host in the data center. Can we effect a 40x bandwidth increase per host? From each rack: 1.6 Tb/s of traffic; from each row: 20 Tb/s of traffic. At $100 per SFP+ transceiver plus fiber, that is >$3000/server just for interconnect in a 7-hop network. Building-scale 40 Gb/s means $2000 transceivers?
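The targets above follow from simple arithmetic; here is a hedged back-of-envelope sketch. The rack and row sizes, and the transceiver accounting at the end, are assumptions chosen to reproduce the slide's figures, not numbers from the talk.

```python
# Back-of-envelope arithmetic behind the slide's targets.

SERVERS = 100_000
GBPS_PER_SERVER = 40

total_pbps = SERVERS * GBPS_PER_SERVER / 1e6            # Gb/s -> Pb/s
rack_tbps = 40 * GBPS_PER_SERVER / 1e3                   # assumed 40 servers/rack
row_tbps = 500 * GBPS_PER_SERVER / 1e3                   # assumed 500 servers/row

print(f"Fabric bisection: {total_pbps:.0f} Pb/s")        # 4 Pb/s
print(f"Per-rack traffic: {rack_tbps:.1f} Tb/s")         # 1.6 Tb/s
print(f"Per-row traffic:  {row_tbps:.0f} Tb/s")          # 20 Tb/s

# Optics cost: the slide quotes >$3000/server at $100 per 10G SFP+ plus
# fiber in a 7-hop network. One way to reproduce that number (an
# assumption, not the talk's exact accounting): 40 Gb/s per host needs
# 4x 10G lanes, each link needs a transceiver at both ends, and each
# host's traffic effectively owns about 4 fabric links.
print(f"Illustrative optics cost/server: ${4 * 2 * 4 * 100:,}")
```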

Balanced Systems Really Do Matter. Balancing network and I/O results in huge efficiency improvements. How much is a factor of 100 improvement worth in terms of cost? (TritonSort: A Balanced Large-scale Sorting System, Rasmussen et al., NSDI 2011.)

System                Duration   Aggr. Rate   Servers   Rate/server
Yahoo (100 TB)        173 min    9.6 GB/s     3,452     2.8 MB/s
TritonSort (100 TB)   107 min    15.6 GB/s    52        300 MB/s

TritonSort Results. Hardware: HP DL-380 2U servers with 8x 2.5 GHz cores, 24 GB RAM, 16x 500 GB disks, and 2x 10 Gb/s Myricom NICs, connected by a 52-port Cisco Nexus 5020 switch. Results 2010: GraySort, 100 TB in 123 min on 48 nodes (2.3 Gb/s/server); MinuteSort, 1014 GB in 59 s on 52 nodes (2.6 Gb/s/server). Results 2011: GraySort, 100 TB in 107 min on 52 nodes (2.4 Gb/s/server); MinuteSort, 1353 GB in 1 min on 52 nodes (3.5 Gb/s/server); JouleSort, 9700 records/joule.

Driver: Nonblocking Multistage Datacenter Topologies. M. Al-Fares, A. Loukissas, A. Vahdat. A Scalable, Commodity Data Center Network Architecture. In SIGCOMM '08. (Figure: fat-tree topology with k=4, n=3.)
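For reference, a small sketch of the standard k-ary fat-tree sizing from the cited paper. The k=48 case reproduces the cable count on the next slide, under my assumption that the 55,296 figure counts inter-switch links.

```python
# Sizes of a 3-level k-ary fat tree built from k-port switches,
# as in the SIGCOMM '08 paper cited above.

def fat_tree(k: int) -> dict:
    hosts = k ** 3 // 4
    edge = agg = k * (k // 2)           # k pods, k/2 switches per layer per pod
    core = (k // 2) ** 2
    return {
        "hosts": hosts,
        "switches": edge + agg + core,
        "host_links": hosts,            # one host-to-edge link per host
        "switch_links": k ** 3 // 2,    # edge<->agg plus agg<->core
    }

if __name__ == "__main__":
    for k in (4, 48):
        print(k, fat_tree(k))
    # k=4  -> 16 hosts, 20 switches (the figure on this slide)
    # k=48 -> 27,648 hosts and 55,296 inter-switch links
    #         (the "10 tons of cabling" on the next slide)
```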

Problem: 10 Tons of Cabling. 55,296 Cat-6 cables in 1,128 separate cable bundles ("the yellow wall"). If optics are used for transport, transceivers are ~80% of the cost of the interconnect.

Data Center Networking Research: switch architecture [SIGCOMM '08]; cabling, merchant silicon [Hot Interconnects '09]; virtualization, Layer 2, management [SIGCOMM '09, SOCC '11a]; routing/forwarding [NSDI '10]; hybrid optical/electrical switch [SIGCOMM '10, SOCC '11b]; energy, applications, transport [ongoing]. Take advantage of the regularity of the fat-tree structure to simplify protocol design and improve performance, then generalize to arbitrary topologies.

Vignette 2: Optics in the Data Center

Fiber Optic Communication Technologies: What's Needed for Datacenter Network Operations. Cedric F. Lam et al., Google Inc., IEEE Communications Magazine, July 2010. Ideal: the network connects every server to every other server, so application software is not concerned with locality. Problem: prohibitively expensive. Integration is needed to improve port density and system throughput, and the power of the modules should also scale so that the total power dissipation per unit space does not change. Datacenter operators' request to optics: scale speed up by a factor of 4 with the same power and space. Oh, and reduce cost by a factor of 10. All that can be offered is a factor of 100x in volume?

A Modular Data Center. Question: how do we use optics to switch between the boxes? (Illustration: Bryan Christie Design. From: R. Katz, "Tech Titans Building Boom," IEEE Spectrum, Feb. 2009.)

Hybrid Optical/Electrical Switching?

               Electrical Packet Switch   Optical Circuit Switch
Cost           $500/port                  $500/port
Power          12 W/port                  240 mW/port
Rate           10 Gb/s fixed              Rate agnostic
Switching      Per-packet                 12 ms switching time

Mixing both types of switches in the same network allows one type of switch to compensate for the weaknesses of the other; the optimal ratio of packet switches to circuit switches depends on the data center workload. Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers, Farrington, Porter, Radhakrishnan, Bazzaz, Subramanya, Fainman, Papen, Vahdat, Proceedings of ACM SIGCOMM 2010.

Comparing the Alternatives (a three-slide build). Topology: k traditional N-port electrical core packet switches over N electrical k-port pod packet switches; in the Helios design, optical core circuit switches replace most of the electrical core packet switches, leaving fewer core switches. Example: N=64 pods, k=1024 hosts/pod = 64K hosts total, 8 wavelengths.

          10% Static (10:1 oversubscribed)   100% Static   Helios (10% Static + 90% Dynamic)
Cost      $6.3 M                             $62.2 M       $22.1 M   (2.8x less)
Power     96.5 kW                            950.3 kW      157.2 kW  (6.0x less)
Cables    6,656                              65,536        14,016    (4.7x less)
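The "x less" factors in the table are just ratios of the fully provisioned electrical design to the Helios design; a quick check:

```python
# Verify the slide's reduction factors from the table's own values.
full_electrical = {"cost_musd": 62.2, "power_kw": 950.3, "cables": 65_536}
helios          = {"cost_musd": 22.1, "power_kw": 157.2, "cables": 14_016}

for metric in full_electrical:
    ratio = full_electrical[metric] / helios[metric]
    print(f"{metric}: {ratio:.1f}x less with Helios")
# cost_musd: 2.8x, power_kw: 6.0x, cables: 4.7x
```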

Superlinks. Each 10 Gb/s transceiver in a LAG* uses a non-overlapping wavelength. This LAG, called a superlink, can fit onto a single fiber pair. Superlinks are transparent to the pod switches, which deal only with LAGs. (Diagram: pod switch with transceivers 1-8, WDM MUX/DEMUX, optical circuit switch.) * IEEE 802.1AX-2008 Link Aggregation Group
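A minimal sketch of the superlink idea, assuming eight 10 Gb/s LAG members and illustrative CWDM-style wavelengths; the data structure and values are my assumptions, not Helios internals.

```python
from dataclasses import dataclass

# Eight illustrative CWDM-grid wavelengths (nm), one per LAG member.
WAVELENGTHS_NM = [1271, 1291, 1311, 1331, 1351, 1371, 1391, 1411]

@dataclass
class Superlink:
    pod: str
    lane_gbps: int = 10

    def channels(self, num_lanes: int) -> dict:
        """Map each LAG member to a non-overlapping wavelength."""
        if num_lanes > len(WAVELENGTHS_NM):
            raise ValueError("not enough wavelengths for this LAG")
        return {f"{self.pod}-xcvr{i}": WAVELENGTHS_NM[i] for i in range(num_lanes)}

    def capacity_gbps(self, num_lanes: int) -> int:
        return num_lanes * self.lane_gbps

if __name__ == "__main__":
    sl = Superlink(pod="pod1")
    print(sl.channels(8))                          # 8 lanes, 8 distinct wavelengths
    print(sl.capacity_gbps(8), "Gb/s on one fiber pair")
```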

Simulation Study @ 10 Gb/s. Conclusion: most of the benefit (cost, power, # ports) comes from aggregating 4-8 wavelengths, so less expensive coarse WDM can be used.

Helios Example: Aggregation in Action. (Diagram: Pods 1-4, WDM DEMUX, core circuit switch.)

Helios Example: Small HTTP Request. (Diagram: Pod 1, Pod 2, core packet switch and core circuit switch; the small HTTP request is carried over the core packet switch.)

Helios Example: Large VM Migration. (Diagram: a large VM migration between Pod 1 and Pod 2 is identified as an elephant flow, and a circuit is established between Pod 1 and Pod 2 on the core circuit switch.)
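Taken together, these examples suggest a simple control decision: mice stay on the packet switch, elephants get circuits. Below is a highly simplified sketch of that decision; the threshold and function names are assumptions for illustration, not the actual Helios implementation.

```python
# Simplified mouse/elephant routing decision for a hybrid core.
ELEPHANT_BYTES = 100 * 1024 * 1024   # assumed threshold: 100 MB

def route_flow(src_pod: str, dst_pod: str, bytes_observed: int,
               circuits: set) -> str:
    """Return which core fabric a flow should use, setting up circuits as needed."""
    pair = tuple(sorted((src_pod, dst_pod)))
    if bytes_observed >= ELEPHANT_BYTES:
        if pair not in circuits:
            circuits.add(pair)        # stands in for reconfiguring the OCS
        return "optical circuit switch"
    return "electrical packet switch"

if __name__ == "__main__":
    circuits = set()
    print(route_flow("pod1", "pod2", 4 * 1024, circuits))        # small HTTP request
    print(route_flow("pod1", "pod2", 8 * 1024**3, circuits))     # 8 GB VM migration
    print(circuits)                                              # {('pod1', 'pod2')}
```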

Optical Requirements for the Data Center

Optical Circuit Switches. Lower cost: there are lots of opportunities to insert an OCS into the data center, but $500/port is simply not compelling. Larger scale: 100-300 ports is typically inadequate for data center applications. Lower insertion loss: 3-5 dB precludes many deployment scenarios. Faster switching times: across hardware and software, 20 ms is a starting point, but targeting 10-100 us could be transformative.

WDM Transceivers. VCSEL+MMF is a nonstarter for Pb→Xb networks. Power consumption: electrical switch density is limited by transceiver power. Optical link budget: light will pass through multiple (passive) devices at distances approaching 1-2 km. Bandwidth: LR-4 is just now becoming available, at too high a cost; we need 10x10, 12x10, 16x10 for effective muxing of 40GE. Spectral efficiency: within a building, trade spectral efficiency for cost/power.

Conclusions. The vast majority of computation, storage, and communication will take place in data centers in the years ahead. Building balanced systems today requires Pb/s networks, and exabit/s networks are not far behind. There is a tremendous opportunity for optics to define what it means to perform communication in the data center.