Scalable Data Center Networking. Amin Vahdat Computer Science and Engineering UC San Diego vahdat@cs.ucsd.edu





Center for Networked Systems 20 Across CSE, ECE, and SDSC

CNS Project Formation Member Companies Center Faculty Research Interests Project Proposals Diverse Research Projects - Multiple faculty - Multiple students - Multidisciplinary - CNS Research Theme 3

An Extraordinarily Brief History of Communication
- 3500-2900BC: various inventions of alphabet
- 900BC: first postal service in China
- 776BC: first recorded use of homing pigeons to send messages
- 530BC: first library
- ~500BC: papyrus, portable and light writing surface
- 37: first optical network, Romans use mirrors
- 305: first wooden printing press in China
- 1455: first printing press with metal movable type
- 1831: electric telegraph
- 1876: telephone invented

Source: Livinginternet.com Vannevar Bush Summary: Vannevar Bush established the U.S. military / university research partnership that later developed the ARPANET. Quote: Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory. It consists of a desk, and while it can presumably be operated from a distance, it is primarily the piece of furniture at which he works. On the top are slanting translucent screens, on which material can be projected for convenient reading. There is a keyboard, and sets of buttons and levers. Otherwise it looks like an ordinary desk. Vannevar Bush; As We May Think; Atlantic Monthly; July 1945

J. C. R. Licklider Summary: Joseph Carl Robnett "Lick" Licklider developed the idea of a universal network, spread his vision throughout the IPTO, and inspired his successors to realize his dream by creation of the ARPANET. Quote: It seems reasonable to envision, for a time 10 or 15 years hence, a 'thinking center' that will incorporate the functions of present-day libraries together with anticipated advances in information storage and retrieval. The picture readily enlarges itself into a network of such centers, connected to one another by wide-band communication lines and to individual users by leased-wire services. In such a system, the speed of the computers would be balanced, and the cost of the gigantic memories and the sophisticated programs would be divided by the number of users. - J.C.R. Licklider, Man-Computer Symbiosis, 1960. Source: Livinginternet.com

1969 Internet Map

Back to the Future: Cloud Computing
- Personal computing revolution in the 1980s led to a PC on every desktop
- Client/server computing to control distribution of data
- Management, energy, security, consistency quickly overwhelmed the cost of the hardware
- Bursty resource requirements led to 1-10% utilization
- Berkeley NOW: use idle cycles to build a supercomputer
- Trends and enabling technologies
  - Utility computing, Software as a Service (SaaS)
  - Ubiquitous wireless coverage, multi-gigabit optical pipes, virtualization, malware/botnets

Cloud Computing
- Third party companies provide storage and computing on demand
- Statistical multiplexing and virtualization enable efficient utilization of underlying resources
- Companies and individuals pay only for what they consume
- Applications, operating systems centrally managed
- Data/applications available from a variety of devices and in a variety of places
- Automatically backed up, made consistent

Cloud Computing@UCSD
- WebOS: Rent-A-Server [HPDC98]
- Continuous Consistency in support of replication
  - TACT [OSDI00, SOSP01, TOCS02, TOCS04]
- Virtualization
  - Virtual clusters [LISA07]
  - Memory management [OSDI08]
- Large scale testing
  - DieCast [NSDI06, NSDI08]
- PlanetLab/GENI
  - Resource Peering [SOSP03]
  - Workload characterization [USENIX06]
  - Service Discovery [HPDC05]
  - Plush application management [LISA07]

Cloud Computing: Two Questions
- Starting point: computing and storage increasingly delivered by dense data centers
- How to program multi-data center applications?
  - Bottom line: applications built on top of data structures
  - How do you partition and replicate data structures across and within data centers?
  - For target levels of performance, availability, consistency
- How to interconnect individual data centers?
  - 100,000+ ports within single data center, 10 Gb/s per port
  - How to build a petabit/sec non-blocking switch?
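
The petabit figure follows directly from the port count and per-port rate quoted above. A minimal arithmetic sketch in Python; the function name is illustrative and not from the talk:

```python
def aggregate_bandwidth_bps(ports: int, gbps_per_port: float) -> float:
    """Total bandwidth a non-blocking fabric must carry, in bits per second."""
    return ports * gbps_per_port * 1e9


# 100,000 ports at 10 Gb/s each, as on the slide.
total = aggregate_bandwidth_bps(100_000, 10)
print(f"{total / 1e15:.1f} Pb/s")  # 1.0 Pb/s
```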

Life of a Social Networking Request
- 120M+ users organized into a graph
- Incoming request for user Alice
  - Cookie hashes to handle for Alice's profile
- Retrieve information from Alice's profile
  - Picture, status, handles to friends, location, etc.
- Retrieve information from friends' profiles
  - Recent information from queues
- Retrieve recent information from news feeds
  - Linked lists
- Each request maps to ~1,000 machines

Life of a Social Networking Request: Backend
- Petabytes of data generated in form of click-streams
- Significant amount of user data to be indexed
- Advertising placement based on user access patterns and user profiles
- Large-scale data processing effort to appropriately process data
- Emerging data processing model: MapReduce
- All-to-all communication among tens of thousands of machines
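
To see why MapReduce produces all-to-all traffic, consider a toy in-process word count. This is only a sketch under simplifying assumptions (one mapper per input chunk, a CRC-based partitioner, everything in one process); the function names are hypothetical and do not correspond to any particular MapReduce implementation. The point is that every mapper may hand a partial result to every reducer, so the shuffle can involve up to mappers × reducers transfers.

```python
import zlib
from collections import Counter, defaultdict


def partition(word: str, num_reducers: int) -> int:
    """Deterministically pick the reducer responsible for a word."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def map_phase(chunk: str, num_reducers: int) -> dict:
    """Count words in one chunk, bucketed by destination reducer."""
    buckets = defaultdict(Counter)
    for word in chunk.split():
        buckets[partition(word, num_reducers)][word] += 1
    return buckets


def run(chunks: list, num_reducers: int) -> tuple:
    """Return (merged word counts, number of mapper-to-reducer transfers)."""
    reducer_inputs = defaultdict(list)
    transfers = 0
    for chunk in chunks:  # one mapper per chunk
        for reducer, counts in map_phase(chunk, num_reducers).items():
            reducer_inputs[reducer].append(counts)  # the shuffle step
            transfers += 1
    totals = Counter()
    for partials in reducer_inputs.values():  # reduce phase
        for counts in partials:
            totals.update(counts)
    return totals, transfers


totals, transfers = run(["a b a", "b c", "c a"], num_reducers=2)
print(totals["a"], totals["b"], totals["c"])  # 3 2 2
print(transfers <= 3 * 2)  # True: at most mappers x reducers transfers
```

With tens of thousands of machines acting as both mappers and reducers, that upper bound grows quadratically, which is what stresses the network's bisection bandwidth.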

Scalable Data Center Networking

Motivation
- Commoditization in the data center
  - Inexpensive, commodity PCs and storage devices
  - But network still highly specialized
- Data center is not a small Internet
  - One admin domain, not adversarial, limited policy routing, etc.
- Bandwidth is often the bottleneck
  - Cloud Computing
  - Service-oriented Architectures
  - Data Analysis (MapReduce)

Network Design Goals
- Scalable interconnection bandwidth
  - Full bisection bandwidth between all pairs of hosts
  - Aggregate bandwidth = # hosts × host NIC capacity
- Economies of scale
  - Price/port linear with number of hosts
- Single network fabric
  - Support Ethernet and IP without end host modifications
- Management
  - Modular design
  - Avoid actively managing 100s-1000s of network elements

Current Data Center Topologies
- Edge hosts connect to 1G Top of Rack (ToR) switch
- ToR switches connect to 10G End of Row (EoR) switches
- Large clusters: EoR switches to 10G core switches
  - Oversubscription of 2.5:1 to 8:1 typical in guidelines
- No story for what happens as we move to 10G to the edge
- Key challenges: performance, cost, routing, energy, cabling
[Figure: three-tier topology with Core, EoR, and ToR layers]
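
An oversubscription ratio compares the bandwidth offered to hosts with the bandwidth available toward the next tier. A small sketch follows; the port counts in the example are assumed for illustration and are not taken from the slides.

```python
def oversubscription(host_ports: int, host_gbps: float,
                     uplink_ports: int, uplink_gbps: float) -> float:
    """Ratio of host-facing bandwidth to uplink bandwidth on one switch."""
    return (host_ports * host_gbps) / (uplink_ports * uplink_gbps)


# Hypothetical ToR switch: 40 hosts at 1 Gb/s sharing a single 10 Gb/s uplink.
ratio = oversubscription(40, 1, 1, 10)
print(f"{ratio:g}:1")  # 4:1
```

At 4:1, hosts in this hypothetical rack would see as little as a quarter of their NIC rate when they all send outside the rack at once.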

Data Center Network Economics
- 10x commodity edge switches
  - $100/end host
  - Low margins
- 1x commodity core switches
  - $1,000-$4,000/end host
  - High margins

Force 10 Study: Data Center Pricing
- $4,000/port for switches in 1,000 node data center!
- Taken from The FORCE10 Networks TeraScale E-Series brochure

Cost of Data Center Networks
- Factor of 10+ price difference between traditional approach and proposed architecture
[Figure: cost (USD millions, $0 to $30) versus number of hosts (0 to 25,000); series: 100% BW, 33% BW, Fat-Tree (100% BW)]

Scalability Using Identical Network Elements
- Fat tree built from 4-port switches
[Figure: fat tree with a core layer above Pods 0-3]

- Support 16 hosts organized into 4 pods
  - Each pod is a 2-ary 2-tree
  - Full bandwidth among hosts directly connected to pod
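
The 16-host example can be reproduced by wiring up the topology explicitly. The sketch below assumes the usual layering of a fat tree built from k-port switches: k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)² core switches. The slides themselves give only the port count, host count, and pod count, and all node names here are illustrative.

```python
from collections import defaultdict


def build_fat_tree(k: int) -> dict:
    """Return an undirected adjacency map for a fat tree of k-port switches."""
    if k % 2:
        raise ValueError("k must be even")
    half = k // 2
    adj = defaultdict(set)

    def link(a: str, b: str) -> None:
        adj[a].add(b)
        adj[b].add(a)

    for pod in range(k):
        for e in range(half):
            edge = f"edge-{pod}-{e}"
            # Each edge switch uses half its ports for hosts...
            for h in range(half):
                link(edge, f"host-{pod}-{e}-{h}")
            # ...and the other half to reach every aggregation switch in its pod.
            for a in range(half):
                link(edge, f"agg-{pod}-{a}")
        for a in range(half):
            # Aggregation switch a connects to core switches a*half .. a*half+half-1.
            for c in range(half):
                link(f"agg-{pod}-{a}", f"core-{a * half + c}")
    return adj


tree = build_fat_tree(4)
hosts = [n for n in tree if n.startswith("host-")]
switches = [n for n in tree if not n.startswith("host-")]
print(len(hosts), len(switches))  # 16 20
# Every switch uses exactly k = 4 ports.
assert all(len(tree[s]) == 4 for s in switches)
```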

- Full bisection bandwidth at each level of fat tree
- Rearrangeably nonblocking
- Entire fat-tree is a 2-ary 3-tree

- (5k²/4) k-port switches support k³/4 hosts
  - 48-port switches: 27,648 hosts using 2,880 switches
- Critically, approach scales to 10 GigE at the edge
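
The two closed-form counts are easy to check numerically. A short sketch (the function name is illustrative) that evaluates k³/4 hosts and 5k²/4 switches for the 4-port and 48-port cases mentioned in the slides:

```python
def fat_tree_size(k: int) -> tuple:
    """Return (hosts, switches) for a fat tree built from k-port switches."""
    if k % 2:
        raise ValueError("k must be even")
    hosts = k**3 // 4
    switches = 5 * k**2 // 4
    return hosts, switches


for k in (4, 48):
    hosts, switches = fat_tree_size(k)
    print(f"k={k}: {hosts:,} hosts, {switches:,} switches")
# k=4: 16 hosts, 20 switches
# k=48: 27,648 hosts, 2,880 switches
```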

- Regular structure simplifies design of network protocols
- Opportunities: performance, cost, energy, fault tolerance, incremental scalability, etc.

Why Hasn't This Been Done Before?
- Needs to be backward compatible with IP/Ethernet
  - Existing routing protocols do not work for fat tree
- Cabling explosion at each level of the fat tree
  - Tens of thousands of cables running across data center?
- Management
  - Thousands of individual elements that must be programmed individually