Large-Scale Distributed Systems: Datacenter Networks. COMP6511A, Spring 2014, HKUST. Lin Gu, lingu@ieee.org


Datacenter Networking: Major Components of a Datacenter
- Computing hardware (equipment racks)
- Power supply and distribution hardware
- Cooling hardware and cooling-fluid distribution hardware
- Network infrastructure
- IT personnel and office equipment

Datacenter Networking: Growth Trends in Datacenters
- Load on networks and servers continues to grow rapidly; rough estimates of annual growth rate: ~35% for enterprise datacenters, 50%-100% for Internet datacenters
- Information access anywhere, anytime, from many devices: desktops, laptops, PDAs and smartphones, sensor networks, and the proliferation of broadband
- Mainstream servers moving toward higher-speed links: 1 GbE to 10 GbE in 2008-2009, 10 GbE to 40 GbE in 2010-2012
- High-speed datacenter-to-MAN/WAN connectivity
- High-speed datacenter synchronization for disaster recovery

Datacenter Networking: Cost and Design Considerations
- The network is a large part of the total cost of the DC hardware: large routers and high-bandwidth switches are very expensive
- It is also relatively unreliable: many components may fail
- Many major operators and companies design their own datacenter networks to save money and improve reliability, scalability, and performance
- Several properties make this feasible: the topology is often known, the number of nodes is limited, and the protocols used in the DC are known
- Security is simpler inside the datacenter, but challenging at the border
- Applications can be distributed across servers to spread load and minimize hot spots

Datacenter Networking: Networking Components (examples)
- High-performance, high-density switches and routers: 64 10-GbE ports upstream, 768 1-GbE ports downstream; scaling to 512 10-GbE ports per chassis with no need for proprietary protocols
- Highly scalable DC border routers: 3.2 Tbps capacity in a single chassis; 10 million routes, 1 million in hardware; 2,000 BGP peers; 2K L3 VPNs, 16K L2 VPNs; high port density for GbE and 10-GbE application connectivity; security features

Datacenter Networking: Common Data Center Topology
From the Internet down to the servers: a core layer (layer-3 routers), an aggregation layer (layer-2/3 switches), an access layer (layer-2 switches), and finally the servers.

Datacenter Networking: Design Goals
- High network bandwidth, low latency
- Reduce the need for large switches in the core
- Simplify the software; push complexity to the edge of the network
- Improve reliability
- Reduce capital and operating cost

Datacenter Networking
[Figures: "Avoid this" and "Simplify this" network examples]

Interconnect: Can We Avoid Using High-End Switches?
- Scaling up with expensive high-end switches creates a single point of failure and a bandwidth bottleneck
- Experiences from real systems suggest one answer: DCell

Interconnect: DCell
- Idea #1: Use mini-switches to scale out
- Idea #2: Leverage servers as part of the routing infrastructure; servers have multiple ports and need to forward packets
- Idea #3: Use recursion to scale, and build a complete graph to increase capacity (a sketch of how the recursion scales follows this list)
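A minimal sketch of how the recursion in idea #3 scales, assuming the construction from the DCell paper: a DCell_0 is n servers on one mini-switch, and a DCell_k joins t_{k-1} + 1 copies of DCell_{k-1} into a complete graph, where t_{k-1} is the server count of a DCell_{k-1}. The function name is ours.

```python
# Server count of a DCell, following the recursive construction above.

def dcell_servers(n: int, k: int) -> int:
    """Number of servers in a level-k DCell built on n-port mini-switches."""
    t = n                        # t_0: n servers on one switch
    for _ in range(k):
        t = t * (t + 1)          # t_k = t_{k-1} * (t_{k-1} + 1)
    return t

for level in range(4):
    print(level, dcell_servers(4, level))
# 0 4 / 1 20 / 2 420 / 3 176820: doubly exponential growth with level
```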

Data Center Networking: A Hypercube Interconnect
One approach: a switched network with a hypercube interconnect.
- Leaf switch: 40 1-Gbps ports + 2 10-Gbps ports; one switch per rack, not replicated (if a switch fails, one rack of capacity is lost)
- Core switch: 10 10-Gbps ports; the core switches form a hypercube (the high-dimensional analogue of a rectangle)

Interconnect: Hypercube Properties
- Minimum hop count
- Even load distribution for all-to-all communication
- Can route around switch/link failures
- Simple routing: outport = f(dest XOR NodeNum); no routing tables needed (a sketch follows)
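A minimal sketch of this table-free routing rule, assuming nodes are numbered 0 .. 2^d - 1 and out-port i leads to the neighbor whose number differs in bit i; the function names are illustrative.

```python
# Table-free hypercube routing: the XOR of the current node and the
# destination tells the switch exactly which dimensions remain to cross.

def next_outport(node: int, dest: int) -> int | None:
    """Out-port for the next hop, or None if the packet has arrived."""
    diff = node ^ dest                  # dimensions still to be corrected
    if diff == 0:
        return None
    # Correct the lowest differing dimension first; any fixed rule that
    # picks a set bit of diff yields a shortest path.
    return (diff & -diff).bit_length() - 1

def route(src: int, dest: int) -> list[int]:
    """Sequence of nodes a packet visits from src to dest."""
    path, node = [src], src
    while (port := next_outport(node, dest)) is not None:
        node ^= 1 << port               # crossing port i flips bit i
        path.append(node)
    return path

print(route(0b0000, 0b1011))            # [0, 1, 3, 11]: 3 hops, minimal
```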

Interconnect
[Figure: a 16-node (dimension-4) hypercube; nodes are numbered 0-15 and each link is labeled with its dimension, 0-3]

Interconnect: A 64-Switch Hypercube
[Figure: four 4x4 sub-cubes joined by 16 links each, with 63 x 4 links to other containers]
One container:
- Level 2: 2 10-port 10-Gb/s switches
- Level 1: 8 10-port 10-Gb/s switches
- Level 0: 32 40-port 1-Gb/s switches
- Leaf switch: 40 1-Gb/s ports + 2 10-Gb/s ports; core switch: 10 10-Gb/s ports
How many servers can be connected in this system? 81,920 servers, each with 1 Gb/s bandwidth (the arithmetic is checked below).
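A quick check of the 81,920-server figure, assuming 64 containers (this one plus the 63 others referenced in the figure); the variable names are ours.

```python
# Capacity arithmetic for the hypercube-of-containers design above.

containers = 64
leaf_switches_per_container = 32      # level-0 switches
server_ports_per_leaf = 40            # 1-Gb/s ports on each leaf switch

servers_per_container = leaf_switches_per_container * server_ports_per_leaf
total_servers = containers * servers_per_container
print(servers_per_container, total_servers)   # 1280 81920
```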

Data Center Networking: The Black Box

Appendix

Interconnect: Typical Layer 2 and Layer 3 in Existing Systems
- Layer 2: one spanning tree for the entire network; prevents looping but ignores alternate paths
- Layer 3: shortest-path routing between source and destination; best-effort delivery

Interconnect: Problems with the Common DC Topology
- Single points of failure
- Oversubscription of links higher up in the topology: a trade-off between cost and provisioning
- Layer 3 will use only one of the existing equal-cost paths
- Packet reordering occurs if layer 3 blindly takes advantage of path diversity

Interconnect: A Fat-Tree-Based Solution
- Connect hosts together using a fat-tree topology built from cheap commodity devices
- Each port supports the same speed as the end hosts
- All devices can transmit at line speed if packets are distributed across the existing paths
- A k-ary fat-tree is composed of switches with k ports: 5k²/4 switches connecting k³/4 hosts (a quick check follows this list)
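A quick sanity check of the 5k²/4 and k³/4 formulas, using the pod structure from the next slide; the function and variable names are ours.

```python
# Switch and host counts for a k-ary fat-tree of k-port switches.

def fat_tree_counts(k: int) -> tuple[int, int]:
    """Return (total switches, connected hosts) for a k-ary fat-tree."""
    core = (k // 2) ** 2                      # = k²/4 core switches
    pods = k * (k // 2 + k // 2)              # k pods, k/2 agg + k/2 edge
    hosts = k * (k // 2) * (k // 2)           # k/2 hosts per edge switch
    return core + pods, hosts                 # 5k²/4 switches, k³/4 hosts

print(fat_tree_counts(4))    # (20, 16): the k = 4 example below
print(fat_tree_counts(48))   # (2880, 27648): a datacenter-scale build
```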

Interconnect: A k-ary Fat-Tree (k = 4)
- k²/4 core switches
- k pods, each with k/2 aggregation switches and k/2 edge switches
- k/2 hosts per edge switch
- The same type of k-port switch is used in the core, aggregation, and edge layers

Interconnect: Modified Fat-Tree Addressing
- Enforce a special addressing scheme in the DC: unused.podnumber.switchnumber.endhost
- Allows hosts attached to the same switch to route only through that switch
- Allows intra-pod traffic to stay within the pod
- Use two-level lookups to distribute traffic while maintaining packet ordering

Interconnect: Two-Level Look-ups
- The first level is a prefix lookup, used to route down the topology to the end host
- The second level is a suffix lookup, used to route up toward the core; it diffuses and spreads traffic, and maintains packet ordering by using the same port for the same end host
A sketch of the two-level match follows.
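A minimal sketch of the two-level match, assuming 10.pod.switch.host addresses as in the fat-tree scheme; the table contents and port numbers below are illustrative, not the paper's exact entries.

```python
# Two-level lookup: a terminating prefix match routes a packet down
# toward its end host; otherwise a suffix match routes it up.

def two_level_lookup(dst: str, prefixes: dict, suffixes: dict) -> int:
    """Return the out-port for destination address dst."""
    octets = dst.split(".")
    for plen in sorted(prefixes, reverse=True):   # longest prefix first
        key = ".".join(octets[:plen])
        if key in prefixes[plen]:
            return prefixes[plen][key]            # down toward the end host
    # No prefix matched: route up toward the core. Keying on the host
    # suffix spreads traffic across up-ports while keeping each end host
    # on one port, so packets to the same host are never reordered.
    return suffixes[octets[3]]

# An aggregation switch in pod 2 of a k=4 fat-tree (illustrative tables):
prefixes = {3: {"10.2.0": 0, "10.2.1": 1}}        # subnets below this switch
suffixes = {"2": 2, "3": 3}                       # upward ports by host ID

print(two_level_lookup("10.2.0.3", prefixes, suffixes))  # 0: down, in-pod
print(two_level_lookup("10.1.0.2", prefixes, suffixes))  # 2: up to the core
```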

Interconnect: Comparison of Several Schemes
- Hypercube: a high-degree interconnect for large networks; difficult to scale incrementally
- Butterfly and fat-tree: cannot scale as fast as DCell
- De Bruijn: cannot be incrementally expanded
- DCell: low bandwidth between two clusters (sub-DCells)