
PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric
Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat
Presented by Frank Austin Nothaft

Network management at scale
- O(10-100K) machines per datacenter: O(40) machines per rack, O(20) racks per row
- Network is typically organized in a fat-tree-like topology → deeply hierarchical
- Machines are virtualized

Network management at scale: What is expensive?
- Routing tables explode
- Lots of hardware → config-by-human = bad
- Updates that need to be broadcast

Network management at scale: What do we want?
1. Easy VM migration
2. No admin involvement
3. Efficient communication between any two nodes
4. No loops
5. Efficient failure recovery

What are the implications of these aims?
- R1 (VM Migration): Cannot do at layer 3 → breaks existing TCP connections. Need a single L2 fabric for the whole DC.
- R2 (No admin): Not compatible with R5 with current routing protocols.
- R3 (Any-to-any): Requires huge routing tables.
- R4 (No loops): Loops can occur during routing protocol convergence. Can avoid at L2, but either inefficient or incompatible.
- R5 (Failure Recov.): Need to quickly update routing info. Difficult with present protocols → they require broadcast.

PortLand's realization: if the topology is known, the problems are much simpler!

Datacenter Topology
- In a fat tree, bandwidth increases towards the core
- Variety of topologies in practice, generally a multi-rooted tree
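
For concreteness, a k-ary fat tree built from k-port switches has a fixed shape. These sizing formulas come from the standard fat-tree construction (Al-Fares et al.); the helper name is illustrative:

def fat_tree_size(k):
    """Sizing of a k-ary fat tree built from k-port switches."""
    edge = k * (k // 2)          # k pods, k/2 edge switches each
    aggregation = k * (k // 2)   # k pods, k/2 aggregation switches each
    core = (k // 2) ** 2
    hosts = k ** 3 // 4          # k/2 hosts per edge switch
    return edge, aggregation, core, hosts

# k=4 gives 8 + 8 + 4 = 20 switches and 16 hosts, exactly the
# NetFPGA testbed described later in these slides.
print(fat_tree_size(4))  # (8, 8, 4, 16)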

How can we exploit topology?
We can solve some problems by minimizing distributed state, while solving others by centralizing state. PortLand makes two high-level optimizations:
1. Rewrite addresses: if we can rewrite the MAC addresses of leaf nodes, we can greatly simplify routing tables
2. Offload management to a centralized Fabric Manager: it assists with address resolution, failover, etc.

Address Rewriting
- MAC address → a 48-bit unique address for each endpoint, used for Ethernet forwarding
- Issue: if MAC addresses are random, the routing table of each switch grows O(n)
- PortLand uses knowledge of the topology to rewrite MAC addresses at the edge switches: pod:position:port:vmid
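
As a concrete sketch, the paper's pseudo MAC (PMAC) packs pod (16 bits), position (8), port (8), and vmid (16) into a 48-bit address; the helper functions below are illustrative, not PortLand code:

def encode_pmac(pod, position, port, vmid):
    """Pack pod:position:port:vmid into a 48-bit PMAC (16/8/8/16 bits)."""
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{(value >> s) & 0xFF:02x}" for s in range(40, -1, -8))

def decode_pmac(pmac):
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int(pmac.replace(":", ""), 16)
    return ((value >> 32) & 0xFFFF, (value >> 24) & 0xFF,
            (value >> 16) & 0xFF, value & 0xFFFF)

pmac = encode_pmac(pod=2, position=1, port=3, vmid=7)
print(pmac)               # 00:02:01:03:00:07
print(decode_pmac(pmac))  # (2, 1, 3, 7)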

Aside: Ethernet Switching
[Figure: a switch with a routing table and scheduler feeding ports 1-4]
- Routing table → SRAM-based CAM
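
The CAM-pressure point is visible even in a toy comparison: with random MACs a switch needs one exact-match entry per remote host, while with PMACs a core switch can match on the pod prefix alone. Illustrative only, not the paper's data structures:

# Flat MACs: one exact-match CAM entry per remote host -> O(n) entries.
flat_table = {f"52:54:00:00:00:{h:02x}": h % 4 for h in range(16)}

# PMACs: a core switch forwards on the 16-bit pod field alone.
prefix_table = {pod: pod for pod in range(4)}  # pod -> output port

def core_lookup(pmac):
    """Extract the pod prefix from a PMAC and match on it alone."""
    pod = int(pmac.replace(":", ""), 16) >> 32
    return prefix_table[pod]

print(len(flat_table), "exact entries vs", len(prefix_table), "prefixes")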

Location Discovery
The authors present a distributed algorithm for discovering switch location:
- Each switch sends a message indicating port direction; hosts do not send messages
- Insight: edge switches receive messages only from aggregation switches, and aggregation switches receive messages from edge switches on their downward-facing ports
- The Fabric Manager assigns IDs to switches
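
One way to picture the insight is as a classification rule over where Location Discovery Messages (LDMs) arrive. A minimal sketch of that inference, under simplified assumptions (the real protocol exchanges more fields, such as pod and position numbers):

def infer_role(total_ports, ldm_ports, ldm_sender_roles):
    """Classify a switch by where LDMs arrive (simplified LDP logic).

    ldm_ports: set of ports that have received an LDM.
    ldm_sender_roles: port -> role claimed by the LDM sender, once known.
    """
    if len(ldm_ports) < total_ports:
        # Hosts never send LDMs, so silent ports imply hosts below.
        return "edge"
    if any(role == "edge" for role in ldm_sender_roles.values()):
        # Hearing from edge switches means we sit directly above them.
        return "aggregation"
    return "core"

print(infer_role(4, {2, 3}, {}))  # 'edge': ports 0-1 face silent hosts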

So, what do we do with this?

Proxy ARP
- Natively, addresses are resolved by broadcasting
- However, if the Fabric Manager knows an IP <-> MAC mapping, we can eliminate that broadcast traffic
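
A sketch of the interception path, with illustrative function and message names: the edge switch consults the Fabric Manager's IP-to-PMAC table and answers on the target's behalf, falling back to broadcast only on a miss.

def handle_arp_request(target_ip, fabric_manager, broadcast):
    """Edge-switch proxy ARP (simplified).

    fabric_manager: dict-like IP -> PMAC mapping held centrally.
    broadcast: fallback used only when the mapping is unknown.
    """
    pmac = fabric_manager.get(target_ip)
    if pmac is not None:
        # Reply directly on the target's behalf: no broadcast needed.
        return {"op": "arp-reply", "ip": target_ip, "mac": pmac}
    return broadcast(target_ip)  # rare miss: fall back to flooding

fm = {"10.2.1.3": "00:02:01:03:00:07"}
print(handle_arp_request("10.2.1.3", fm, broadcast=lambda ip: None))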

Loop-free Forwarding
Fact: fat trees have many physical loops! Can you find them all?

Provably Loop-free Forwarding
- If we allow a packet to go up the tree only once, we cannot have a loop
- As an aside: a fat tree ≈ a folded Clos network. Haven't we known this since the '50s?
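
The "up once, then only down" rule can be stated as a predicate over a path, where each switch-to-switch hop is labeled 'up' or 'down' relative to the tree. A minimal sketch:

def is_valid_up_down_path(hops):
    """True iff the path never goes up again after its first downward hop.

    hops: sequence of 'up'/'down' labels, one per switch-to-switch hop.
    A packet that can only descend after its apex can never revisit a
    switch, which is why this forwarding discipline is loop-free.
    """
    descending = False
    for hop in hops:
        if hop == "down":
            descending = True
        elif descending:  # 'up' after we started descending: loop risk
            return False
    return True

print(is_valid_up_down_path(["up", "up", "down", "down"]))  # True
print(is_valid_up_down_path(["up", "down", "up", "down"]))  # False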

Failover
- The Fabric Manager maintains the state of failed links
- Link failure is tracked via the Location Discovery protocol → a missing message = failure
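
Detection then reduces to a timeout on the periodic LDM keepalives. A hedged sketch of the bookkeeping (the timeout constant is illustrative, not the paper's value):

import time

LDM_TIMEOUT = 0.5  # seconds; illustrative, not the paper's constant

class LinkMonitor:
    """Track last-LDM-seen time per port; a silent port implies a dead link."""

    def __init__(self):
        self.last_seen = {}

    def on_ldm(self, port):
        self.last_seen[port] = time.monotonic()

    def failed_links(self):
        now = time.monotonic()
        return [p for p, t in self.last_seen.items() if now - t > LDM_TIMEOUT]

# On a detected failure, the switch notifies the Fabric Manager, which
# pushes updated forwarding state to the affected switches.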

Validation

Experimental Testbed
- 20x 4-port NetFPGAs arranged in a fat tree, 1 Gbps per port
- 16 compute nodes << datacenter scale?
http://netfpga.org/1g_specs.html

Fabric Manager Capacity
- Authors cite 25 ARPs/sec/host as a high number
- 30,128 hosts <<< 100k machines; 32 VMs/machine?
- Also, a deus ex machina number: 25 µs/request?
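
The skepticism is easy to quantify. Taking the slide's numbers at face value, a back-of-the-envelope check:

arps_per_host = 25      # requests/sec/host, per the slide
hosts = 30_128
service_time = 25e-6    # seconds of fabric-manager CPU per request

load = arps_per_host * hosts        # total ARP load on the manager
busy_cores = load * service_time    # CPU-seconds of work per wall second
print(f"{load:,} req/s -> ~{busy_cores:.0f} cores fully busy")
# 753,200 req/s -> ~19 cores fully busy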

Conclusions

PortLand: Did it achieve its goals?
- R1 (VM Migration): Migrate via ARP + a unified L2 fabric
- R2 (No admin): Distributed process for learning the network topology, coupled with a central Fabric Manager
- R3 (Any-to-any): Rewriting MAC addresses reduces routing table size and makes the problem tractable
- R4 (No loops): Provably loop-free routing on fat trees
- R5 (Failure Recov.): The distributed topology-learning algorithm rapidly learns failure status

Discussion
- If you have a practically static topology, can you eliminate most of the complexity in this paper?
- Is the Fabric Manager actually viable as datacenters continue to increase in size?
- Wouldn't this be simpler if you pushed MAC address rewriting down to your VMM/hypervisor?