Portland: how to use the topology feature of the datacenter network to scale routing and forwarding



Similar documents
Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya and Amin Vahdat

Scale and Efficiency in Data Center Networks

PortLand:! A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

Advanced Computer Networks. Datacenter Network Fabric

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

Axon: A Flexible Substrate for Source- routed Ethernet. Jeffrey Shafer Brent Stephens Michael Foss Sco6 Rixner Alan L. Cox

Lecture 7: Data Center Networks"

Chapter 6. Paper Study: Data Center Networking

OVERLAYING VIRTUALIZED LAYER 2 NETWORKS OVER LAYER 3 NETWORKS

BURSTING DATA BETWEEN DATA CENTERS CASE FOR TRANSPORT SDN

基 於 SDN 與 可 程 式 化 硬 體 架 構 之 雲 端 網 路 系 統 交 換 器

Ethernet (LAN switching)

Ethernet-based Software Defined Network (SDN) Cloud Computing Research Center for Mobile Applications (CCMA), ITRI 雲 端 運 算 行 動 應 用 研 究 中 心

Data Center Network Topologies: FatTree

Load Balancing Mechanisms in Data Center Networks

Electronic Theses and Dissertations UC San Diego

Cisco s Massively Scalable Data Center

Network Technologies for Next-generation Data Centers

Extending Networking to Fit the Cloud

OpenFlow based Load Balancing for Fat-Tree Networks with Multipath Support

VXLAN: Scaling Data Center Capacity. White Paper

Optimizing Data Center Networks for Cloud Computing

TRILL Large Layer 2 Network Solution

How To Understand and Configure Your Network for IntraVUE

Network Virtualization and Data Center Networks Data Center Virtualization - Basics. Qin Yin Fall Semester 2013

TRILL for Data Center Networks

Torii-HLMAC: Torii-HLMAC: Fat Tree Data Center Architecture Elisa Rojas University of Alcala (Spain)

Layer 3 Routing User s Manual

VMDC 3.0 Design Overview

Applying NOX to the Datacenter

Chapter 1 Reading Organizer

hp ProLiant network adapter teaming

Course Overview: Learn the essential skills needed to set up, configure, support, and troubleshoot your TCP/IP-based network.

Virtual PortChannels: Building Networks without Spanning Tree Protocol

Adaptive Routing for Layer-2 Load Balancing in Data Center Networks

Outline. VL2: A Scalable and Flexible Data Center Network. Problem. Introduction 11/26/2012

STATE OF THE ART OF DATA CENTRE NETWORK TECHNOLOGIES CASE: COMPARISON BETWEEN ETHERNET FABRIC SOLUTIONS

DATA center infrastructure design has recently been receiving

UNIVERSITY OF CALIFORNIA, SAN DIEGO. Miswirings Diagnosis, Detection and Recovery in Data Centers

TRILL for Service Provider Data Center and IXP. Francois Tallet, Cisco Systems

Building Secure Network Infrastructure For LANs

Analysis of Network Segmentation Techniques in Cloud Data Centers

OpenFlow/SDN for IaaS Providers

Lehrstuhl für Informatik 4 Kommunikation und verteilte Systeme. Auxiliary Protocols

Fiber Channel Over Ethernet (FCoE)

Chapter 10 Link-State Routing Protocols

Internetworking and Internet-1. Global Addresses

Data Center Infrastructure of the future. Alexei Agueev, Systems Engineer

How To Make A Vpc More Secure With A Cloud Network Overlay (Network) On A Vlan) On An Openstack Vlan On A Server On A Network On A 2D (Vlan) (Vpn) On Your Vlan

Data Center Networking Designing Today s Data Center

Voice Over IP. MultiFlow IP Phone # 3071 Subnet # Subnet Mask IP address Telephone.

Datagram-based network layer: forwarding; routing. Additional function of VCbased network layer: call setup.

Cloud Computing and the Internet. Conferenza GARR 2010

20. Switched Local Area Networks

Network performance in virtual infrastructures

- Multiprotocol Label Switching -

Visio Enabled Solution: One-Click Switched Network Vision

Torii: Multipath Distributed Ethernet Fabric Protocol for Data Centers with Zero-Loss Path Repair

AIN: A Blueprint for an All-IP Data Center Network

Scaling 10Gb/s Clustering at Wire-Speed

What is VLAN Routing?

CS 91: Cloud Systems & Datacenter Networks Networks Background

Objectives. The Role of Redundancy in a Switched Network. Layer 2 Loops. Broadcast Storms. More problems with Layer 2 loops

Ethernet Fabric Requirements for FCoE in the Data Center

Data Center Networks

Vocia MS-1 Network Considerations for VoIP. Vocia MS-1 and Network Port Configuration. VoIP Network Switch. Control Network Switch

8.2 The Internet Protocol

Juniper Networks QFabric: Scaling for the Modern Data Center

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet

Ethernet-based Software Defined Network (SDN)

MASTER THESIS. Performance Comparison Of the state of the art Openflow Controllers. Ahmed Sonba, Hassan Abdalkreim

2. What is the maximum value of each octet in an IP address? A. 28 B. 255 C. 256 D. None of the above

Chapter 3. Enterprise Campus Network Design

Top-Down Network Design

Electronic Theses and Dissertations UC San Diego

Software Defined Networking & Openflow

Cisco Discovery 3: Introducing Routing and Switching in the Enterprise hours teaching time

Data Center Networking

Data Center Content Delivery Network

CS 348: Computer Networks. - IP addressing; 21 st Aug Instructor: Sridhar Iyer IIT Bombay

GUI Tool for Network Designing Using SDN

IP Addressing and Subnetting. 2002, Cisco Systems, Inc. All rights reserved.

Data Center Convergence. Ahmad Zamer, Brocade

DEMYSTIFYING ROUTING SERVICES IN SOFTWAREDEFINED NETWORKING

Cloud Infrastructure Planning. Chapter Six

Hedera: Dynamic Flow Scheduling for Data Center Networks

Internet Working 5 th lecture. Chair of Communication Systems Department of Applied Sciences University of Freiburg 2004

Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center Presented by: Payman Khani

ProActive Routing in Scalable Data Centers. with PARIS

Comparisons of SDN OpenFlow Controllers over EstiNet: Ryu vs. NOX

AUTO DEFAULT GATEWAY SETTINGS FOR VIRTUAL MACHINES IN SERVERS USING DEFAULT GATEWAY WEIGHT SETTINGS PROTOCOL (DGW)

IP Networking. Overview. Networks Impact Daily Life. IP Networking - Part 1. How Networks Impact Daily Life. How Networks Impact Daily Life

CSIS CSIS 3230 Spring Networking, its all about the apps! Apps on the Edge. Application Architectures. Pure P2P Architecture

International Journal of Emerging Technology in Computer Science & Electronics (IJETCSE) ISSN: Volume 8 Issue 1 APRIL 2014.

Secure Cloud Computing with a Virtualized Network Infrastructure

co Characterizing and Tracing Packet Floods Using Cisco R

From Active & Programmable Networks to.. OpenFlow & Software Defined Networks. Prof. C. Tschudin, M. Sifalakis, T. Meyer, M. Monti, S.

Operating Systems. Cloud Computing and Data Centers

ProActive Routing in Scalable Data Centers with PARIS

Transcription:

LECTURE 15: DATACENTER NETWORK: TOPOLOGY AND ROUTING Xiaowei Yang 1

OVERVIEW Portland: how to use the topology feature of the datacenter network to scale routing and forwarding ElasticTree: topology control to save energy Briefly 2

BACKGROUND Link layer (layer 2) routing and forwarding Network layer (layer 3) routing and forwarding The FatTree topology 3

LINK LAYER ADDRESSING To send to a host with an IP address p, a sender broadcasts an ARP request within its IP subnet The destination with the IP address p will reply The sender caches the result 4

LINK LAYER FORWARDING Src=x, Dest=y Src=x, Dest=y Port 1 Port 2 x is at Port 3 y is at Port 4 Port 4 Port 5 Src=y, Src=x, Dest=x Dest=y Src=x, Dest=y Src=y, Src=x, Dest=x Dest=y Port 3 Port 6 Src=x, Dest=y Done via learning bridges Bridges run a spanning tree protocol to set up a tree topology First packet from a sender to a destination is broadcasted to all destinations in the IP subnet along the spanning tree Bridges on the path learn the sender s MAC address and incoming port Return packets from a destination to a sender are unicast along the learned path 5

NETWORK LAYER ROUTING AND ADDRESSING Each subnet is assigned an IP prefix Routers run a routing protocol such as OSPF or RIP to establish the mapping between an IP prefix and a next hop router 6

QUESTION Which one is better for a datacenter network that may have hundreds of thousands of hosts? 7

DATACENTER TOPOLOGIES Hierarchical Racks, Rows 8

FAT TREE WITH 4-PORT 4 SWITCHES k-port homogeneous switches, k/2-ary 3-tree, 5/4k 2 switches, k 3 /4 hosts 10

11

PortLand A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya and Amin Vahdat 12

PortLand In A Nutshell PortLand is a single logical layer 2 data center network fabric that scales to millions of endpoints PortLand internally separates host identity from host location Uses IP address as host identifier Introduces Pseudo MAC (PMAC) addresses internally to encode endpoint location PortLand runs on commodity switch hardware with unmodified hosts 13

Data Centers Are Growing In Scale Mega Data Centers Microsoft DC at Chicago 500,000+ servers Large Scale Applications Facebook, Search All to all communication Most Data centers run more than one application Virtualization in Data Center 10 VMs per server 5 million routable addresses! 14

Goals For Data Center Network Fabrics Easy configuration and management Plug and play Fault tolerance, routing, and addressing Scalability Commodity switch hardware Small switch state Virtualization support Seamless VM migration 15

Layer 2 versus Layer 3 Data Center Fabrics Technique Plug and play Scalability Small Switch State Layer 2: + - - Seamless VM Migration + Flat MAC Addresses Layer 3: IP Addresses - + + - 16

Layer 2 versus Layer 3 Data Center Fabrics Technique Plug and play Scalability Small Switch State Layer 2: Flat MAC Addresses + Flooding Host MAC Address 9a:57:10:ab:6e 1 62:25:11:bd:2d 3 ff:30:2a:c9:f3 0. Out Port Seamless VM Migration - - + No change to IP endpoint Layer 3: - + + - Host IP Address IP LSR Addresses Broadcast 10.2.x.x 0 Location-based addresses mandate based manual 10.4.x.x configuration 1 Out Port IP endpoint can change 17

Cost Consideration: Flat Addresses vs. Location Based Addresses Commodity switches today have ~640 KB of low latency, power hungry, expensive on chip memory Stores 32 64 K flow entries Assume 10 million virtual endpoints in 500,000 servers in data center Flat addresses 10 million address mappings ~100 MB on chip memory ~150 times the memory size that can be put on chip today Location based addresses 100 1000 address mappings ~10 KB of memory easily accommodated in switches today 18

PortLand: Plug and Play + Small Switch State + Pair-wise communication PMAC Address Auto Configuration 0:2:x:x:x:x 0 Out Port 10 million virtual endpoints in 500,000 servers in data center 0:4:x:x:x:x 1 100 1000 address mappings 1. PortLand switches learn location in topology using pair-wise 0:6:x:x:x:x 2 ~10 KB of memory easily communication 0:8:x:x:x:x 3 accommodated in switches today 2. They assign topologically meaningful addresses to hosts using their location 19

PortLand: Main Assumption Hierarchical structure of data center networks: They are multi-level, multi-rooted trees Cisco Recommended Configuration Fat Tree 20

PortLand: Scalability Challenges Challenge State Of Art Address Resolution X Broadcast based Routing X Broadcast based Forwarding X Large switch state 21

Data Center Network Core Switches Aggregation Edge Servers 22

23 Imposing Hierarchy On A Multi-Rooted Tree POD 0 POD 1 POD 2 POD 3 Pod Number

24 Imposing Hierarchy On A Multi-Rooted Tree 0 1 0 1 0 0 1 1 Position Number

25 Imposing Hierarchy On A Multi-Rooted Tree 0 1 0 1 0 1 0 1 PMAC: pod.position.port.vmid

26 Imposing Hierarchy On A Multi-Rooted Tree 0 1 0 1 0 1 0 1 PMAC: pod.position.port.vmid

27 Imposing Hierarchy On A Multi-Rooted Tree 0 1 0 1 0 1 0 1 PMAC: pod.position.port.vmid

Imposing Hierarchy On A Multi-Rooted Tree 0 1 0 1 0 1 0 1 00:00:00:02:00:01 00:00:00:03:00:01 00:00:01:02:00:01 00:00:01:03:00:01 00:01:00:02:00:01 00:01:00:03:00:01 00:01:01:02:00:01 00:01:01:03:00:01 00:02:00:02:00:01 00:02:00:03:00:01 00:02:01:02:00:01 00:02:01:03:00:01 00:03:00:02:00:01 00:03:00:03:00:01 00:03:01:02:00:01 00:03:01:03:00:01 28

PROXY-BASED ARP When an edge switch sees a new AMAC, it assigns a PMAC to the host It then communicates the PMAC to IP mapping to the fabric manager. The fabric manager servers as a proxy-arp agent, and answers ARP queries

PortLand: Location Discovery Protocol Location Discovery Messages (LDMs) exchanged between neighboring switches Switches self-discover location on boot up Location characteristic Technique 1) Tree level / Role Based on neighbor identity 2) Pod number Aggregation and edge switches agree on pod number 3) Position number Aggregation switches help edge switches choose unique position number 30

PortLand: Location Discovery Protocol Switch Identifier Pod Number Position Tree Level A0:B1:FD:56:32:01??????

PortLand: Location Discovery Protocol Switch Identifier Pod Number Position Tree Level A0:B1:FD:56:32:01??????

PortLand: Location Discovery Protocol Switch Identifier Pod Number Position Tree Level A0:B1:FD:56:32:01??????0

PortLand: Location Discovery Protocol Switch Identifier Pod Number Position Tree Level B0:A1:FD:57:32:01??????

PortLand: Location Discovery Protocol Switch Identifier Pod Number Position Tree Level B0:A1:FD:57:32:01?????? 1

PortLand: Location Discovery Protocol Switch Identifier Pod Number Position Tree Level B0:A1:FD:57:32:01?????? 1

PortLand: Location Discovery Protocol Propose 1 Switch Identifier Pod Number Position Tree Level A0:B1:FD:56:32:01???? 0

PortLand: Location Discovery Protocol Propose 0 Switch Identifier Pod Number Position Tree Level D0:B1:AD:56:32:01???? 0

PortLand: Location Discovery Protocol Yes Switch Identifier Pod Number Position Tree Level A0:B1:FD:56:32:01?? 1?? 0

PortLand: Location Discovery Protocol Fabric Manager Switch Identifier Pod Number Position Tree Level D0:B1:AD:56:32:01?? 0 0

PortLand: Location Discovery Protocol Fabric Manager 0 Switch Identifier Pod Number Position Tree Level D0:B1:AD:56:32:01?? 0 0

PortLand: Location Discovery Protocol Fabric Manager Pod 0 Switch Identifier Pod Number Position Tree Level D0:B1:AD:56:32:01 0?? 0 0

PortLand: Name Resolution Intercept all ARP packets 43

PortLand: Name Resolution Intercept all ARP packets Assign new end hosts with PMACs 44

PortLand: Name Resolution Intercept all ARP packets Assign new end hosts with PMACs Rewrite MAC for packets entering and exiting network 45

PortLand: Name Resolution Fabric Manager

PortLand: Fabric Manager Fabric Manager ARP mappings Network map Soft state Administrator configuration

PortLand: Name Resolution Fabric Manager 10.5.1.2 MAC??

PortLand: Name Resolution Fabric Manager 10.5.1.2 00:00:01:02:00:01

PortLand: Name Resolution ARP replies contain only PMAC Address HWtype HWAddress Flags Mask Iface 10.5.1.2 ether 00:00:01:02:00:01 C eth1 50

PROVABLY LOOP-FREE FORWARDING Switches populate their forwarding tables after establishing local positions Core switches forward according to pod numbers Aggregation switches forward packets destined to the same pod to edge switches, to other pods to core switches Edge switches forward packets to the corresponding hosts

FAULT TOLERANT ROUTING LDP exchanges serve as keepalive A switch reports a dead link to the fabric manager (FM) The FM updates its faulty link matrix, and informs affected switches the failure Affected switches reconfigure their forwarding tables to bypass the failed link No broadcasting of the failure

Portland Prototype 20 OpenFlow NetFPGA switches TCAM + SRAM for flow entries Software MAC rewriting 3 tiered fat-tree 16 end hosts 53

PortLand: Evaluation Measurements Configuration Results Network convergence time Keepalive frequency = 10 ms Fault detection time = 50 ms 65 ms TCP convergence time RTO min = 200ms ~200 ms Multicast convergence time TCP convergence with VM migration Control traffic to fabric manager CPU requirements of fabric manager RTO min = 200ms 27,000+ hosts, 100 ARPs / sec per host 27,000+ hosts, 100 ARPs / sec per host 110ms ~200 ms 600 ms 400 Mbps non trivial 70 CPU cores non trivial 54

Summarizing PortLand PortLand is a single logical layer 2 data center network fabric that scales to millions of endpoints Modify network fabric to Work with arbitrary operating systems and virtual machine monitors Maintain the boundary between network and end-host administration Scale Layer 2 via network modifications Unmodified switch hardware and end hosts 55

DISCUSSION Unmodified hosts: why is it desirable? Does location-based addressing necessarily mandate manual configuration? Their own solution implies a big NO 56

57

NETWORK CONSUMES MUCH POWER

GOAL: ENERGY PROPORTION NETWORKING

APPROACH: TURN OFF UNNEEDED LINKS AND SWITCHES CAREFULLY AND AT SCALE

ELASTIC TREE ARCHITECTURE

THREE OPTIMIZERS

FORMAL MODEL: MCF Does not scale

GREEDY BIN PACKING For each flow, evaluates all possible flows, and chooses the left-most one with sufficient capacity

TOPOLOGY-AWARE HEURISTICS Active switches == total bandwidth demand / capacity per switch Determine which switches are active, and pack flows to the active switches Add more switches for fault tolerance and connectivity

SCALABILITY

POTENTIAL POWER SAVINGS Near traffic: within the same edge switches Far: remote traffic

REALISTIC DATA CENTER TRAFFIC Savings range from 25-62% A single E-commerce application

SUMMARY An interesting idea: energy-proportional networking Realized it on realistic datacenter topologies Three energy optimizers Heuristics work well

DISCUSSION Evaluation does not use traffic from multiple applications Not sure what the savings are on EC2, AppEngine, or Azure