Peregrine: An All-Layer-2 Container Computer Network



Peregrine: An All-Layer-2 Container Computer Network
Tzi-cker Chiueh
Cloud Computing Research Center for Mobile Applications (CCMA)
ICPADS 2011
Copyright 2008 ITRI (Industrial Technology Research Institute)

Cloud Service Models
- Infrastructure as a Service (IaaS): a set of virtual machines with storage space and external network bandwidth (an "unfurnished apartment"). Example: Amazon Web Services.
- Platform as a Service (PaaS): an operating environment including (application-specific) libraries and supporting services such as DBMS and AAA (a "furnished apartment"). Examples: Google's App Engine, Microsoft's Azure, IBM's XaaS.
- Software as a Service (SaaS): turn-key software hosted in the cloud and accessible through the browser (a "hotel"). Examples: salesforce.com and all major desktop software vendors.

Data Center as a Computer
- Containerization
  - Optimal HW building-block granularity and packaging
  - More efficient power distribution and thermal design
  - Unification of computing, memory, network and storage resources
- Virtualization of all HW resources: software-definable boundaries
- Faster deployment: no on-premise installation needed; requires lights-out operation
- Google-style data center: an army of commodity HW; treat failure as the common case

ITRI's Research Projects
- Container Computer 1.0: a manageable container computer for AWS-like IaaS
  - What distinguishes a container computer from a set of servers, switches and storage boxes?
  - Scalable storage/network architecture
  - Comprehensive monitoring and control
  - Energy-efficient cooling
- Cloud Operating System 1.0: an integrated data center software stack supporting an AWS-like IaaS service on commodity HW
  - Tight integration of storage, resource, security and system/network management

Cloud OS 1.0 Service Model
- A virtual data center consists of one or more virtual clusters, each of which comprises one or more VMs
- Tiered-architecture web services
- Users provide a Virtual Cluster specification:
  - Number of VM instances, each with CPU performance and memory size requirements
  - Per-VM storage space requirement
  - External network bandwidth requirement
  - Security policy
  - Backup policy
  - Load balancing policy
  - Network configuration, e.g. public IP address and private IP address range
  - OS image and application image

Container Computer 1.0 Architecture
[Diagram: compute server racks (each physical server hosting VM0 through VMn) and storage servers attached to a Layer-2-only data center network, which reaches the Internet through Layer-3 border routers; edge services include load balancing, traffic shaping, intrusion detection, and NAT/VPN.]

Big, Shared Cloud-Scale Data Center
- Big: 5,000 servers and up; HW failures are inevitable
- Shared: virtualization, i.e. multiple virtual data centers running on a single physical data center
  - State isolation: name space reuse, visibility control
  - Performance isolation: service level agreements (SLA) or quality of service (QoS)

Design Issues of Cloud Data Center Networks
- Scalable and available data center fabrics
- Internet edge logic:
  - Server load balancing
  - Multi-homing load balancing
  - Traffic shaping or Internet QoS guarantees
  - WAN traffic compression and caching
- Network support for hybrid clouds
- Rack-area networking for I/O device consolidation and sharing

What's Wrong with Ethernet?
- Spanning tree-based:
  - Not all physical links are used
  - No load-sensitive dynamic routing
  - Fail-over latency is high (> 5 seconds)
- Cannot scale to a large number of VMs (e.g. 1M): the forwarding table is too small (16K to 64K entries)
- Does not support VM migration and visibility
- Does not support private IP address space reuse, which limits the scope of VM migration

Peregrine
- A unified network for LAN and SAN
- Layer-2 only
- Centralized control plane and distributed data plane
- Asymmetric routing
- Commodity Ethernet switches only: an army of commodity switches vs. a few high-port-density switches
- OpenFlow is good to have but not needed
- No broadcast, flooding, source learning, access control lists, etc.
- Careful architecting of all Internet edge logic

Peregrine's Clos Network Topology
[Diagram: a Clos topology with CORE switches at the top, region switches in the middle, and top-of-rack (TOR) switches at the bottom.]

Scaling up to 1M VMs
- Problem: small forwarding table (< 64K entries)
- Solution: two-stage forwarding (source -> intermediate -> destination)
- Problem: two-stage forwarding limits scalability and introduces a latency penalty
- Solution: dual-mode forwarding
  - Direct (source -> destination) for heavy-traffic pairs
  - Indirect (source -> intermediate -> destination) for the rest
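
The dual-mode decision above can be sketched as follows. This is an illustrative model, not Peregrine's actual code; the threshold value, table shapes and names (`next_hop`, `intermediate_of`) are assumptions.

```python
# Hypothetical sketch of dual-mode forwarding: heavy-traffic VM pairs
# get a direct forwarding entry; all other pairs are routed through the
# destination's intermediate. Threshold and names are illustrative.

HEAVY_TRAFFIC_THRESHOLD = 10_000_000  # bytes/sec; assumed value

def next_hop(src, dst, traffic_rate, direct_table, intermediate_of):
    """Return the forwarding path for a (src, dst) VM pair."""
    if traffic_rate.get((src, dst), 0) >= HEAVY_TRAFFIC_THRESHOLD:
        # Direct mode: install and use a source -> destination entry.
        direct_table[(src, dst)] = dst
        return [src, dst]
    # Indirect mode: source -> intermediate -> destination.
    return [src, intermediate_of[dst], dst]

rates = {("VM1", "VM2"): 50_000_000}    # one heavy-traffic pair
inter = {"VM2": "TOR5", "VM9": "TOR7"}  # intermediary per destination
table = {}
heavy_path = next_hop("VM1", "VM2", rates, table, inter)  # direct
light_path = next_hop("VM1", "VM9", rates, table, inter)  # indirect
```

The point of the split is that only the (few) heavy pairs consume scarce forwarding-table entries, while the long tail of light pairs shares the per-intermediate entries.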

Two-Stage Forwarding
- Every intermediate knows how to route to every VM in its scope
- An intermediate must be notified when a VM leaves or joins its scope
- Path: source -> intermediate -> destination
- Intermediate: TOR_Switch(Dest) or Physical_Machine(Dest)
- Directory server: maintains the Host -> Intermediate(Host) map
[Diagram: Host 1, Host 5 and Host 36 consulting the directory server.]

Fast Fail-Over
- Goal: fail-over latency < 100 msec
- Strategy:
  - Pre-compute a primary and a backup route for each VM; each VM has two virtual MACs
  - When a link fails, identify all affected routes
  - Notify hosts using affected primary routes that they should switch to the corresponding backup routes
  - Re-compute new alternative routes

Interaction with the Fail-Over Mechanism
- For each physical node P, the routing algorithm computes two disjoint spanning trees, which enable other physical nodes to reach P
- Direct routing: MAC1(VM25), MAC2(VM25)
- Indirect routing: MAC1(TOR1):MAC1(VM25); MAC1(TOR2):MAC2(VM25); MAC1(PM22):MAC1(VM25); MAC2(PM22):MAC2(VM25)
[Diagram: PM1 hosting VM1, VM4, VM127 and PM22 hosting VM8, VM90, VM25, reachable through TOR1 and TOR2.]

Software Architecture
[Diagram: overall Peregrine software architecture.]

Directory Server (DS)
- Performs a generalized ARP query service
- A generalized ARP (GARP) entry maps a VM's IP address to:
  - Primary MAC address and an availability bit
  - Backup MAC address and an availability bit
  - Primary intermediary's MAC address and an availability bit
  - Backup intermediary's MAC address and an availability bit
- Each GARP map entry keeps a list of caching clients and their expiration times
- Directory clients cache GARP entries using a lease-based cache consistency protocol
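
A minimal data-structure model of the GARP entry described above might look like this. The class and field names are assumptions for illustration, not the DS's actual representation.

```python
# Toy model of a generalized ARP (GARP) entry: primary/backup MACs and
# intermediary MACs, each with an availability bit, plus the caching
# clients and their lease expiration times. All names are illustrative.

import time
from dataclasses import dataclass, field

@dataclass
class GarpEntry:
    primary_mac: str
    backup_mac: str = ""
    primary_up: bool = True
    backup_up: bool = True
    primary_inter_mac: str = ""
    primary_inter_up: bool = True
    backup_inter_mac: str = ""
    backup_inter_up: bool = True
    # caching client -> lease expiration timestamp
    cachers: dict = field(default_factory=dict)

    def resolve(self) -> str:
        """MAC a sender should use, preferring the primary if it is up."""
        return self.primary_mac if self.primary_up else self.backup_mac

    def record_cacher(self, client: str, lease_secs: float = 30.0):
        """Remember a caching client so its copy can be invalidated."""
        self.cachers[client] = time.time() + lease_secs

    def live_cachers(self):
        """Clients whose lease has not yet expired."""
        now = time.time()
        return [c for c, exp in self.cachers.items() if exp > now]
```

Keeping the availability bit alongside each MAC is what lets a fail-over flip senders to the backup address without a fresh directory round-trip.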

Route Algorithm Server (RAS)
- Collects VM migration, network element failure and congestion events
- Collects traffic matrix information at run time
- Includes a route engine that computes routes for VM pairs, statically or incrementally
- Builds an inverse map that associates network links with all computed routes that traverse them
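
The inverse map in the last bullet can be sketched in a few lines (data shapes assumed): index every link by the routes that traverse it, so a link-failure event immediately yields the set of routes to repair.

```python
# Toy sketch of the RAS inverse map: from computed routes, build an
# index link -> set of route IDs that travel over that link.

def build_inverse_map(routes):
    """routes: {route_id: [node0, node1, ...]} -> {link: {route_ids}}."""
    inv = {}
    for rid, path in routes.items():
        for link in zip(path, path[1:]):          # consecutive node pairs
            inv.setdefault(link, set()).add(rid)
    return inv

routes = {"r1": ["PM1", "TOR1", "CORE", "TOR2", "PM9"],
          "r2": ["PM2", "TOR1", "CORE", "TOR3", "PM7"]}
inv = build_inverse_map(routes)
affected = inv[("TOR1", "CORE")]  # routes hit if this link fails
```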

MAC-in-MAC (MIM) Kernel Agent
- Packet encapsulation and decapsulation
- Caches GARP responses
- Intercepts ARP queries from VMs and services them from the GARP cache or by contacting the DS
- Checks, for each packet, that the destination IP address is consistent with the destination MAC address
- Collects traffic matrix data

Encapsulation and Decapsulation
- Encapsulation: place the IDA in the Ethernet destination address field and pack DA and SA into the Ethernet source address field
- Decapsulation: extract DA and put it back in the Ethernet destination address field
- Each VM's and PM's MAC address is effectively only 3 bytes long (as opposed to 6 bytes)
- The most significant half of the Ethernet source address field signifies whether the frame is encapsulated
(SA: source address; DA: destination address; IDA: intermediate destination address)
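
The field packing described above can be sketched as byte manipulation. The exact on-wire encoding (padding, which half signals encapsulation) is an assumption here; the slide only fixes that DA and SA share the 6-byte source field while the destination field carries the IDA.

```python
# Sketch (encoding details assumed) of MAC-in-MAC with 3-byte effective
# addresses: the 6-byte Ethernet destination field carries the IDA, and
# the 6-byte source field packs DA (high 3 bytes) and SA (low 3 bytes).

def encap(ida3: bytes, da3: bytes, sa3: bytes):
    """Build (dst_field, src_field) for an encapsulated frame."""
    assert len(da3) == len(sa3) == 3 and len(ida3) <= 6
    dst_field = ida3.rjust(6, b"\x00")  # IDA fills the destination field
    src_field = da3 + sa3               # DA || SA in the source field
    return dst_field, src_field

def decap(dst_field: bytes, src_field: bytes):
    """Recover (DA, SA); DA moves back into the destination field."""
    da3, sa3 = src_field[:3], src_field[3:]
    return da3.rjust(6, b"\x00"), sa3

dst, src = encap(b"\x00\x11\x22", b"\xaa\xbb\xcc", b"\xdd\xee\xff")
new_dst, sa = decap(dst, src)
```

Because the intermediate only ever sees the outer (IDA) destination, commodity switches forward the frame with no knowledge of the inner addresses.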

Private IP Address Space Reuse (PASR)
- The private IP address space is partitioned into a service-node range and a non-service-node range
  - The service-node range is shared by all VDCs
  - The non-service-node range can be reused across VDCs
- Key used for a GARP query: VDC ID + IP address, so the same IP address may resolve to different MAC addresses depending on the VDC ID
- Inter-VDC isolation is enforced by checking that each outgoing packet's destination MAC matches its destination IP address
- Analogy: a VDC is like a process; an IP address is like a virtual address; a MAC address is like a physical address
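
A minimal sketch of the (VDC ID, IP) lookup key and the isolation check described above; the MAC values and function names are illustrative, not taken from the system.

```python
# Toy GARP directory keyed by (VDC ID, IP): the same private IP resolves
# to a different MAC in each virtual data center. Values are made up.

garp = {("vdc1", "10.0.0.5"): "02:00:00:00:01:0a",
        ("vdc2", "10.0.0.5"): "02:00:00:00:02:1b"}

def resolve(vdc_id: str, ip: str) -> str:
    """Resolve an IP within one VDC's name space."""
    return garp[(vdc_id, ip)]

def allowed(vdc_id: str, ip: str, claimed_mac: str) -> bool:
    """Inter-VDC isolation check: an outgoing packet's destination MAC
    must match what this VDC's GARP entry says for the destination IP."""
    return garp.get((vdc_id, ip)) == claimed_mac
```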

When a Network Link Fails
[Diagram: link-failure handling sequence.]

Fail-Over Practice
- Witness switches -> RAS: send multiple SNMP traps to multiple SNMP managers
- RAS -> DS: the RAS and DS are connected by dedicated links
- DS -> affected VMs: use backup routes if necessary
- HA mechanism for the RAS and DS
- Invalidate retired forwarding table entries
- Compute new alternative routes

When a VM X Moves
- X notifies the RAS, which in turn notifies the DS
- The DS modifies X's GARP entry to reflect the MAC addresses of its new primary and backup intermediaries
- The DS invalidates X's GARP entry cached on all other VMs that communicate with X
- The DS asynchronously invalidates all direct forwarding entries for X in the network

Centralized Load-Balancing Routing
- Ideal load-balancing routing algorithm:
  - Compute the M shortest routes for every node pair <u,v>
  - Distribute the traffic load of <u,v> among these M routes
  - Calculate the expected load of every physical network link
  - Weight of a link: EL(link) / RC(link)
- Approximate load-balancing routing algorithm (applied only to the top N heavy-traffic node pairs accounting for Z% of all traffic):
  1. Sort traffic matrix entries in decreasing order
  2. For each <u,v>, use a weighted shortest-path algorithm to find its primary and backup routes
  3. Update the weights of the links along <u,v>'s chosen path; go to step 1
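
The approximate algorithm above can be sketched as follows. This is a simplified model under stated assumptions: EL is read as expected load and RC as link capacity, only primary routes are computed (no backup), and the graph/demand shapes are illustrative.

```python
# Sketch of approximate load-balancing routing: process node pairs in
# decreasing traffic order; pick a weighted shortest path where a link's
# weight is expected_load / capacity; then add the pair's traffic to the
# links on its chosen path before routing the next pair.

import heapq

def shortest_path(adj, weight, src, dst):
    """Dijkstra over an adjacency dict using a per-link weight()."""
    dist, prev, seen = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v in adj[u]:
            nd = d + weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [dst], dst
    while node != src:                 # walk predecessors back to src
        node = prev[node]
        path.append(node)
    return path[::-1]

def route_all(adj, capacity, demands):
    """demands: {(u, v): traffic} -> {(u, v): primary path}."""
    load, routes = {}, {}
    for (u, v), traffic in sorted(demands.items(), key=lambda kv: -kv[1]):
        w = lambda a, b: load.get((a, b), 0.0) / capacity[(a, b)]
        path = shortest_path(adj, w, u, v)
        for a, b in zip(path, path[1:]):
            load[(a, b)] = load.get((a, b), 0.0) + traffic
        routes[(u, v)] = path
    return routes

adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cap = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
routes = route_all(adj, cap, {("A", "D"): 5.0})
```

Updating link weights after each pair is what steers later (lighter) pairs away from links the heavy pairs already claimed.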

Internet Edge Logic
- Cluster-based implementation
- Server load balancing
- Firewalling and IDS/IPS
- Network address translation
- Multi-homing load balancing (Cloud OS 2.0)
- Internet traffic shaping (Cloud OS 2.0)
- VPN for hybrid cloud (Cloud OS 2.0)
- WAN traffic caching and compression (Cloud OS 2.0)

Server Load Balancer in Cloud OS 1.0
- Goal: distribute incoming requests among the virtual machines of an Internet-facing virtual cluster
- Layer-4 request distribution, for services identified by TCP and UDP port numbers, based on network and CPU load
- Cluster implementation with HA, load balancing and auto-scheduling
- Sticky sessions: source IP based
- Direct server return
- Support for auto-scheduling of virtual clusters

Direct Return of Response Traffic
- Intra-VC, inter-VM load balancing
- Direct Server Return (DSR) reduces traffic through the load balancer by letting web servers send HTTP responses directly back to the requesting client:
  1. The client's packet is sent to the SLB
  2. The SLB forwards the packet to a real server
  3. The real server responds directly to the client ("direct server return")
[Diagram: clients on the Internet reach the SLB through the gateway router and firewall; the web server's response bypasses the SLB on the way back.]

Software Architecture
[Diagram: an SLB server node runs a user-space SLB daemon (VC/VM logic and metadata, interfacing with the Virtual Machine Manager and Repository Server) that sets up load-balancing scheduling rules and adds/removes VIPs in the kernel-space IPVS module via RTNETLINK; each compute node runs a Dom0 daemon over libvirt and the Xen hypervisor, with the MIM kernel agent tagging packets with their VDC ID and reporting performance data.]

SLB Fault Tolerance
- Active-active HA rather than active-passive HA
- SLB servers are arranged in a ring, each one monitored by its neighbor via heartbeats
- The master SLB synchronizes VC information to the other SLB servers
[Diagram: an SLB cluster ring (SLB a through SLB d) exchanging heartbeat/alive messages, with the master SLB checked by the Virtual Machine Manager.]

SLB Load Balancing
- The master SLB server periodically monitors each SLB server's load and balances load across SLB servers by re-distributing VC request-dispatching responsibilities among them
[Diagram: before and after re-balancing, VIP2 (Virtual Cluster 2) migrates from an overloaded SLB server to another.]

SLB Auto-Scaling
- Add an SLB node: when the average load of the SLB cluster is above a high watermark, the master SLB server asks the PRM to power on one more SLB server
- Remove an SLB node: when the average load of the SLB cluster is below a low watermark, the master SLB server asks the PRM to power off one SLB server to save power
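
The watermark rule above is simple enough to state directly. The watermark values and the function name are assumptions for illustration.

```python
# Illustrative SLB auto-scaling rule: compare the cluster's average
# utilization against high/low watermarks (values assumed) and decide
# whether to ask the resource manager (PRM) to power a node on or off.

HIGH_WATERMARK = 0.80
LOW_WATERMARK = 0.20

def autoscale_decision(loads, min_nodes=1):
    """loads: per-SLB-node utilization in [0, 1]. Returns an action."""
    avg = sum(loads) / len(loads)
    if avg > HIGH_WATERMARK:
        return "power_on"        # ask PRM for one more SLB server
    if avg < LOW_WATERMARK and len(loads) > min_nodes:
        return "power_off"       # shed one SLB server to save power
    return "steady"
```

Keeping a gap between the two watermarks avoids oscillation: a cluster hovering near one threshold does not repeatedly add and remove nodes.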

On-Going Work
- Performance isolation between storage access traffic and application traffic
- QoS-aware routing
- Scalability concerns:
  - Coverage of intermediate proxies
  - GARP query processing
  - GARP cache invalidations upon network failure or VM migration
- Layer-2 broadcast/multicast support
- Incremental routing

Conclusions
- Cloud data center network issues:
  - All-L2 data center backbone (e.g. TRILL and SPB)
  - Internet edge logic
  - Rack-area networking
  - Hybrid network support
- Existing solutions are fragmented or incomplete; there is plenty of room for innovation toward a fully integrated data center network solution
- Current status of Peregrine: 100-node testing is done; next it moves to a 500-node container computer

Thank You! Questions and Comments? tcc@itri.org.tw