The Network Hypervisor




IBM Research: The Network Hypervisor
David Hadas, Haifa Research Lab, Nov 2010, Davidh@il.ibm.com
© IBM 2010

Agenda
- New requirements from DCNs: virtualization, clouds
- Our approach: building Abstracted Networks
- Virtual Application Networks (VANs) [A,B]: an Abstracted Networks solution developed by HRL as part of Reservoir
- Abstracted Networks challenges

(A) D. Hadas, S. Guenender, and B. Rochwerger, "Virtual network services for federated cloud computing," Tech. Rep. H-0269, IBM Research, 2009.
(B) A. Landau, D. Hadas, and M. Ben-Yehuda, "Plugging the hypervisor abstraction leaks caused by virtual networking," in SYSTOR '10: Proceedings of the 3rd Annual Haifa Experimental Systems Conference, pp. 1-9, ACM, 2010.

Traditional View

Network View:
- DCs scale to ~100,000 End Stations [1,2]
- Routers connect L2 domains, each sub-divided into VLANs
- Typically a LAN holds up to 250 End Stations [1,2,4]
- Typically an L2 domain holds up to 4K End Stations [1,2,3]
- Scale: ~1,000 LANs, ~100 L2 domains

Server View:
- End Stations have a single IP and MAC and are dumb (they know only a default GW)
- Topology and routes are under the control of the network

[Figure: Internet, DC Router, IP Cloud, Layer-2 domains, The Compute]

[1] A. Greenberg, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, "Towards a next generation data center architecture: scalability and commoditization," ACM PRESTO Workshop, August 2008, Seattle, WA, USA.
[2] A. Greenberg et al., "The Cost of a Cloud: Research Problems in Data Center Networks," ACM SIGCOMM CCR, v39, n1, Jan 2009.
[3] C. Kim and J. Rexford, "Revisiting Ethernet: Plug-and-play made scalable and efficient," IEEE LANMAN Workshop, 2007.
[4] P. Garimella, Y.-W. E. Sung, N. Zhang, and S. Rao, "Characterizing VLAN usage in an operational network," ACM INM Workshop, 2007.

The Future of DCNs

Hard to see, the future is.
- The number of physical servers per DC is already over 100,000
- The number of VMs per server will grow to 64-192 (32-core machine)
- 1 or 2 virtual network interfaces per VM today; we believe the real number is 4, 8 or more, since the cost of a virtual interface is lower than that of a physical interface, leading to new use patterns
- Total: 10,000,000 to 100,000,000 virtual network interfaces in a large DC! (See the arithmetic sketch below.)
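The total is straightforward arithmetic; a minimal sketch using the slide's own estimates (the per-VM vNIC range is the slide's projection, not a measurement):

```python
# Back-of-the-envelope count of virtual network interfaces in a large DC,
# using the estimates from the slide above.
servers = 100_000                       # physical servers per large DC
vms_low, vms_high = 64, 192             # VMs per server (32-core machine)
vnics_low, vnics_high = 1, 8            # vNICs per VM: 1-2 today, 4-8 expected

low = servers * vms_low * vnics_low     # 6,400,000
high = servers * vms_high * vnics_high  # 153,600,000

print(f"{low:,} .. {high:,} virtual interfaces")  # ~10M to ~100M, as claimed
```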

Network Virtualization: The Naive Way

New Network View:
- The vSwitch is part of the physical network!
- 64-192 VMs per server (32-core machine), multiple virtual interfaces per VM
- Scale: ~100,000 LANs, ~10,000 L2 domains (roughly one LAN per server and one L2 domain per ten servers)

New Server View:
- VMs have a single IP and MAC and are dumb (they know only a default GW)
- Topology and routes are under the control of the network (now spanning into the servers)

[Figure: Internet, DC Router, IP Cloud, Layer-2 domains, The New Compute (The Naive Way)]

Wouldn't L2 Domains of the Future Be Larger?

- Vendors have for many years sought ways to grow L2 further, with very partial success; L2 domain scaling has been on the most-wanted list for years
- Ethernet-related research (see for example [1,2]) concludes that scaling Ethernet requires fundamental changes to the Ethernet protocol (e.g., eliminating broadcast services such as ARP)
- Standardization work has started in the IETF and IEEE to address Ethernet scaling (TRILL, Layer 2 Multipathing)
- Will L2 scale by a factor of 10, 100, 1,000, 10,000? Will L2 fit more complex environments with multiple sites?

[1] A. Myers, T. S. E. Ng, and H. Zhang, "Rethinking the Service Model: Scaling Ethernet to a Million Nodes," ACM HotNets, 2004.
[2] C. Kim, M. Caesar, and J. Rexford, "Floodless in SEATTLE: a scalable Ethernet architecture for large enterprises," ACM SIGCOMM 2008 Conference on Data Communication, August 17-22, 2008, Seattle, WA, USA.

Network Infrastructure for Clouds

Clouds need to scale: the number of VMs, the number of virtual networks, the number of sites; automated VM mobility anywhere.

Clouds need to be agile: fully automated, enabling auto-placement decisions; dynamic; self-managed; simple to use; efficient.

Clouds need to support establishing isolated virtual networks that are:
- Private to the hosted customer and managed by it (allowing customers to determine their own address space)
- Able to extend beyond the limits of a single data center (physical and administrative) without requiring coordinated management, while maintaining data center independence and security
- Established ad-hoc, without requiring pre-configuration and/or management control of the physical network

Mobility Anywhere

[Figure: automation vs. mobility quadrant. Today's solutions offer either automation with no mobility (server consolidation) or local mobility only; the goal is automated VM mobility anywhere, across Site A and Site B. How to achieve this?]

Mobility Anywhere: Challenges in Data Centers

"Data Center Virtualization Requires Remodeling of the Network", IDC, Oct 2009: "The DC network industry is witnessing a dramatic shift in DC architecture. It is clear that the current network topology will not meet the requirements of the new DC. Server, storage, and network virtualization will be the overriding premise of the DC buildout."

A solution that enables customers to freely move VMs across data centers is needed [1]:
1. A long-distance LAN technology. Current approach: extending local VLANs across subnets.
2. A revised client/server routing technology, connecting Internet clients to mobile servers. "(This) is not possible with existing technologies. (It) will require the redesign of the IP network between the data centers involving the Internet." [1]

[1] http://www.cisco.com/en/us/solutions/collateral/ns340/ns517/ns224/ns836/white_paper_c11-557822.pdf

Abstracted Networks: The Network Hypervisor

Network virtualization should follow (well-established) host virtualization principles.

Host virtualization enables virtual machines:
- To remain independent of physical location
- To remain independent of the host's physical characteristics, such as CPU, memory, I/O, etc.
- To form isolated compute environments on top of the shared physical host environment

Network virtualization should likewise enable virtual machines:
- To remain independent of physical location
- To remain independent of the network's physical characteristics, such as topology, protocols, addresses, etc.
- To form isolated network environments on top of the shared physical network environment serving the hosts

Such complete network virtualization can be achieved using network abstraction, hence the term Abstracted Networks.

Physical Network Abstraction

[Figure only]

A New Way to Look at Networking: Hypervisor Evolution

In the 1970s, IBM shipped the first hypervisors, offering a complete virtual replica of the hosting system: virtual machines are offered a fully protected and isolated copy of the underlying physical host hardware. The replica can be hardware dependent (no abstraction)!

Later, the hypervisor role evolved to support advanced administrative functions such as Checkpoint/Restart, Cloning, and Live Migration (Mobility). The hypervisor needs to ensure that the guest application and operating system are able to continue running unchanged even when the compute, network, and storage environments drastically change. This can be achieved by completely abstracting the environment offered to guests and ensuring that guests remain unaware of any characteristic of the hosting platform.

[Figure: a VM above an abstraction layer on a HOST machine with Memory, CPU, I/O, and Storage]

A New Way to Look at Networking: Current Abstraction Layers Are Leaking

Most current hypervisors offer incomplete network abstraction. Common approaches include:
- A VM directly accesses a physical network adapter: the hypervisor exposes direct references to unique platform hardware resources.
- An emulated NIC or para-virtual backend is connected to a local host software bridge/router: the hypervisor exposes direct references to network addresses with spatial meaning (which may also be time-bounded).
- Address translation is used: received packets still include references to physical addresses, causing a leak through the hypervisor abstraction layer.

Partial solutions: restrict mobility to the network segment, or synchronize mobility with a timely reorganization of the network. Current hypervisors tend to treat this incomplete abstraction as a network problem rather than as a problem of the hypervisor.

[Figure: physical information leaking through the abstraction layer into the VM environment, above a HOST with Memory, CPU, I/O, and Storage]

Plugging the Hypervisor Abstraction Layer Leaks

Abstraction: VMs should not talk to the hardware (the physical network). VMs use separate addresses, and the hypervisor should provide the mapping between them, as sketched below.

Isolation: VMs belonging to one virtual network should be isolated from VMs not connected to that network. This yields security by isolation, isolated performance, and independent address/naming schemes.

[Figure: abstraction and isolation boundaries between VMs, switches/routers, site physical networks, and physical devices]
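A minimal sketch of the mapping role described above, assuming a simple per-virtual-network table from virtual MAC to the physical host currently running the VM (the class and method names are illustrative, not the VAN implementation):

```python
# Hypothetical hypervisor-side mapping: guests see only virtual addresses;
# the hypervisor resolves (virtual network, vMAC) to a physical host IP.
from typing import Dict, Optional, Tuple

class AddressMap:
    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str], str] = {}  # (van, vmac) -> host IP

    def learn(self, van: str, vmac: str, host_ip: str) -> None:
        """Record (e.g., via auto-discovery) where a vMAC currently lives."""
        self._table[(van, vmac)] = host_ip

    def resolve(self, van: str, vmac: str) -> Optional[str]:
        """Lookups are keyed by virtual network, so they never cross it."""
        return self._table.get((van, vmac))

amap = AddressMap()
amap.learn("green", "02:00:00:00:00:01", "10.0.0.5")
assert amap.resolve("green", "02:00:00:00:00:01") == "10.0.0.5"
assert amap.resolve("red", "02:00:00:00:00:01") is None  # isolation holds
```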

Network Virtualization with Abstracted Networks

Physical Network View:
- End Stations (servers) have a single IP and MAC in the physical network domain and are dumb (they know only a default GW)
- The physical network design is unchanged; same scale as today

Abstracted Networks View:
- The VMs' hypervisors form an abstracted network between them
- The abstracted network topology and routes are under the control of the hosts

[Figure: Internet, DC Router, IP Cloud, Layer-2 domains; pseudo-wires between hypervisors form the abstracted network above The Compute]

Virtual Application Networks

A VAN is a virtual and distributed switching service connecting VMs. VANs revolutionize the way networks are organized by using an overlay network between the hypervisors of different hosting platforms. The hypervisors' overlay network decouples the virtual network services offered to VMs from the underlying physical network. As a result, the network created between VMs becomes independent of the topology of the physical network.

Prior Work:
- VIOLIN: X. Jiang and D. Xu (2004); P. Ruth, X. Jiang, and D. Xu (2005)
- VNET: A. I. Sundararaj and P. A. Dinda (2004); A. I. Sundararaj, A. Gupta, and P. A. Dinda (2004)

Example of Abstracted Networks: The HRL Virtual Application Networks (VAN) Project

Reservoir (mid 2008-2010), an EU project for federated clouds:
- Proof of concept developed on System x
- Demonstrated to the EU at EOY-1 and EOY-2; the planned EOY-3 demonstration will include federated migration

What are VANs?
- A design following the Abstracted Networks principles
- An L2-like service for VMs using IP-based pseudo-wires (see the toy sketch below)
- A set of control protocols for setting up and maintaining pseudo-wires: auto discovery, dynamic routing
- A set of methodologies for using Abstracted Networks

[Figure: three enhanced VAN network overlays spanning hypervisors (HYP), with a standard IP physical network serving as the underlay]
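To make the pseudo-wire idea concrete, here is a toy sketch that tunnels a guest Ethernet frame inside a UDP packet between hypervisors. The port number and the 4-byte virtual-network header are invented for illustration; VAN's actual framing and control protocols are not shown.

```python
# Toy L2-over-IP pseudo-wire: carry a guest Ethernet frame as the payload
# of a UDP datagram between hypervisors (illustrative framing only).
import socket
import struct

PW_PORT = 47000                 # arbitrary port chosen for this example
VAN_HDR = struct.Struct("!I")   # hypothetical 4-byte virtual-network ID

def send_frame(sock, frame: bytes, van_id: int, peer: str) -> None:
    """Encapsulate one guest frame and tunnel it to the peer hypervisor."""
    sock.sendto(VAN_HDR.pack(van_id) + frame, (peer, PW_PORT))

def recv_frame(sock):
    """Decapsulate: strip the VAN ID and return (van_id, inner frame)."""
    data, _addr = sock.recvfrom(65535)
    (van_id,) = VAN_HDR.unpack_from(data)
    return van_id, data[VAN_HDR.size:]

# Loopback demo standing in for two hypervisors:
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", PW_PORT))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

guest_frame = bytes(64)          # stand-in for a real Ethernet frame
send_frame(tx, guest_frame, 7, "127.0.0.1")
van, frame = recv_frame(rx)
assert (van, frame) == (7, guest_frame)
```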

Abstracted Networks Introduce Multiple Research Challenges

I/O performance, routing, traffic shaping, QoS, multi-pathing, scalability, management (FCAPS), resiliency, security aspects of VM placement, and federation of clouds.

[Figure: application data encapsulated in IP between hosts in a trusted domain; a route table for the Green virtual network maps interface identifiers (vMACs) to hosts, maintained by auto-discovery protocols]

Reservoir Federated Clouds

Cloud providers cannot be expected to coordinate their network maintenance, network topologies, etc. with each other. Research into federated clouds suggests meeting these requirements by separating clouds using VAN proxies, which act as gateways between clouds. A VAN proxy hides the internal structure of its cloud from other clouds in a federation. The VAN proxies of different clouds communicate to ensure that VANs can extend across cloud boundaries while adhering to the privacy and security limitations of their members.

Packet Flow Today (in KVM)

[Figure: two guest paths side by side. In each, a guest application writes to a socket interface and through the guest kernel network stack; one guest uses an emulated adapter driver feeding a QEMU emulated adapter, the other a virtio frontend feeding a virtio backend. Both hand frames to a host kernel TAP interface, then to host network services (e.g., a bridge or VAN central services), and finally to the physical interface.]
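On the host side, the TAP interface in this figure is where a bridge or VAN service picks up raw guest frames. A minimal Linux sketch of attaching to one (requires root; the device name tap0 is an assumption, e.g., a device created when the VM started):

```python
# Attach to an existing Linux TAP device and read one guest Ethernet frame.
# Constants are from <linux/if_tun.h>; requires root and an existing tap0.
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA   # ioctl to bind this fd to a named tun/tap device
IFF_TAP = 0x0002         # TAP: layer-2 frames (TUN would be layer-3 packets)
IFF_NO_PI = 0x1000       # do not prepend the packet-information header

tap = os.open("/dev/net/tun", os.O_RDWR)
fcntl.ioctl(tap, TUNSETIFF, struct.pack("16sH", b"tap0", IFF_TAP | IFF_NO_PI))

frame = os.read(tap, 65535)   # blocks until the guest emits a frame
# A bridge or VAN service would now look up the destination vMAC and
# forward the frame, e.g., over a pseudo-wire to the right hypervisor.
```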

Pseudo-Wires Between VMs Result in a Dual Stack

[Figure: the guest runs its own network stack (application, socket interface, guest kernel stack, virtio frontend or emulated adapter driver); the host runs a second stack in which the virtio backend (glue) feeds traffic encapsulation, then a host socket interface, the host kernel stack, and the physical net driver. Every guest packet therefore traverses two full network stacks.]

Performance

The path from guest to wire is long. Latencies manifest in the form of:
- Packet copies
- VM exits and entries
- User/kernel mode switches
- QEMU process scheduling

Performance aspects (detailed on the following slides):
- Increase packet size
- Inhibit checksums
- CPU affinity
- Flow control

Increase Packet Size

Large packets: the transport and network layers are capable of up to 64KB packets. Ethernet's limit is 1500 bytes, but there is no Ethernet wire between guest and host! So set the MTU to 64KB in the guest.

Flow:
- The application writes 64KB to a TCP socket
- TCP and IP check the MTU (= 64KB) and create one TCP segment and one IP packet
- The virtual NIC driver copies the entire 64KB frame to the host
- The host writes the 64KB frame into a UDP socket; the host stack creates one 64KB UDP packet
- If the packet's destination is a VM on the local host: transfer the 64KB packet directly over the loopback interface
- If the packet's destination is another host: the NIC segments the 64KB packet in hardware (TSO)

A sketch of the effect follows.
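A small sketch of why the big virtual MTU helps on the host side: one large write becomes a single datagram instead of dozens of MTU-sized ones. Note that IPv4/UDP caps a datagram payload at 65,507 bytes, so a "64KB frame" must in practice stay just under 64KiB once encapsulation headers are counted; 60,000 bytes is used here as a safe example size.

```python
# One large guest frame delivered as a single UDP datagram over loopback,
# instead of the ~40 separate packets a 1500-byte MTU would force.
import socket

payload = bytes(60_000)   # example size; IPv4/UDP tops out at 65,507 bytes

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))             # OS picks a free port

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(payload, rx.getsockname())  # local destination: pure loopback copy

data, _ = rx.recvfrom(65_535)
assert len(data) == len(payload)      # arrived as one datagram
```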

Inhibit Checksums

Guest-to-host packet transfers are inner buffer copies, so inhibit TCP/UDP checksum calculation and verification on that path. Host-to-host packets are protected by the lower (encapsulating) stack's checksum, so inhibit the inner TCP/UDP checksum calculation and verification there as well.
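A hedged sketch of one way to skip a redundant checksum on Linux: the SO_NO_CHECK socket option (value 11, not exposed by Python's socket module) disables UDP checksum generation on IPv4 transmit. This is defensible here only because the tunnelled guest traffic carries its own end-to-end checksums.

```python
# Disable UDP checksum generation on the encapsulating socket (Linux, IPv4).
# Safe only when the inner, tunnelled traffic has its own checksums.
import socket

SO_NO_CHECK = 11   # from <asm-generic/socket.h>; Linux-specific

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.SOL_SOCKET, SO_NO_CHECK, 1)
```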

CPU Affinity and Pinning

A QEMU process contains two threads: a CPU thread (actually, one CPU thread per guest vCPU) and an IO thread. The Linux process scheduler selects which core(s) to run the threads on, and it often makes wrong decisions: scheduling both on the same core, or constantly rescheduling them (core 0 -> 1 -> 0 -> 1 -> ...). Solution/workaround: pin the CPU thread to core 0 and the IO thread to core 1, as sketched below.
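A minimal sketch of the workaround using the Linux affinity API (os.sched_setaffinity, available since Python 3.3). In practice QEMU threads are pinned from outside, e.g., with taskset or libvirt's vcpupin; the thread IDs below are placeholders.

```python
# Pin the QEMU vCPU thread and IO thread to separate cores so the scheduler
# stops co-locating them or bouncing them between cores.
import os

VCPU_TID = 1234   # hypothetical thread ID of the QEMU CPU (vCPU) thread
IO_TID = 1235     # hypothetical thread ID of the QEMU IO thread

os.sched_setaffinity(VCPU_TID, {0})   # vCPU work stays on core 0
os.sched_setaffinity(IO_TID, {1})     # IO work stays on core 1
```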

Flow Control

A VM does not anticipate flow control at Layer 2, so the host should not impose flow control on it; otherwise, bad effects similar to TCP-in-TCP encapsulation occur. Lacking flow control, the host should instead have large enough socket buffers. Example: if the pseudo-wire uses TCP, buffers should be at least the guest TCP flow's bandwidth x delay product; a sizing sketch follows.
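A sketch of the sizing rule, assuming the pseudo-wire runs over TCP and using example bandwidth and delay figures (10 Gbit/s and 1 ms are illustrative numbers, not from the slides):

```python
# Size the host-side socket buffers to at least the guest flow's
# bandwidth-delay product, since the tunnel adds no flow control of its own.
import socket

bandwidth_bps = 10 * 10**9            # assumed guest throughput: 10 Gbit/s
rtt_s = 0.001                         # assumed round-trip time: 1 ms
bdp = int(bandwidth_bps / 8 * rtt_s)  # 1,250,000 bytes

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp)
# Linux doubles the requested value and caps it at net.core.{w,r}mem_max;
# those sysctls may need raising for buffers this large.
```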

Performance Results

[Figure: throughput and receiver CPU utilization measurements]