
VXLAN Performance Evaluation on VMware vSphere 5.1
Performance Study
TECHNICAL WHITE PAPER

Table of Contents

Introduction
VXLAN Performance Considerations
Test Configuration
Results
  Throughput for Large and Small Message Sizes
  CPU Overhead for Large and Small Message Sizes
  Throughput and CPU Utilization for 16 Virtual Machines, Various Message Sizes
Conclusion

Introduction

Cloud computing services have experienced rapid growth over the past few years because they can keep costs down by allowing multiple tenants to share system resources. One requirement of making this multi-tenancy possible is to provide each tenant with network isolation. Segmenting the traffic using VLANs is a typical solution to this problem. However, service providers also need to keep up with customer demand by being able to move workloads to those servers that have spare resources. This is difficult when the networking architecture ties workloads to the underlying hardware, which restricts where those workloads can be placed and how freely they can be moved. In addition, segmenting the physical LAN using VLANs does not scale beyond a certain limit.

Virtual eXtensible LAN (VXLAN) is a network encapsulation mechanism that enables virtual machines to be deployed on any physical host, regardless of the host's network configuration. It solves the problems of mobility and scalability in two ways:

- It uses MAC-in-UDP encapsulation, which allows the virtual machine to communicate over an overlay network that spans multiple physical networks.
- It decouples the virtual machine from the underlying network, allowing the virtual machine to move across the network without reconfiguring the network.

VXLAN uses a 24-bit identifier, which means that a single network can support up to 16 million LAN segments. This number is much higher than the 4,094 limit imposed by the IEEE 802.1Q VLAN specification.

Since VXLAN is an additional encapsulation mechanism introduced at the hypervisor layer, there are certain performance implications. This paper demonstrates that the performance of VXLAN on vSphere 5.1 is very close to that of a configuration without VXLAN, and that vSphere 5.1 with VXLAN configured can meet the demands of today's network-intensive applications. We used industry-standard benchmarks to conduct experiments that demonstrate:

- A virtual machine configured with VXLAN achieved networking performance similar to a virtual machine without VXLAN configured, both in terms of throughput and CPU cost.
- vSphere 5.1 scales well as more virtual machines are added on the VXLAN network.

VXLAN Performance Considerations

VXLAN introduces an additional layer of packet processing at the hypervisor level. For each packet on the VXLAN network, the hypervisor needs to add protocol headers on the sender side (encapsulation) and remove them on the receiver side (decapsulation). This adds CPU work for each packet. Apart from this CPU overhead, some of the offload capabilities of the NIC cannot be used because the inner packet is no longer accessible. Physical NIC hardware offload capabilities (for example, checksum offload and TCP segmentation offload (TSO)) were designed for standard, non-encapsulated packet headers, and some of them cannot be applied to encapsulated packets. In such cases, a VXLAN packet requires CPU resources to perform a task that would otherwise have been done more efficiently by the physical NIC hardware. Some NIC offload capabilities can be used with VXLAN, but they depend on the physical NIC and the driver being used. As a result, performance may vary based on the hardware used when VXLAN is configured.
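To make the encapsulation step concrete, the sketch below builds the 8-byte VXLAN header defined in RFC 7348 around an inner Ethernet frame. It is an illustration of the packet format only, not the hypervisor's datapath; the outer Ethernet, IP, and UDP headers that the VXLAN tunnel endpoint also prepends are omitted, and the function name and frame contents are hypothetical.

    import struct

    VXLAN_FLAGS = 0x08  # "I" bit set: the 24-bit VNI field is valid (RFC 7348)

    def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
        """Prepend the 8-byte VXLAN header to an inner Ethernet frame."""
        if not 0 <= vni < 2 ** 24:
            raise ValueError("the VNI is a 24-bit identifier")
        # Header layout: flags (1 byte) | reserved (3) | VNI (3) | reserved (1)
        header = struct.pack("!B3xI", VXLAN_FLAGS, vni << 8)
        return header + inner_frame

    # A 24-bit VNI allows 2**24 (about 16 million) segments, versus the
    # 4,094 usable IDs of an 802.1Q VLAN tag.
    print(len(vxlan_encapsulate(b"\x00" * 60, vni=5001)))  # 68 bytes

Every such header, together with the outer Ethernet, IP, and UDP headers, is added on transmit and stripped on receive by the hypervisor, which is the source of the CPU overhead examined in the following sections.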

Test Configuration

The test configuration (shown in Table 1 and Figure 1) consisted of two identical HP ProLiant DL380 G7 machines running vSphere 5.1. Both systems ran the same number of virtual machines for the tests. The physical NIC we chose assists with Tx CKO/TSO of encapsulated packets.

Table 1. Test configuration for vSphere host and client machine (server and client are identical)

  Make: HP ProLiant DL380 G7
  CPU:  2x Intel Xeon X5690 @ 3.47GHz, 6 cores per socket
  RAM:  64GB
  NICs: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection

Figure 1. Test-bed setup

We used netperf to generate TCP traffic between the vSphere host virtual machines and the client virtual machines. Netperf is a lightweight user-level benchmark that is widely used for networking measurements. It consists of two binaries:

- netperf: a user-level process that connects to the server and generates traffic
- netserver: a user-level process that listens for and accepts connection requests

We installed each virtual machine with 64-bit Red Hat Enterprise Linux 6 and configured each with one virtual CPU and 2GB of RAM. Each virtual machine was configured to use vmxnet3 (driver version 1.0.29.0-NAPI) as the virtual NIC. We configured the same number of sender and receiver virtual machines for each test, with each sender virtual machine transmitting to its corresponding receiver virtual machine. Since a single netperf session was not enough to achieve line rate at 10Gbps, we ran five sessions per virtual machine. To maintain consistency, we used the same number of sessions per virtual machine as we added more virtual machines to the test.
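As an illustration only (not a script from the paper), the sketch below launches five concurrent netperf TCP_STREAM sessions toward a receiver VM running netserver, using netperf's standard options: -l for the test length, -s/-S for the local and remote socket buffer sizes, and -m for the message size. The receiver address and the 60-second duration are placeholder values of our own.

    import subprocess

    NETSERVER_IP = "192.168.1.10"   # hypothetical receiver VM running netserver
    SESSIONS = 5                     # five netperf sessions per VM, as in the tests

    # Large-message configuration: 64K socket buffers, 16K messages
    cmd = ["netperf", "-H", NETSERVER_IP, "-t", "TCP_STREAM", "-l", "60",
           "--", "-s", "65536", "-S", "65536", "-m", "16384"]

    sessions = [subprocess.Popen(cmd) for _ in range(SESSIONS)]
    for session in sessions:
        session.wait()

For the small-message tests described below, the message size drops to 128 bytes, the socket buffers to 8K, and netperf's -D test-specific option sets TCP_NODELAY on the connection.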

We measured the performance impact of VXLAN for both large and small messages sent from 1 to 16 virtual machines. For each configuration, we ran the tests with and without VXLAN configured. We then measured the CPU utilization of the vSphere 5.1 host and the combined throughput for all netperf sessions. To measure small packet size performance, we used the TCP_NODELAY socket option. TCP_NODELAY causes the sender to transmit packets immediately and prevents the TCP/IP stack of the guest operating system from aggregating small messages into large packets. As a result, packet sizes stay very close to the message size.

Since VXLAN adds data to standard MTU frames, we increased the MTU size of the physical NICs to 1600. We used Intel NICs because they can perform checksum offload for encapsulated packets. We also enabled RSS for the Intel physical NIC so that VXLAN traffic could be distributed among multiple hardware queues. RSS is only available for Intel 10Gb adapters on vSphere 5.1 and can be enabled by unloading and reloading the module with vmkload_mod ixgbe RSS=4 for each 10Gb Intel adapter on the server. For the default configuration we used the standard MTU and the default ixgbe driver (no RSS configured).

Results

The test results demonstrate the following:

- Network throughput of a virtual machine configured with VXLAN is very close to that of a default (no VXLAN) configuration. For small packet sizes, throughput scales linearly as more virtual machines are added.
- The CPU overhead of VXLAN depends on the packet size. CPU overhead refers to the extra CPU utilized by VXLAN when compared to the system without VXLAN configured. For the large message test case, there is almost no CPU overhead. For the small message test cases, the overhead is about 5% of the overall system CPU utilization.

Throughput for Large and Small Message Sizes

We used a 64K socket size and a 16K message size to evaluate large packet size performance. For the small message size, we used an 8K socket size and a 128-byte message size. We then compared the performance with the default configuration of no VXLAN.

Figure 2. Multi-VM large message size throughput (send throughput in Gbps for SocketSize=64K, MsgSize=16K; 1 to 16 VMs, Default vs. VXLAN)

Figure 2 shows that VXLAN can reach line rate for all virtual machine counts with the large socket size and message size configuration. The networking throughput reported for VXLAN is slightly lower than the default because of the additional protocol overhead.
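A rough calculation of our own (not from the paper) shows why the VXLAN curve sits slightly below the default at line rate: VXLAN adds an outer Ethernet, IPv4, UDP, and VXLAN header to every frame on the wire.

    \[
    14_{\mathrm{Eth}} + 20_{\mathrm{IPv4}} + 8_{\mathrm{UDP}} + 8_{\mathrm{VXLAN}} = 50 \ \text{bytes per frame},
    \qquad
    \frac{50}{1514 + 50} \approx 3.2\%
    \]

Here 1514 bytes is a full 1500-byte MTU frame plus its own Ethernet header, ignoring preamble, FCS, and inter-frame gap, so roughly 3% is an upper bound on the goodput that encapsulation can cost for large transfers. This simple bound is consistent with the slightly lower VXLAN throughput seen in Figure 2.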

Figure 3. Multi-VM small message size throughput (send throughput in Gbps for SocketSize=8K, MsgSize=128b; 1 to 16 VMs, Default vs. VXLAN)

For the small message size (Figure 3), we observed that throughput with VXLAN was better than or comparable to the default for all test case scenarios. Since VXLAN was configured to use RSS, each virtual machine configured with VXLAN could use all of the hardware queues present. The default configuration, on the other hand, allows only one queue per virtual machine. As a result, VXLAN throughput was slightly higher.

CPU Overhead for Large and Small Message Sizes

For the same tests that measured throughput, we also collected CPU utilization on the vSphere 5.1 host. We then measured the additional CPU resources utilized when VXLAN was configured and reported this as the CPU overhead of using VXLAN. Since the throughput of the VXLAN and default test cases differed, we compared the percentage of additional CPU used on a single core for every 10Gbps of traffic and reported the difference as the CPU overhead of using VXLAN.

Figure 4. Multi-VM large message CPU overhead on a single core (additional % core utilization for SocketSize=64K, MsgSize=16K; 1 to 16 VMs, Send and Recv)

For the large message transmit case (Figure 4), we saw that the CPU overhead for VXLAN was negligible for almost all test cases. The maximum overhead we observed was close to 2% of a core for 10Gbps of traffic, and, as we added more virtual machines, the system overhead of using VXLAN remained constant. For a 12-core system, the overhead is less than 0.2% of the overall system utilization.
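In symbols of our own choosing (the paper gives only the prose description above), the reported overhead normalizes single-core cost by throughput before taking the difference, and the whole-system figure divides the per-core result by the 12 cores in the host:

    \[
    \mathrm{overhead} = \left( \frac{U_{\mathrm{VXLAN}}}{T_{\mathrm{VXLAN}}} - \frac{U_{\mathrm{default}}}{T_{\mathrm{default}}} \right) \times 10\,\mathrm{Gbps},
    \qquad
    \frac{2\%\ \text{of one core}}{12\ \text{cores}} \approx 0.17\%
    \]

where U is the single-core utilization and T is the measured throughput in Gbps. The worked number on the right corresponds to the worst large-message transmit case and matches the "less than 0.2% of the overall system utilization" statement above.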

The CPU overhead for receiving large messages is higher for VXLAN because the checksum offload capabilities of the NIC cannot be used. The overhead is highest for the 8-VM test case (close to 10% of a single core), and most of the overhead is due to checksum computation being done in software.

Figure 5. Multi-VM small message size CPU overhead on a single core (additional % core utilization for SocketSize=8K, MsgSize=128b; 1 to 16 VMs, Send and Recv)

For the small message size (Figure 5), we saw that the CPU overhead for VXLAN was about 60% of a single core for both the send and receive test cases. For a 12-core system, the overhead is approximately 5% of the overall system utilization. The overhead also remained constant as we added more virtual machines. The overhead is higher for the small message case because, for every 10Gbps of traffic, vSphere 5.1 has to process higher packet rates and each packet requires encapsulation/decapsulation to be done in software.

Throughput and CPU Utilization for 16 Virtual Machines, Various Message Sizes

To measure the impact of variable packet size with VXLAN, we configured 16 virtual machines to use different socket and message sizes with netperf. We then measured the throughput and CPU utilization of the vSphere 5.1 host and compared them against the default configuration.

Figure 6. 16-VM various message size throughput (throughput in Gbps for socket size/message size combinations of 8K/128b, 8K/512b, 8K/1024b, 8K/1400b, and 64K/16K; Default vs. VXLAN)

For the various message size throughput scenarios with 16 virtual machines (Figure 6), we noticed that throughput was comparable for all test cases. We were able to achieve line rate with a 1024-byte message size for both VXLAN and the default configuration.

Figure 7. 16-VM CPU overhead on a single core with VXLAN (additional % core utilization, Send and Recv, for the netperf socket size/message size combinations shown in Figure 6)

For the 16-VM test cases with varying message sizes (Figure 7), we observed that the CPU overhead was highest for the 128-byte message cases (at about 60% of a single core). It decreased substantially as we increased the message size and was less than 10% of a single core for the large message cases. The overall system overhead of using VXLAN on a 12-core system decreased from 5% to less than 1% as we increased the message size from 128 bytes to 16K.

Conclusion

This paper evaluated VXLAN performance to demonstrate that vSphere 5.1 is readily able to host network-intensive applications on VXLAN with good performance and scalability. We compared our performance to standard virtual switch configurations and observed that throughput with VXLAN configured is comparable to throughput without VXLAN. For small message sizes, throughput scaled linearly as we added more virtual machines. For large message sizes, we were able to achieve line rate for all configurations. There is some CPU overhead due to the use of VXLAN. This overhead varies based on the configuration and is due to the lack of hardware offloads for encapsulated packets. As NIC vendors add support for these offloads, this overhead may be reduced significantly.

About the Author

Rishi Mehta is a Senior Member of Technical Staff in the Performance Engineering group at VMware. His work focuses on improving the network performance of VMware's virtualization products. He has a Master's degree in Computer Science from North Carolina State University.

Acknowledgements

The author would like to thank Jianjun Shen, Julie Brodeur, Lenin Singaravelu, Shilpi Agarwal, Sreekanth Setty, and Suds Jain (in alphabetical order) for their reviews of and contributions to the paper.

VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright 2013 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies. Item: EN-99-1 Date: 3-Jan-13 Comments on this document: docfeedback@vmware.com