Linux NIC and iSCSI Performance over 40GbE




Chelsio T580-CR vs. Intel Fortville XL710

Executive Summary

This paper presents NIC and iSCSI performance results comparing Chelsio's T580-CR and Intel's latest XL710 Fortville server adapter, both running at 40Gbps. The results demonstrate that Chelsio's NIC delivers consistently superior results: lower CPU utilization, higher throughput and drastically lower latency, with outstanding small I/O performance. The T580 notably delivers line-rate 40Gbps at the I/O sizes that are more representative of actual application use. The iSCSI tests show the same performance advantages, making the T580 the best performing unified 40Gbps Ethernet adapter on the market.

Overview

The Terminator 5 (T5) ASIC from Chelsio Communications, Inc. is a fifth-generation, high-performance 2x40Gbps/4x10Gbps server adapter engine with Unified Wire capability, enabling offloaded storage, compute and networking traffic to run simultaneously. T5 provides extensive support for stateless offload operation for both IPv4 and IPv6 (IP, TCP and UDP checksums, Large Send Offload, Large Receive Offload, Receive Side Steering/Load Balancing, and flexible line-rate filtering). Furthermore, T5 is a fully virtualized NIC engine with separate configuration and traffic management for 128 virtual interfaces, and it includes an on-board switch that offloads the hypervisor v-switch. Thanks to integrated, standards-based FCoE/iSCSI and RDMA offload, T5-based adapters are high-performance drop-in replacements for Fibre Channel storage adapters and InfiniBand RDMA adapters. Unlike other converged Ethernet adapters, Chelsio's T5-based NICs also excel at regular server adapter duties, providing a high packet processing rate, high throughput and low latency for common network applications.

This paper pits the T580-CR against the latest Intel 40Gbps server adapter. The following sections start by comparing the two in server adapter (NIC) benchmarks, followed by iSCSI storage performance, where the full offload iSCSI support of the Chelsio adapter is compared to the Intel adapter.
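The stateless offloads listed in the Overview are exposed through the standard Linux ethtool interface. The following is a minimal, generic sketch of how those features can be inspected and toggled on any Linux NIC; it is not taken from the paper's procedure, and the interface name ethX is a placeholder.

# Show which stateless offloads the driver currently enables
[root@host]# ethtool -k ethX

# Enable checksum, TCP segmentation (LSO/TSO) and generic receive offload,
# where the driver supports them
[root@host]# ethtool -K ethX rx on tx on tso on gro on

# Inspect the receive-side scaling (RSS) indirection table
[root@host]# ethtool -x ethX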

NIC Test Results

The following graphs compare the dual port unidirectional and bidirectional throughput numbers and CPU usage per Gbps, obtained by varying the I/O size using the iperf tool.

Figure 1 - Unidirectional Throughput and %CPU/Gbps vs. I/O size (Chelsio vs. Intel)

The above results reveal that Chelsio's adapter achieves significantly higher numbers throughout, reaching line-rate unidirectional throughput at an I/O size as small as 512B. The numbers also show noticeable CPU savings, indicative of a more efficient processing path.

Figure 2 - Bidirectional Throughput and %CPU/Gbps vs. I/O size (Chelsio vs. Intel)

The results show a smooth and stable performance curve for Chelsio, whereas Intel's numbers vary widely across the data points, perhaps indicative of performance corners.
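The paper does not state how the %CPU/Gbps metric was sampled. A common way to reproduce this kind of figure is to record system-wide CPU utilization with mpstat (from the sysstat package) while iperf is running and divide the average by the measured throughput; the sketch below assumes a 30-second sampling window, and the numbers in the final comment are arbitrary, included only to illustrate the arithmetic.

# Average system-wide CPU utilization (100 - %idle) over 30 one-second samples,
# taken while the iperf traffic is running
[root@host]# mpstat 1 30 | awk '/Average/ {printf "avg CPU = %.1f%%\n", 100 - $NF}'

# If a run averaged, say, 12% CPU while sustaining 38 Gbps, then
# %CPU per Gbps = 12 / 38, about 0.32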

The following graph compares the single port latency of the two adapters.

Figure 3 - Latency (us) vs. I/O size (Chelsio vs. Intel)

The results clearly show Chelsio's advantage in latency, with a superior profile that remains flat across the range of study, whereas Intel's latency is significantly higher, placing it outside of the range of usability for low latency applications.
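The latency figures above come from netperf's TCP_RR request/response test (see the Commands Used section). TCP_RR reports a transaction rate rather than a latency; with the default of a single outstanding transaction, average round-trip latency is simply the inverse of that rate. The example below is arithmetic only, with an arbitrary rate chosen for illustration.

# With one outstanding transaction, avg RTT (us) = 1,000,000 / transactions per second.
# Example with an arbitrary rate of 50,000 transactions/s:
[root@host]# awk 'BEGIN { rate = 50000; printf "avg RTT = %.1f us\n", 1e6 / rate }'
avg RTT = 20.0 us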

iSCSI Test Results

The following graphs compare the single port iSCSI READ and WRITE throughput and IOPS numbers for the two adapters, obtained by varying the I/O size using the iometer tool.

Figure 4 - READ Throughput and %CPU vs. I/O size (Chelsio vs. Intel)

The graph above shows Chelsio's T580-CR performance to be superior in both CPU utilization and throughput, reaching line rate at ¼ the I/O size needed by Intel's Fortville XL710.

Figure 5 - WRITE Throughput and %CPU vs. I/O size (Chelsio vs. Intel)

The results above clearly show that Chelsio's adapter provides higher efficiency, freeing up significant CPU cycles for actual application use.

Figure 6 - READ and WRITE IOPS (millions) vs. I/O size (Chelsio vs. Intel)

The IOPS numbers reflect the performance advantages of Chelsio's adapter, particularly at the challenging small I/O sizes that are more representative of actual application requirements.
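As a rough guide to reading the IOPS axis, IOPS is simply throughput divided by I/O size, so the theoretical ceiling at 40Gbps line rate can be computed directly. The sketch below ignores Ethernet, TCP and iSCSI header and digest overhead, so measured results always sit somewhat below these figures; it is included only to put the "millions of IOPS" scale in context.

# Theoretical IOPS ceiling at 40 Gbps: 40e9 bits/s / 8 = 5e9 bytes/s, divided by the I/O size
[root@host]# awk 'BEGIN { for (s = 512; s <= 524288; s *= 2)
                              printf "%7d B : %10.0f IOPS max\n", s, 40e9 / 8 / s }'

The small I/O points are thus the most demanding, since the adapter must sustain millions of PDUs per second to stay near line rate.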

Test Configuration

The following sections provide the test setup and configuration details.

NIC Topology

Figure 7 - Simple Back-to-Back Test Topology: a Server and a Client, both running RHEL 6.5 (3.6.11 kernel), connected back-to-back over two 40G links

Network Configuration

The NIC setup consists of two machines connected back-to-back using two ports: a Server and a Client, each configured with an Intel Xeon E5-2687W v2 8-core processor running at 3.4GHz (HT enabled) and 128 GB of RAM. The RHEL 6.5 (3.6.11 kernel) operating system was installed on both machines. A standard MTU of 1500B was used. The Chelsio setup used a T580-CR adapter in each system with Chelsio network driver v2.11.., whereas the Intel setup used an XL710 adapter in each system with Intel network driver v1..1.
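The paper does not list the interface bring-up steps. The sketch below shows how such a back-to-back link is typically configured and verified on RHEL; cxgb4 and i40e are the standard upstream Linux drivers for the T580-CR and XL710 respectively, while the interface name and address are placeholders.

# Load the NIC driver (cxgb4 for the Chelsio T580-CR, i40e for the Intel XL710)
[root@host]# modprobe cxgb4

# Assign an address and bring the link up with the standard 1500B MTU
[root@host]# ip addr add 192.168.1.1/24 dev ethX
[root@host]# ip link set ethX mtu 1500 up

# Confirm that the link negotiated 40Gb/s
[root@host]# ethtool ethX | grep -E "Speed|Link detected"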

iSCSI Topology

Figure 8 - iSCSI Target Connected to 4 Initiators Using a 40Gb Switch: iSCSI initiators with T580-CR adapters running Windows Server 2012 R2, connected through a 40Gb switch to an iSCSI target running RHEL 6.5 (3.6.11)

Storage Topology and Configuration

The iSCSI setup consisted of a target storage array connected to 4 iSCSI initiator machines through a 40Gb switch, using a single port on each system. A standard MTU of 1500B was used. The storage array was configured with two Intel Xeon E5-2687W v2 8-core processors running at 3.4GHz (HT enabled) and 64 GB of RAM. Chelsio's iSCSI target driver v.2.-8 was installed with the RHEL 6.5 (3.6.11 kernel) operating system. The Chelsio setup was configured in ULP mode with CRC enabled using the T580-CR adapter. The Intel setup was configured in AUTO mode with CRC enabled using the XL710 adapter. The initiator machines were each set up with an Intel Xeon E5-2687W v2 8-core processor running at 3.4GHz (HT enabled) and 128 GB of RAM. A T580-CR adapter was installed in each system with the Windows MS iSCSI initiator, Unified Wire Software v...33 and the Windows Server 2012 R2 operating system. The storage array contains 32 iSCSI ramdisk null-rw targets; each of the 4 initiators connects to 8 targets.

I/O Benchmarking Configuration

iometer was used to assess the storage I/O performance of the configuration. The I/O sizes used varied from 512B to 512KB with an I/O access pattern of random READs and WRITEs.

iperf was used to measure network throughput. This test used I/O sizes varying from 64B to 256KB.

netperf was used to measure network latency. This test used I/O sizes varying from 1B to 1KB.

Parameters passed to Iometer dynamo.exe:

dynamo.exe -l <remote_iometer_ip> -m <localmachine_ip>   // add it for all initiators

3 outstanding I/Os per target, 16 worker threads.

Commands Used

For all the tests, adaptive-rx was enabled on all Chelsio interfaces using the following command:

[root@host]# ethtool -C ethX adaptive-rx on

Additionally, the following system-wide settings were made:

[root@host]# sysctl -w net.ipv4.tcp_timestamps=0
[root@host]# sysctl -w net.ipv4.tcp_sack=0
[root@host]# sysctl -w net.ipv4.tcp_low_latency=1
[root@host]# sysctl -w net.core.netdev_max_backlog=250000
[root@host]# sysctl -w net.core.rmem_max=16777216

[root@host]# sysctl -w net.core.wmem_max=16777216
[root@host]# sysctl -w net.core.rmem_default=16777216
[root@host]# sysctl -w net.core.wmem_default=16777216
[root@host]# sysctl -w net.core.optmem_max=16777216
[root@host]# sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'
[root@host]# sysctl -w net.ipv4.tcp_wmem='4096 16777216'

Throughput test:

On the Server:
[root@host]# iperf -s -p <port> -w 512k

On the Client:
[root@host]# iperf -c <Server IP> -p <port> -l <IO Size> -t 30 -P 8 -w 512k

Latency test:

On the Server:
[root@host]# netserver

On the Client:
[root@host]# netperf -H <Server IP> -t TCP_RR -l 30 -- -r <IO Size>,<IO Size>

Conclusions

This paper compared performance results of Chelsio's T580-CR and Intel's XL710 server adapter in Linux. The benchmark results demonstrate that Chelsio's T5-based adapter delivers:

- Significantly superior throughput, with line-rate 40G performance for both unidirectional and bidirectional traffic, whereas the Intel adapter fails to saturate the wire.
- Drastically lower latency than the Intel adapter, making it the ideal choice for low latency applications.
- Consistently better CPU utilization than the Intel adapter, freeing up the CPU for user applications.
- Higher performance and higher efficiency networking using the iSCSI protocol, thanks to iSCSI offload in hardware.

The results thus show that T5-based adapters provide a unique combination of a complete suite of networking and storage protocols with high performance and high efficiency operation, making them great all-around unified wire adapters.

Related Links

The Chelsio Terminator 5 ASIC
iSCSI at 40Gbps
40Gb TOE vs NIC Performance
Linux 10GbE NIC/iSCSI Performance

Copyright 2014. Chelsio Communications Inc. All rights reserved.