Building Enterprise-Class Storage Using 40GbE




Unified Storage Hardware Solution Using T5

Executive Summary

This white paper presents benchmarking results that highlight the Chelsio T5 performance advantage across the full range of storage networking protocols, including iSCSI, iSCSI over RDMA (iSER), Windows Server 2012 SMB 3.0, Lustre over RDMA, NFS over RDMA and FCoE. The results show T5 to be an ideal fit for high-performance storage networking solutions across this range of protocols. Furthermore, because it uses routable and reliable TCP/IP as a foundation, T5 allows highly scalable and cost-effective installations using regular Ethernet switches.

Internet SCSI (iSCSI) testing demonstrates that Chelsio T5 adapters provide consistently superior results, with lower CPU utilization, higher throughput and drastically lower latency, including outstanding small-I/O performance, compared to Intel's XL710 Fortville server adapter running at 40Gbps.

iSCSI over RDMA testing demonstrates that iSER over T5 iWARP (RDMA/TCP) achieves consistently superior throughput, IOPS and CPU utilization in comparison to InfiniBand-FDR. Unlike IB, iWARP provides RDMA transport over standard Ethernet gear, with no special configuration needed and no added management costs.

Similarly, a performance comparison of Chelsio's T580-CR RDMA-enabled adapter and Intel's XL710 40GbE server adapter with the SMB 3.0 file storage protocol in Windows Server 2012 R2 shows the T580 exhibiting a better performance profile in both NIC and RDMA operation, with the results clearly demonstrating that when RDMA is in use, significant performance and efficiency gains are obtained for SMB.

A comparison of Lustre over RDMA and NFS over RDMA using T5 40GbE iWARP and InfiniBand-FDR shows nearly identical performance. Unlike IB, iWARP provides a high-performance RDMA transport over standard Ethernet gear, with no special configuration needed and no added management costs. Thanks to its hardware TCP/IP foundation, it provides low latency and all the benefits of RDMA, with the routability to scale to large clusters and long distances. As a result, iWARP can be considered a no-compromise, drop-in Ethernet replacement for IB.

FCoE performance results show a T5-enabled FCoE target reaching nearly 4M IOPS and full 40Gb line-rate throughput at I/O sizes as low as 2KB. As a result, T5 can be considered the industry's first high-performance full FCoE offload solution.

Introduction

At a time of exponential data growth, data center storage systems are being pushed to the limits of their performance and capacity. With the bulk of enterprise data stored in centralized file- and block-based storage arrays, networking performance between servers and storage systems has become critical to overall data center performance and efficiency. Growing acceptance of hybrid and all-flash arrays has put even more pressure on this connection, as systems with built-in solid-state drives (SSDs) can pump enough data to saturate traditional network connections many times over. All of this has combined to make the network between storage systems and servers the main bottleneck in modern data centers.

However, 40Gbps Ethernet (40GbE) hardware-accelerated storage networking protocols, such as iSCSI offload, iWARP (RDMA/TCP) and the TCP/IP Offload Engine (TOE), can alleviate this bottleneck while retaining the cost-of-ownership advantages of pervasive Ethernet networking. 40GbE storage protocol offload delivers drastically lower data transfer latency and significantly higher CPU and overall storage system efficiency. In addition, these 40GbE storage offload protocols provide the best network bandwidth, IOPS and latency available today, allowing hybrid and all-flash arrays to deliver their greatest performance impact.

Terminator 5 ASIC

The Terminator 5 (T5) ASIC from Chelsio Communications, Inc. is a fifth-generation, high-performance 2x40Gbps unified wire engine that offers storage protocol offload capability for accelerating both block-level (iSCSI, FCoE) and file-level (SMB, NFS) storage traffic. T5 storage protocol support is part of a complete, fully virtualized unified wire offload suite that includes FCoE, RDMA over Ethernet, TCP and UDP sockets and user-space I/O.

By leveraging Chelsio's proven TCP Offload Engine (TOE), 40GbE offloaded iSCSI over T5 enjoys a distinct performance advantage over a regular L2 NIC. Unlike FC and FCoE, 40GbE iSCSI runs over regular Ethernet infrastructure, without the need for specialized FC fabrics or expensive DCB-enabled switches and Fibre Channel Forwarder switches. Because it uses IP, it is also routable over the WAN and scalable beyond a local area environment. Finally, the TCP transport allows reliable operation over any link type, including naturally lossy media such as wireless.

T5 also includes enhanced data integrity protection for all protocols, and particularly for storage traffic, including full end-to-end T10-DIX support for both 40GbE iSCSI and FCoE, as well as internal data path CRC and ECC-protected memory. Thanks to integrated, standards-based FCoE/iSCSI and RDMA offload, T5 40GbE-based adapters are high-performance drop-in replacements for Fibre Channel storage adapters and InfiniBand RDMA adapters. In addition, unlike other converged Ethernet adapters, Chelsio T5-based 40GbE NICs also excel at normal server adapter functionality, providing a high packet processing rate, high throughput and low latency for common network applications.

40GbE iSCSI [1]

This section summarizes iSCSI performance results for the Terminator 5 (T5) ASIC running over a 40Gbps Ethernet interface. Using a T5-enabled iSCSI target, the benchmarking results show 40Gb line-rate performance using standard Ethernet frames with I/O sizes as small as 2KB, and more than 3M IOPS for 512B and 1KB I/O sizes (see the short sanity-check sketch below). These results clearly show iSCSI to be an ideal fit for very high-performance storage networking solutions. Furthermore, because it uses routable and reliable TCP/IP as a foundation, iSCSI allows highly scalable and cost-effective installations using regular Ethernet switches.

The following graph plots the performance results, showing how line-rate 40Gbps is achieved even at I/O sizes as small as 2KB for unidirectional transfer and 4KB for bidirectional transfer.

Figure 1 iSCSI Throughput and IOPS

Chelsio T580-CR vs. Intel Fortville XL710 [2]

This section presents iSCSI performance results comparing Chelsio's T580-CR and Intel's latest XL710 Fortville server adapter running at 40Gbps. The iSCSI testing demonstrates that the Chelsio T580-CR provides consistently superior results, with lower CPU utilization, higher throughput and drastically lower latency, along with outstanding small-I/O performance. Notably, the T580 delivers line-rate 40Gbps at the I/O sizes that are more representative of actual application use.

[1] These results are excerpted from the Chelsio white paper "iSCSI at 40Gbps", which details the benchmark testing results and the test hardware/software configuration.
[2] This section is excerpted from the Chelsio white paper "Linux NIC and iSCSI Performance over 40GbE", which details the benchmark testing, including throughput, CPU utilization, latency and IOPS performance, and the test hardware/software configuration.
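As a rough sanity check on the 40GbE iSCSI figures quoted above (line rate at 2KB, more than 3M IOPS at 512B), the relationship between I/O size, IOPS and wire rate can be worked out directly. The short Python sketch below is illustrative only: it counts payload bytes and ignores Ethernet, IP, TCP and iSCSI header overhead, so the achievable payload rate at 40Gbps line rate is somewhat lower in practice.

```python
# Illustrative sanity check (not from the white paper): relate I/O size, IOPS
# and wire rate. Payload bytes only; Ethernet/IP/TCP/iSCSI header overhead is
# ignored, so real payload throughput at line rate is somewhat lower.

LINE_RATE_GBPS = 40

def iops_to_fill_link(io_size_bytes, line_rate_gbps=LINE_RATE_GBPS):
    """IOPS needed to carry line_rate_gbps of payload at io_size_bytes per I/O."""
    bytes_per_second = line_rate_gbps * 1e9 / 8
    return bytes_per_second / io_size_bytes

def payload_gbps(iops, io_size_bytes):
    """Payload throughput in Gbps delivered by a given IOPS at io_size_bytes."""
    return iops * io_size_bytes * 8 / 1e9

for size in (512, 1024, 2048, 4096):
    print(f"{size:5d} B I/O: {iops_to_fill_link(size) / 1e6:5.2f}M IOPS to fill 40Gbps")

# ~2.44M IOPS suffice at 2KB, while 3M IOPS at 512B is only ~12.3 Gbps of payload,
# which is why small I/O sizes are IOPS-limited rather than bandwidth-limited.
print(f"3M IOPS at 512B ~= {payload_gbps(3e6, 512):.1f} Gbps of payload")
```

In other words, at 512B and 1KB the per-I/O processing cost dominates, so the 3M+ IOPS figure, rather than wire bandwidth, is the relevant limit at those sizes.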

The following graphs compare the single-port iSCSI READ and WRITE throughput and CPU utilization numbers for the two adapters, obtained by varying the I/O size with the iometer tool. The first graph shows Chelsio's T580-CR performance to be superior in both CPU utilization and throughput, reaching line rate at one quarter of the I/O size needed by Intel's Fortville XL710. The second graph shows that Chelsio's adapter provides higher efficiency, freeing up significant CPU cycles for actual application use.

Figure 2 READ Throughput and %CPU vs. I/O size

Figure 3 WRITE Throughput and %CPU vs. I/O size

Linux iSER Performance: 40GbE iWARP vs. InfiniBand FDR [3]

The iSCSI Extensions for RDMA (iSER) protocol is a translation layer for operating iSCSI over RDMA transports, such as iWARP/Ethernet or InfiniBand. This section summarizes iSER performance results comparing iWARP RDMA over 40Gb Ethernet and FDR InfiniBand (IB). The results demonstrate that iSER over Chelsio's iWARP RDMA adapters achieves consistently superior throughput, IOPS and CPU utilization when compared to IB. Unlike IB, iWARP provides the RDMA transport over standard Ethernet gear, with no special configuration needed and no added management costs. Thanks to its hardware-offloaded TCP/IP foundation, iWARP provides the high-performance, low-latency and efficiency benefits of RDMA, with the routability to scale to large data centers, clouds and long distances.

The following graphs compare the unidirectional iSER READ and WRITE throughput and CPU usage numbers of the Chelsio iWARP RDMA and Mellanox IB-FDR adapters, varying the I/O size with the fio tool. The results reveal that iSER over iWARP RDMA achieves significantly higher numbers throughout, with outstanding small-I/O performance. In addition, the READ results expose a bottleneck that appears to prevent the IB side from saturating the PCI bus, even at large I/O sizes.

Figure 4 READ Throughput and %CPU/Gbps vs. I/O size (2 ports)

Figure 5 WRITE Throughput and %CPU/Gbps vs. I/O size (2 ports)

[3] This section is excerpted from the Chelsio white paper "Linux iSER Performance", which details the comparative iSER benchmarking, including the throughput, IOPS and CPU utilization dimensions.
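The %CPU/Gbps figure used in these graphs normalizes host CPU load by delivered bandwidth, so transports that reach different throughputs can still be compared on efficiency. A minimal sketch of that normalization follows; the sample values are hypothetical placeholders, not the measured results.

```python
# Normalize host CPU utilization by delivered throughput (%CPU per Gbps),
# so adapters reaching different bandwidths can be compared on efficiency.
# The sample numbers below are hypothetical placeholders, not measured results.

def cpu_per_gbps(cpu_percent: float, throughput_gbps: float) -> float:
    """Host CPU cost for each Gbps of delivered I/O; lower means more efficient."""
    return cpu_percent / throughput_gbps

samples = {
    "adapter_A (hypothetical)": (18.0, 39.0),  # (%CPU, Gbps)
    "adapter_B (hypothetical)": (30.0, 34.0),
}

for name, (cpu, gbps) in samples.items():
    print(f"{name}: {cpu_per_gbps(cpu, gbps):.2f} %CPU/Gbps")
```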

Windows SMB 3.0 Performance at 40Gbps [4]

A notable feature of Microsoft Windows Server 2012 R2 is the release of SMB 3.0 with SMB Direct (SMB over RDMA), which seamlessly leverages RDMA-enabled network adapters for improved storage bandwidth, latency and efficiency. One of the main advantages of the SMB 3.0 implementation is that once the RDMA-capable network adapter driver is installed, all of its features are automatically enabled and made available to the SMB application.

This section provides benchmark data that demonstrates the benefits of SMB Direct through a performance comparison of Chelsio's T580-CR RDMA-enabled adapter and Intel's XL710 40GbE server adapter. While the T580-CR exhibits a better performance profile in both NIC and RDMA operation, the results clearly show that when RDMA is in use, significant performance and efficiency gains are obtained.

The following graphs compare the throughput and IOPS benchmark results for the two adapters at different I/O sizes. The results reveal that Chelsio's adapter provides significantly higher and more consistent performance, reaching line-rate unidirectional throughput at an 8KB I/O size. The difference is particularly clear once RDMA kicks in as the I/O size exceeds 4KB, with up to 2x the performance of the Intel NIC in bidirectional transfers. The Intel adapter also exhibits large variability in the same test environment, which is symptomatic of performance corner cases.

[4] This section is excerpted from the Chelsio white paper "Windows SMB 3.0 Performance at 40Gbps".

Figure 6 RDMA and NIC Throughput Comparison for SMB 3.0

Figure 7 RDMA and NIC IOPS Comparison for SMB 3.0

The test results therefore establish that SMB benefits significantly from an RDMA transport, with increased performance and efficiency compared to regular NICs. These benefits are enabled automatically and transparently when using Windows Server 2012 R2, which minimizes the user's management and configuration burden.

In addition to the performance and efficiency gains, SMB over iWARP benefits from greatly improved data integrity protection, thanks to iWARP's end-to-end payload CRC (in lieu of the simple checksums used by a standard NIC). Furthermore, being designed especially for storage networking, T5 incorporates additional reliability features, including internal datapath CRC and ECC-protected memory. For these reasons, T5 adapters have been selected for Microsoft's Cloud Platform System, a scalable, turnkey private cloud solution.
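To illustrate the data-integrity point, the sketch below shows one class of corruption that a simple 16-bit ones'-complement checksum (the kind used by TCP/IP) cannot detect but a CRC does: swapping two 16-bit words leaves the checksum unchanged. zlib's CRC-32 is used here as a convenient stand-in for the CRC32C that iWARP carries end to end; this illustrates the general principle only, not Chelsio's implementation.

```python
# Swapping two 16-bit words leaves a ones'-complement checksum unchanged,
# but changes the CRC. zlib's CRC-32 stands in for iWARP's CRC32C here.
import struct
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement sum, as used for TCP/IP checksums."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = b"AABBCCDD" * 4
corrupted = payload[2:4] + payload[0:2] + payload[4:]  # two 16-bit words swapped

print(internet_checksum(payload) == internet_checksum(corrupted))  # True: undetected
print(zlib.crc32(payload) == zlib.crc32(corrupted))                # False: detected
```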

Lustre over iWARP RDMA at 40Gbps [5]

Lustre is a scalable, secure and highly available cluster file system that addresses extreme I/O needs, providing low latency and high throughput in large computing clusters. Like other storage protocols, Lustre can use and benefit from RDMA. This section compares the performance of Lustre over RDMA running on 40Gbps Ethernet and on FDR InfiniBand, showing nearly identical performance. Unlike IB, iWARP provides a high-performance RDMA transport over standard Ethernet gear, with no special configuration needed and no additional management costs. Thanks to its hardware TCP/IP foundation, it provides low latency and all the benefits of RDMA, with the routability to scale to large clusters and long distances.

The following graphs compare Lustre READ and WRITE throughput over iWARP RDMA and IB-FDR at different I/O sizes, using the fio tool. The READ throughput numbers show 40Gbps iWARP delivering nearly identical performance to IB-FDR (56Gbps) over the range of interest.

Figure 8 READ Throughput vs. I/O size

The WRITE results confirm the parity between the two transports, with nearly the same performance despite the theoretical bandwidth advantage of IB (56G vs. 40G for one port).

Figure 9 WRITE Throughput vs. I/O size

[5] This section is excerpted from the Chelsio white paper "Lustre over iWARP RDMA at 40Gbps".

These results show that Lustre over iWARP RDMA at 40Gbps provides highly competitive performance compared to IB-FDR, along with a superior performance curve in relation to I/O size. This result is one more in a series of studies of the two fabrics that consistently support the same conclusion: iWARP is a no-compromise, drop-in Ethernet replacement for IB.

NFS over RDMA at 40Gbps [6]

NFS over RDMA is an exciting development for the trusted, time-proven NFS protocol, bringing the promise of the high performance and efficiency of an RDMA transport. The unprecedented price and adoption curve of 40Gbps Ethernet, along with the focus on high efficiency and high performance in an era of big data and massive data centers, is driving interest in performance-optimized transports such as RDMA. RDMA is also particularly interesting when mated to high-throughput, low-latency SSDs, where it allows extracting the most performance out of these new devices. This section presents early performance results for NFS over 40Gbps iWARP RDMA, using Chelsio's Terminator 5 (T5) ASIC, and over InfiniBand-FDR.

The following graphs compare NFS READ and NFS WRITE throughput and CPU idle percentage for iWARP RDMA and FDR InfiniBand at different I/O sizes, using the iozone tool. The results show that iWARP RDMA at 40Gbps and IB-FDR (56Gbps) provide virtually the same NFS performance in throughput and CPU utilization, although the latter is theoretically capable of a higher wire rate. In addition, thanks to its TCP/IP foundation, iWARP RDMA allows using standard Ethernet equipment, with no special configuration and without requiring a fabric overhaul or additional acquisition and management costs. This result, consistent with all previous studies of the two fabrics, reinforces the conclusion that iWARP RDMA is a no-compromise, drop-in Ethernet replacement for IB.

Figure 10 READ Throughput & CPU Idle % vs. I/O Size

Figure 11 WRITE Throughput & CPU Idle % vs. I/O Size

[6] This section is excerpted from the Chelsio white paper "NFS/RDMA over 40Gbps Ethernet".

FCoE at 40Gbps with FC-BB-6 [7]

This section reports FCoE performance results for Chelsio's Terminator 5 (T5) ASIC running at 40Gbps. T5 is the industry's first high-performance full FCoE offload solution with FC-BB-6 VN-to-VN and SAN management software support. Results for a single FCoE target running over a Chelsio T580-CR Unified Wire network adapter show I/O rates reaching nearly 4M per second and line-rate throughput starting at I/O sizes as low as 2KB.

The following graph plots the performance results, showing how line-rate 40Gbps is achieved at I/O sizes as small as 2KB, with peak IOPS nearing 4M/sec for READ and 2.5M/sec for WRITE.

Figure 12 Throughput and IOPS vs. I/O size

[7] This section is excerpted from the Chelsio white paper "FCoE at 40Gbps with FC-BB-6".

Conclusion

Benchmarking results show Chelsio T5 to be an ideal fit for high-performance storage networking solutions across the full range of storage protocols. Internet SCSI (iSCSI) and iSCSI over RDMA (iSER) testing demonstrates that Chelsio T5 adapters provide consistently superior results, with lower CPU utilization, higher throughput and drastically lower latency. Furthermore, because it uses routable and reliable TCP/IP as a foundation, T5 allows highly scalable and cost-effective installations using regular Ethernet switches. In addition, thanks to its hardware TCP/IP foundation, T5 provides low latency and all the benefits of RDMA for high-performance deployments such as Windows Server 2012 SMB Direct, NFS over RDMA and Lustre over RDMA, with the routability to scale to large clusters and long distances. As a result, iWARP can be considered a no-compromise, drop-in Ethernet replacement for IB.

Related Links

The Chelsio Terminator 5 ASIC
T5 Offloaded iSCSI with T10-DIX
iSER: Frequently Asked Questions
SMBDirect Latency on Windows Server 2012 R2
iSCSI Heritage and Future

Copyright 2015. Chelsio Communications Inc. All Rights Reserved.