1-Gigabit TCP Offload Engine




White Paper
1-Gigabit TCP Offload Engine
Achieving greater data center efficiencies by providing green-conscious and cost-effective reductions in power consumption.
June 2009

Background

Broadcom is a recognized technology leader in networking solutions that have consistently benefited end customers across enterprise networking and storage environments. Broadcom is a participating member of the Green Grid industry consortium, an organization committed to developing and promoting energy efficiency for data centers and information service delivery.

Introduction

The Converged Network Interface Controller (C-NIC) concept was introduced to the enterprise market in 2006, when Broadcom demonstrated the convergence of network, storage, and management traffic on a single piece of hardware. BCM5708-based LAN on Motherboard (LOM) solutions were introduced shortly afterward and showed immediate benefits from one of the C-NIC components: the TCP/IP Offload Engine (TOE), which reduced CPU utilization and increased throughput in Microsoft Windows server environments. The associated software suite has remained unchanged for end users since its inception and has continued to deliver the benefits of TOE at both 1 Gb and 10 Gb speeds without interruption to deployed systems. The 1 Gb TOE has been further refined through hardware and software optimizations on the next-generation gigabit controller, the BCM5709, including an IPv6 TOE.

TOE Overview

TOE Architecture

TCP Chimney Offload is a networking technology that transfers workload from the CPU to a network adapter during network data transfers. TCP Chimney Offload creates a direct connection (the "chimney") between the top of the TCP/IP protocol stack and the network adapter's drivers, enabling the protocol stack to be offloaded so that TCP/IP processing is performed on the controller. By relieving the host CPU of much of the overhead of TCP/IP processing, the TOE model improves data-transfer performance over IP networks. TOE allows the operating system to offload all TCP/IP traffic to specialized hardware on the network adapter while leaving TCP/IP control decisions with the host server.

By relieving the host processor bottleneck, TOE can help deliver the performance benefits that administrators expect from applications running across high-speed network links. TOE is also cost-effective, as it processes the TCP/IP protocol stack on a high-speed network device that requires less processing power than a general-purpose, high-performance CPU (see Figure 1).

[Figure 1: TOE Architecture. The diagram shows the application above the switch at the top of the chimney, the state update and data transfer interfaces, the intermediate protocol layers, and the TOE NIC as the offload target.]

Broadcom TOE technology has been covered in detail in various white papers and OEM publications [1][2].

1. http://www.dell.com/downloads/global/power/ps3q06-20060132-broadcom.pdf
2. http://download.microsoft.com/download/3/5/0/3508460c-7bc2-4f7f-b5b9-4aefb981689b/c-nic-wp102-r.pdf
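On Windows Server platforms that support TCP Chimney Offload, the offload state can be inspected and toggled from an elevated command prompt. This is standard Windows administration, shown here for orientation rather than as part of the paper's test procedure:

```shell
:: Show the current global TCP settings, including the Chimney Offload state
:: (Windows Server 2008 and later)
netsh int tcp show global

:: Enable TCP Chimney Offload so that capable adapters (such as a TOE NIC)
:: can take over full TCP connection processing from the host CPU
netsh int tcp set global chimney=enabled

:: Disable it again to return all TCP processing to the host stack
netsh int tcp set global chimney=disabled
```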

System Benefits of TOE

This white paper discusses the system benefit of lower power consumption delivered by using TOE in support of green data center initiatives. These efficiencies translate into substantial data center savings for enterprise models such as the consolidation of workloads in virtualized environments. Recognizing the escalating need to add more LOM controllers as input/output (I/O) activity on servers increases, we analyzed the relative power consumption benefits of TOE and compared them to Layer 2 teaming solutions. This analysis produced encouraging results, which are explained in this paper.

To achieve these objectives, Broadcom performance labs used the test setup shown in Figure 2, which included the NetIQ Chariot benchmark. Chariot evaluates the performance of networked applications, performs stress tests on network devices, and predicts networked application performance before deployment.

[Figure 2: Test Setup]
Clients: 16 Chariot clients sending traffic to and receiving traffic from the system under test, using standard single-port 1-gigabit NICs connected through a standard gigabit switch.
- 14 standard OEM servers with a 3.4 GHz Intel Xeon CPU, 3.5 GB memory, Microsoft Windows 2003 Enterprise Edition SP1
- Two standard OEM servers with 1 GB memory, Microsoft Windows 2003 Enterprise Edition
System under test (SUT): BCM5709C-based NICs providing four ports, with TOE enabled.
- Standard OEM server with two 2.93 GHz quad-core Intel Nehalem CPUs with Hyper-Threading disabled (that is, one thread per core), 7.95 GB memory, Microsoft Windows 2003 Enterprise Edition x64 SP2

The objective of this test setup was to measure the power consumption, CPU utilization, and throughput of systems with TOE enabled. An identical configuration with TOE disabled, running in Layer 2-only mode, provided a complete comparative baseline. The SUT contained two BCM5709-based dual-port gigabit NICs with TOE enabled. The 16 Windows clients were set up on 16 systems using single-port gigabit controllers. Because a majority of current and planned data center server configurations have moved aggressively to four gigabit ports per server, the SUT is based on four GbE ports. A Chariot client on each system sent and received network traffic to and from the SUT. The setup was designed to show the maximum benefit an end user can expect at 100 percent network utilization. Test setup details, including the Chariot scripts, can be shared with interested parties so that they can recreate the test environment and duplicate Broadcom performance lab results.

Power Benefits

Reduced power consumption results from the TOE performing TCP/IP network traffic processing on the BCM5709 rather than passing this task to the host CPU, which consumes more power for the same work. Systems with TOE-enabled networking therefore show lower overall power consumption. Figure 3 compares throughput and power consumption with TOE enabled and disabled on the NICs in the system under test, in both teaming and non-teaming modes. In general, TOE for 1-gigabit solutions saves about 3 to 8 W of overall system power, while lowering CPU utilization and maintaining throughput for larger I/O sizes.

[Figure 3: Chariot Four-Port GbE Throughput vs. Platform Power in Non-Teaming Mode. Receive throughput (as a percentage of the 4 Gbps line rate) and platform power (roughly 230 W to 270 W) are plotted against I/O sizes from 1K to 64K bytes, for the BCM5709 in L2 mode and in TOE mode.]

Assuming 100 percent network utilization for each server over a one-year period, and using the then-current power usage effectiveness (PUE) value of 1.7 [1], results in a savings of about $10.00 per system. We calculated the savings by multiplying an 8 W single-system power reduction by 8.8 cents/kWh, the average commercial electricity rate reported in section 2.2.4 of the ENERGY STAR report [2].

1. http://svlg.net/campaigns/datacenter/docs/dcefr_executive_summary.pdf (page 8)
2. http://www.energystar.gov/ia/partners/prod_development/downloads/epa_datacenter_report_congress_final1.pdf
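The arithmetic behind the per-system figure can be reproduced directly. The sketch below uses the inputs stated in the paper (8 W saved, PUE of 1.7, 8.8 cents/kWh at 100 percent utilization); the function name is illustrative, not from the paper:

```python
# Annual cost savings from an 8 W reduction in server power draw.
# Constants are the values stated in the paper; names are illustrative.

HOURS_PER_YEAR = 24 * 365  # 8760 hours

def annual_savings_per_server(watts_saved=8.0, pue=1.7,
                              rate_per_kwh=0.088):
    """Dollars saved per server per year at 100% utilization.

    PUE scales the IT-equipment savings up to facility-level savings,
    since each watt saved at the server also avoids cooling and other
    facility overhead power.
    """
    kwh_saved = watts_saved * HOURS_PER_YEAR / 1000.0  # ~70.1 kWh/year
    return kwh_saved * pue * rate_per_kwh

per_server = annual_savings_per_server()
print(f"per server:   ${per_server:.2f}")         # ~$10.48, i.e. "about $10"
print(f"2500 servers: ${per_server * 2500:,.0f}") # ~$26,000 per year
```

The 2,500-server extrapolation matches the farm-level savings quoted later in the paper.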

The savings may, in fact, have increased over the last three years, as commercial electricity rates have trended upward. When these findings are extrapolated to server farms of various capacities, as shown in Figure 4, the resulting savings grow substantially for a cost-focused, green data center IT budget. For example, a 2,500-server deployment could generate an annual cost savings of approximately $26,000.

[Figure 4: Cost Savings with Lower System Power Consumption. Annual cost savings, from $0 to roughly $70,000, plotted against the number of servers deployed in the data center, from 1,000 to 6,000.]

CPU Utilization Benefits

Figure 5 shows the throughput and CPU utilization benefits with TOE enabled and disabled on the NICs in the system under test, in non-teaming mode.

[Figure 5: Chariot Four-Port GbE Throughput vs. CPU Utilization in Non-Teaming Mode. Receive throughput (as a percentage of the 4 Gbps line rate) and CPU utilization (0 to 50 percent) are plotted against I/O sizes from 1K to 64K bytes, for the BCM5709 in L2 mode and in TOE mode.]

The ratio of throughput to CPU utilization is expressed using the performance efficiency (PE) index, originally developed by PC Week (now eWeek) in 1995. The PE index remains one of the most commonly used ratios for evaluating adapters. A high PE index, indicating high throughput with low CPU utilization, suggests favorable overall system performance. Lower CPU utilization and increased PE ratios were demonstrated using the same setup shown in Figure 2. CPU utilization for larger I/Os is lower by an average of 9 percent in this configuration.

In some scenarios, the availability of this reclaimed CPU computing power can produce substantial infrastructure cost savings for a green data center, as shown in Figure 6. The savings shown in Figure 6 were calculated using the values in Table 1, which quantifies server, switch, and data center resource requirements along with their ratios and associated costs. These factors are general estimates that vary with location, solution configuration, and requirements. Broadcom performance labs varied these parameters to assess their impact on the end results; the customer cost savings remained high and justify a stronger focus on using TOE in existing and future Windows server deployments.

Table 1: CPU Utilization Savings

  Description                              Metric       Application
  -------------------------------------    ---------    ----------------------------
  Percentage of CPU utilization savings    9.00         For large I/O sizes (2K-64K)
  Cores/server                             8
  Average server cost                      $5,000
  Server lifetime (years)                  3
  Switch:server ratio                      2.5
  Switch cost                              $1,500
  Servers per rack                         12           2U servers
  Rack floor space (sq. ft.)               10
  Commercial sq. ft. cost                  $1,000.00
  Servers/IT administrator                 250
  IT administrator annual salary           $60,000
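The PE index itself is simply throughput divided by host CPU utilization. The numbers below are hypothetical illustrations, not measurements from this paper:

```python
def pe_index(throughput_mbps: float, cpu_util_percent: float) -> float:
    """Performance efficiency index: Mbps of throughput per percent of CPU used."""
    if cpu_util_percent <= 0:
        raise ValueError("CPU utilization must be positive")
    return throughput_mbps / cpu_util_percent

# Hypothetical example: two adapters both sustaining the 4000 Mbps line rate.
# The adapter that does so with less host CPU earns the higher PE index.
l2_pe  = pe_index(4000, 40)   # L2-only NIC at 40% CPU -> PE = 100
toe_pe = pe_index(4000, 30)   # TOE NIC at 30% CPU     -> PE ~ 133
print(l2_pe, toe_pe)
```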

[Figure 6: Total Savings for an Average 9 Percent CPU Utilization Reduction. Total savings, from $0 to $2,000,000, plotted against the number of servers deployed (1,000 to 6,000), broken down into square footage savings, manpower savings, cost savings per switch, and annual cost savings for server purchases.]

Environments in which workloads can be divided into distinct modules and distributed over several systems can benefit substantially. Workload consolidation from one system to another in a virtualized environment also benefits, as the freed CPU resources can be assigned to other processes and activities with greater efficiency.
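The paper does not spell out the formula behind Figure 6, but one plausible reading of Table 1 is that a 9 percent CPU utilization saving lets a deployment avoid roughly 9 percent of its servers, with each avoided server avoiding a share of server purchase, switch, floor space, and administration costs. The sketch below is a reconstruction under those assumptions, not Broadcom's published model:

```python
# Hypothetical reconstruction of the Figure 6 savings model.
# All constants come from Table 1; the model structure is an assumption.

SERVER_COST = 5_000        # average server cost ($)
SERVER_LIFETIME = 3        # years, used to annualize capital costs
SWITCH_RATIO = 2.5         # servers per switch
SWITCH_COST = 1_500        # per switch ($)
SERVERS_PER_RACK = 12      # 2U servers
RACK_SQ_FT = 10            # floor space per rack
SQ_FT_COST = 1_000         # commercial cost per sq. ft. ($)
SERVERS_PER_ADMIN = 250
ADMIN_SALARY = 60_000      # annual ($)
CPU_SAVINGS = 0.09         # 9% average CPU utilization reduction

def annual_savings(servers_deployed: int) -> dict:
    """Annualized savings from the servers a 9% CPU reduction avoids."""
    avoided = servers_deployed * CPU_SAVINGS
    return {
        "server_purchase": avoided * SERVER_COST / SERVER_LIFETIME,
        "switches": avoided / SWITCH_RATIO * SWITCH_COST / SERVER_LIFETIME,
        "floor_space": avoided / SERVERS_PER_RACK * RACK_SQ_FT * SQ_FT_COST
                       / SERVER_LIFETIME,
        "manpower": avoided / SERVERS_PER_ADMIN * ADMIN_SALARY,
    }

breakdown = annual_savings(6000)
print({k: round(v) for k, v in breakdown.items()})
print("total:", round(sum(breakdown.values())))
```

At 6,000 servers this sketch lands in the same seven-figure range as Figure 6, though the exact amortization choices (for example, spreading floor space cost over the server lifetime) are guesses.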

TOE Teaming Power Consumption and CPU Utilization Benefits

The same test setup was used to create a single logical 4-gigabit networking pipe with the Broadcom Advanced Control Suite (BACS), in both L2 and TOE modes. The objective of these tests was to analyze any differences between TOE teaming and a standard non-TOE teaming configuration. TOE-teamed adapters showed greater performance efficiency than L2-teamed adapters, and TOE teaming performance is virtually identical to that of multiple TOE adapters running independently. The teaming "overhead" with TOE enabled is negligible, which keeps the realized system power and CPU utilization benefits intact.

Summary

The TOE power consumption and CPU utilization analysis presented here demonstrates the benefits of deploying TOE in customer data centers. This implementation advances an effective green data center strategy in a corporate climate where businesses are doing more with less, and it provides measurable initiatives to contain overall IT budgets. In recent years, Broadcom has provided these C-NIC solutions using 1 Gb TOE, delivering tangible advantages to end customers managing dense networking traffic environments. As 10 GbE becomes more ubiquitous in the data center, we anticipate that the TOE success story will extend beyond 1-gigabit links to the more expansive and demanding 10-gigabit links.

Engineering Profiles

Rich Hernandez (rich_hernandez@dell.com) is a senior development engineer with the Server Product Group at Dell. Rich has been in the computer and data networking industry for over 25 years, holds a Bachelor of Science degree in Electrical Engineering (BSEE) from the University of Houston, and has pursued postgraduate studies at Colorado Technical University.

Dhiraj Sehgal (dhiraj@broadcom.com) is a Senior Product Line Manager in the Enterprise Networking Group at Broadcom Corporation. He has extensive data center, enterprise server, and storage experience focused on various I/O subsystems. Dhiraj has a Master of Science degree in Electrical Engineering (MSEE) from North Carolina State University, Raleigh.

Broadcom, the pulse logo, Connecting everything, and the Connecting everything logo are among the trademarks of Broadcom Corporation and/or its affiliates in the United States, certain other countries, and/or the EU. Any other trademarks or trade names mentioned are the property of their respective owners.

BROADCOM CORPORATION
5300 California Avenue, Irvine, CA 92617
Phone: 949-926-5000  Fax: 949-926-5203
E-mail: info@broadcom.com  Web: www.broadcom.com

© 2009 by BROADCOM CORPORATION. All rights reserved.
5709-WP101-R  June 2009