Accelerating High-Speed Networking with Intel I/O Acceleration Technology




The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing bandwidth requirements of enterprise IT. To take full advantage of this network capacity, data center managers must consider the impact of high-traffic volume on server resources. A compelling way of efficiently translating high bandwidth into increased throughput and enhanced quality of service is to take advantage of Intel I/O Acceleration Technology (Intel I/OAT), now available in the new Dual-Core and Quad-Core Intel Xeon processor-based server platforms. Intel I/OAT moves data more efficiently through these servers for fast, scalable, and reliable networking. Additionally, it provides network acceleration that scales seamlessly across multiple Ethernet ports, while providing a safe and flexible choice for IT managers due to its tight integration into popular operating systems.

Introduction

Business success is becoming increasingly dependent on the rapid transfer, processing, compilation, and storage of data. To improve data transfers to and from applications, IT managers are investing in new networking and storage infrastructure to achieve higher performance. However, network I/O bottlenecks have emerged as the key IT challenge in realizing full value from server investments.

Until recently, the real nature and extent of I/O bottlenecks was not thoroughly understood. Most network issues could be resolved with faster servers, higher-bandwidth network interface cards (NICs), and various networking techniques such as network segmentation. However, the volume of network traffic began to outpace server capacity to manage that data. This increase in network traffic is due to the confluence of a variety of market trends, including:

• Convergence of data fabrics to Ethernet
• Server consolidation and virtualization, which is increasing server port densities and maximizing per-port bandwidth requirements
• Clusters and grids, rather than large servers, becoming the leading way of creating high-performance computing resources
• Increased adoption of enterprise audio and video resources (VoIP, video conferencing) and real-time data acquisition, which contribute to network congestion
• Increased adoption of network-attached storage, versus direct-attach storage, to meet enterprise back-up and recovery needs
• Deployment of high-density blade servers to accommodate enterprise-computing requirements

The increased reliance on networked data and networked compute resources results in a need to manage an ever-increasing data load. This increased load threatens to outpace server-processing capabilities, as demonstrated by the marginal gains from traditional performance enhancements (for example, increasing network bandwidth and adding servers). As a result, IT managers are now asking, "After investing in a 10X improvement in network bandwidth, why aren't we seeing comparable improvements in application response time and reliability?"

To answer this question, Intel research and development teams evaluated the entire server architecture to identify the I/O bottlenecks and determine their nature and impact on network performance. Out of this investigation came a new technology called Intel I/O Acceleration Technology (Intel I/OAT). This white paper discusses the platform-wide I/O bottlenecks found through Intel research and how Intel I/OAT resolves those issues to accelerate high-speed networking.

Finding the Real I/O Bottlenecks

Intel research and development teams found the real I/O bottlenecks when they examined the entire flow of a data packet as it is received, processed, and sent out by the server. Figure 1 illustrates this flow. The following numbered descriptions correspond to the circled numbers in the illustration:

1. A client sends a request in the form of TCP/IP data packets that the server receives through the network interface card (NIC). The data packet contains TCP header information that includes packet identification and routing information, as well as the actual data payload relating to the client request.
2. The server processes the TCP/IP packets and routes the payload to the designated application. This processing includes protocol computations involving the TCP/IP stack, multiple server memory accesses for packet descriptors and payload moves, and various other system overhead activities (for example, interrupt handling and buffer management).
3. The application acknowledges the client request and recognizes that it needs data from storage to respond to the request.
4. The application accesses storage to obtain the necessary data to satisfy the client request.
5. Storage returns the requested data to the application.
6. The application completes processing of the client request using the additional data received from storage.
7. The server routes the response back through the network connection to be sent as TCP/IP packets to the client.

The above packet data flow has remained largely unchanged for more than a decade. For the networking requirements of today, the result is significant system overhead, excessive memory accesses, and inefficient TCP/IP processing. Figure 2 represents the entire network I/O overhead picture. It is important to note, however, that the system overhead, TCP/IP processing, and memory access categories of overhead are not proportional. In fact, as discussed later, the amount of CPU usage for each category varies according to application I/O packet size.

[Figure 1. Data paths to and from application. Server overhead and response latency occur throughout the request-response data path. These overheads and latencies include processing incoming request TCP/IP packets, routing packet payload data to the application, fetching stored information, and reprocessing responses into TCP/IP packets for routing back to the requesting client. Source: Intel research on the Linux* operating system.]

[Figure 2. Network I/O processing tasks. Server network I/O processing tasks fall into three major overhead categories, each varying as a percentage of total overhead according to TCP/IP packet size: system overhead (interrupt handling, buffer management, operating system transitions), TCP/IP processing (TCP stack code processing), and memory access (packet and data moves, CPU stall).]
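To make the flow above concrete, the following minimal C sketch shows the request-response loop of a typical sockets-based server, with comments mapping each call to the numbered steps in Figure 1. This is an illustration only, not part of Intel I/OAT; the port number (8080) and data file name (data.bin) are hypothetical, and error handling is trimmed for brevity. Every recv() and send() below incurs the protocol processing, buffer copies, and interrupt handling that make up the overhead categories in Figure 2.

/* Illustrative request-response server loop (steps from Figure 1). */
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080),        /* hypothetical port */
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 16);

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);          /* step 1: request arrives via NIC */
        char req[4096], buf[4096];
        ssize_t n = recv(cfd, req, sizeof(req), 0); /* step 2: stack processing plus a
                                                       kernel-to-application payload copy */
        if (n <= 0) { close(cfd); continue; }

        int dfd = open("data.bin", O_RDONLY);       /* step 4: application asks storage */
        ssize_t m = read(dfd, buf, sizeof(buf));    /* step 5: storage returns the data */
        close(dfd);

        if (m > 0)
            send(cfd, buf, (size_t)m, 0);           /* steps 6-7: response is rebuilt into
                                                       TCP/IP packets and routed back */
        close(cfd);
    }
}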

It is also important to note that some technologies already exist for mitigating some CPU overhead issues. For example, all Intel PRO Server Adapters and Intel PRO Network Connections (LAN on motherboard, or LOM) include advanced features designed to reduce CPU usage. These features include:

• Interrupt moderation
• TCP checksum offload
• TCP segmentation (large send) offload

Their implementation does not impair compatibility with other network components or require special management or modification of operating systems or applications. (A short sketch showing how an operating system exposes these offloads appears at the end of this section.)

Other approaches exist that claim to address system overhead by offloading even more processing to the NIC. These approaches include the TCP offload engine (TOE) and remote direct memory access (RDMA). TOE uses a specialized and dedicated processor on the NIC to handle some of the packet protocol processing. It does not fully address the other performance bottlenecks shown in Figure 2. In fact, for small packet sizes and short-duration connections, TOE may be of very limited benefit in addressing the overall I/O performance problem. Additionally, because TOE offloads (that is, copies) the network stack to a fixed-speed microcontroller (the offload engine), not only is performance limited to the speed of the microcontroller, but there is also a risk that key network capabilities, such as adapter teaming and failover, may not function in a TOE environment.

As for RDMA-enabled NICs (RNICs), the RDMA protocol supports direct placement of packet payload data into an application's memory space. This addresses the memory access bottleneck category by reducing data movement overhead for the CPU. However, the RDMA protocol requires significant changes to the network stack and can require changes to application software that utilizes the network.

Defining the Worst I/O Bottleneck

The limitations of existing I/O acceleration solutions became even clearer as Intel research and development teams began to quantify each category under different conditions. Figure 3 summarizes these results for various application I/O sizes and their impact, by task category, on percent of CPU utilization.

Notice in Figure 3 that CPU usage by TCP/IP processing is nearly constant across I/O sizes ranging from 2KB to 64KB. Although TCP/IP processing is a significant bottleneck, it is not the most significant bottleneck. Memory accesses account for more CPU usage in all cases than TCP/IP processing, and system overhead is the worst I/O bottleneck for application I/O sizes below 8KB.

As stated earlier and indicated by the data in Figure 3, TOE and RDMA do not address the entire I/O bottleneck issue. What is needed is a system-wide solution that can fit anywhere in the enterprise computing hierarchy without requiring modification of application software and that provides acceleration benefits for all three network bottlenecks. Intel I/OAT, now available on the new Dual-Core and Quad-Core Intel Xeon processor-based platforms, is exactly that kind of solution.

[Figure 3. CPU utilization varies according to I/O size (2KB, 4KB, 8KB, 16KB, 32KB, 64KB). TCP/IP processing is fairly constant and tends to be a smaller part of CPU utilization compared to system overhead.]
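As promised above, here is a minimal, Linux-only sketch of how adapter offloads are exposed through the operating system rather than through application changes. It queries TCP segmentation and checksum offload state with the standard SIOCETHTOOL ioctl; it is generic to any NIC driver, not specific to Intel adapters, and the interface name "eth0" is an assumption.

/* Minimal Linux sketch: query NIC offload features via SIOCETHTOOL. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

static void query(int fd, struct ifreq *ifr, __u32 cmd, const char *label) {
    struct ethtool_value eval = { .cmd = cmd };
    ifr->ifr_data = (char *)&eval;
    if (ioctl(fd, SIOCETHTOOL, ifr) == 0)
        printf("%-18s %s\n", label, eval.data ? "on" : "off");
    else
        perror(label);
}

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);    /* any socket serves as a handle */
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1); /* assumed interface name */

    query(fd, &ifr, ETHTOOL_GTSO,    "TCP segmentation:"); /* TSO / large send */
    query(fd, &ifr, ETHTOOL_GRXCSUM, "RX checksum:");
    query(fd, &ifr, ETHTOOL_GTXCSUM, "TX checksum:");

    close(fd);
    return 0;
}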

Intel I/OAT: The System-Wide Solution

Intel I/OAT addresses all three server I/O bottlenecks (illustrated in Figures 2 and 3) by providing fast, scalable, and reliable networking. In addition, it provides network acceleration that scales seamlessly across multiple Ethernet ports, and it is a safe and flexible choice for IT managers due to its tight integration into popular operating systems. The system-wide network I/O acceleration technologies applied by Intel I/OAT are shown in Figure 4 and include:

• Parallel processing of TCP and memory functions. Lowers system overhead and improves the efficiency of TCP stack processing by using the abilities of the CPU to execute multiple instructions per clock, pre-fetch TCP/IP header information into cache, and perform other data movement operations in parallel.
• Affinitized data flows. Partitions network stack processing dynamically across multiple physical or logical CPUs. This allows CPU cycles to be allocated to the application for faster execution. (A user-space sketch of this idea appears after this section.)
• Asynchronous low-cost copy. Intel Quick Data Technology provides enhanced data movement, allowing payload data copies from the NIC buffer in system memory to the application buffer with far fewer CPU cycles, returning the saved CPU cycles to productive application workloads.
• Improved TCP/IP processing with an optimized TCP/IP stack. Implements separate packet data and control paths to optimize processing of the packet header separately from the packet payload. This and other stack-related enhancements reduce protocol processing cycles.

Because Intel I/OAT enhances performance while keeping all processing of the operating system's TCP stack on the Intel Xeon processor, and the state information for TCP connections within the server's system memory, the technology is said to be stateless, as opposed to stateful offload technologies such as TOE. As a stateless technology, Intel I/OAT retains use of the system processors and protocols as the principal engines for handling network traffic. Additionally, Intel I/OAT is used throughout the platform to increase CPU efficiency by reducing bottlenecks across most application I/O sizes. Because Intel I/OAT is tightly integrated into popular operating systems, it ensures full compatibility with critical network configurations such as adapter teaming and link aggregation. As a result, Intel I/OAT provides a fast, scalable, and reliable network acceleration solution with significant performance advantages over prior system implementations and technologies.

[Figure 4. Intel I/OAT performance enhancements in a server with Intel I/O Acceleration Technology: an optimized TCP/IP stack, affinitized network data flow for balanced computing across multiple CPUs, and enhanced direct memory access (DMA) for more efficient memory copies. Intel I/OAT implements server-wide performance enhancements in all three major server components to ensure that data gets to and from applications consistently faster and with greater reliability.]
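Affinitized data flows are implemented inside the operating system's network stack, but the underlying idea can be sketched in user space: hash each connection's address tuple to pick a CPU, then pin the thread that processes that flow so its connection state stays cache-hot on one core. The sketch below is a hypothetical analogy under those assumptions; the pick_cpu hash and the thread structure are illustrative, not Intel I/OAT's actual mechanism.

/* Hypothetical user-space analogy of affinitized data flows: steer each
   flow to a fixed CPU so its connection state stays cache-hot there. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative hash of a flow's 4-tuple onto one of ncpus CPUs. */
static unsigned pick_cpu(uint32_t saddr, uint32_t daddr,
                         uint16_t sport, uint16_t dport, unsigned ncpus) {
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);
    return (h * 2654435761u) % ncpus;      /* multiplicative hash */
}

static void *flow_worker(void *arg) {
    unsigned cpu = (unsigned)(uintptr_t)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* Pin this worker: all packets of the flows hashed to `cpu` are
       processed here, avoiding cross-CPU cache and lock traffic. */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    /* ... the per-flow receive/processing loop would go here ... */
    return NULL;
}

int main(void) {
    unsigned ncpus = (unsigned)sysconf(_SC_NPROCESSORS_ONLN);
    unsigned cpu = pick_cpu(0x0a000001, 0x0a000002, 40000, 80, ncpus);
    pthread_t t;
    pthread_create(&t, NULL, flow_worker, (void *)(uintptr_t)cpu);
    pthread_join(t, NULL);
    printf("example flow affinitized to CPU %u of %u\n", cpu, ncpus);
    return 0;
}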

The Performance Advantage of Intel I/OAT

The Intel I/OAT platform-level approach to improving network performance has been verified by extensive testing. Some of these results are summarized in Figure 5. The Intel I/OAT performance tests were conducted for both Linux* and Microsoft Windows* operating systems using two servers, each tested across multiple Gigabit Ethernet (GbE) NIC ports (two to eight GbE ports), as represented by the X-axis. One of the servers was an Intel E7520-based platform without the benefit of Intel I/OAT. The other was an Intel E5000 series server using the new Dual-Core Intel Xeon processor with Intel I/OAT enabled. In the test examples of Figure 5, the graphs represent both CPU utilization percentages (the lines) and the corresponding network throughput performance (the vertical bars). Both systems underwent identical stress tests.

The left graph in Figure 5 summarizes the results of an Intel Xeon processor-based server running a Linux operating system. Notice that the Intel I/OAT-enabled platform, running Linux and using eight GbE ports, achieved a CPU utilization improvement of over 40 percent versus the non-Intel I/OAT-enabled platform. Additionally, this same platform achieved almost twice the network throughput of the non-Intel I/OAT-enabled platform operating under the same conditions.

Similarly, with Microsoft Windows Server 2003* (right graph in Figure 5), the network throughput of the Intel I/OAT-enabled platform nearly doubled for eight GbE ports. In this test, the Intel E7520-based platform was incapable of generating CPU utilization data beyond four ports because, without the benefit of receive-side scaling, the server directs all network traffic to Processor 0, saturating that processor and limiting the system's ability to report data. However, the new Dual-Core and Quad-Core Intel Xeon processor-based platform with Intel I/OAT balanced the workload across the processors and never reached 70-percent CPU utilization, even at eight GbE ports.

[Figure 5. Network-performance comparisons for platforms with and without Intel I/OAT: bidirectional throughput (Megabits per second, vertical bars) and CPU utilization (%, lines) versus number of Gigabit Ethernet ports (6 clients per port), for Linux* (left) and Microsoft Windows Server 2003* (right), comparing the Intel E7520-based platform against the new Dual-Core Intel Xeon processor-based platform. Compared to previous processors, the new Dual-Core Intel Xeon processor with Intel I/OAT provides superior performance in terms of both higher throughput and reduced percentage of CPU utilization.]

Conclusion

In summary, Intel I/OAT improves network application responsiveness by moving network data more efficiently through Dual-Core and Quad-Core Intel Xeon processor-based servers to provide fast, scalable, and reliable network acceleration for the majority of today's data center environments. This translates to the following significant areas of benefit for IT data centers:

• Fast. A primary benefit of Intel I/OAT is that it significantly reduces CPU overhead, freeing resources for more critical compute tasks. Intel I/OAT uses server processors more efficiently by leveraging architectural improvements within the CPU, chipset (Intel Quick Data Technology), network controller, and firmware to minimize performance-limiting bottlenecks.
• Scalable. Intel I/OAT scales seamlessly across multiple GbE ports (up to eight ports), and can scale up to 10GbE, while maintaining power and thermal characteristics similar to those of a standard gigabit network adapter.
• Reliable. Intel I/OAT is a safe and flexible choice because it is tightly integrated into popular operating systems such as Microsoft Windows Server 2003 and Linux, avoiding the support risks associated with relying on third-party hardware vendors for network stack updates. Intel I/OAT also preserves critical network capabilities, such as teaming and failover, by maintaining control of network stack processing within the CPU, where it belongs. This results in reduced support risks for IT departments.

For More Information

To find out more about Intel I/O Acceleration Technology, visit www.intel.com/go/ioat.

Intel I/O Acceleration Technology (Intel I/OAT) requires an operating system that supports Intel I/OAT.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL MAY MAKE CHANGES TO SPECIFICATIONS, PRODUCT DESCRIPTIONS, AND PLANS AT ANY TIME, WITHOUT NOTICE.

Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel I/O Acceleration Technology may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available upon request.

Copyright 2006 Intel Corporation. All rights reserved. Intel, the Intel logo, Intel. Leap ahead., the Intel. Leap ahead. logo, and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Printed in USA 1206/TR/OCG/XX/PDF Please Recycle 316125-001US