Networking Driver Performance and Measurement - e1000 A Case Study
John A. Ronciak, Intel Corporation, [email protected]
Ganesh Venkatesan, Intel Corporation, [email protected]
Jesse Brandeburg, Intel Corporation, [email protected]
Mitch Williams, Intel Corporation, [email protected]

Abstract

Networking performance is a popular topic in Linux and is becoming more critical to achieving good overall system performance. This paper looks at what was done in the e1000 driver to improve performance by (a) increasing throughput and (b) reducing CPU utilization. A lot of work has gone into the e1000 Ethernet driver, as well as into the PRO/1000 Gigabit Ethernet hardware, with regard to both of these performance attributes. The paper covers the major changes made to both the driver and the hardware to improve many aspects of Ethernet network performance, including improvements contributed by the Linux community and by the Intel group responsible for both the driver and the hardware. It describes optimizations to improve small-packet performance for applications like packet routers, VoIP, etc., as well as optimizations for standard and jumbo packets, and how those modifications differ from the small-packet optimizations. A discussion of the tools and utilities used to measure performance, and ideas for other tools that could help measure performance, is presented. Some of the ideas may require help from the community for refinement and implementation.

Introduction

This paper recounts the history of the e1000 Ethernet device driver with regard to performance. The e1000 driver has a long history that includes numerous performance enhancements over the years. The paper also shows how the Linux community has been involved in trying to enhance the driver's performance. The notable enhancements are called out, along with when new hardware features became available.
The paper also points out where more work is needed with regard to performance testing. There are many views on how to measure network performance. For various reasons we have had to use an expensive, closed source test tool to measure the network performance of the driver. We would like to engage with the Linux community to address this and come up with a strategy for an open source measurement tool along with consistent testing methods.
This paper also identifies issues with the system and the stack that hinder performance. The performance data also indicates that there is room for improvement.

A brief history of the e1000 driver

The first generation of the Intel® PRO/1000 controllers demonstrated the limitation of the 32-bit 33MHz PCI bus. The controllers were able to saturate the bus, causing slow response times for other devices in the system (like slow video updates). To work around this PCI bus bandwidth limitation, the driver team worked on identifying and eliminating inefficiencies. One of the first improvements we made was to reduce the number of DMA transactions across the PCI bus. This was done using some creative coalescing of smaller buffer fragments into larger ones. In some cases this was a dramatic change in the behavior of the controller on the system. This was, of course, a long time ago, and systems, both hardware and OS, have changed considerably since then.

The next generation of the controller was a 64-bit 66MHz controller, which definitely helped overall performance. Throughput increased and CPU utilization decreased simply because the bus restrictions were lifted. This was also when new offload features were being introduced into the OS. It was the first time that interrupt moderation was implemented. The implementation was fairly crude, based on a timer mechanism with a hard time-out, but it did work in different cases to decrease CPU utilization. Then a number of different features were introduced, like descriptor alignment to cache lines, a dynamic inter-frame gap mechanism, and jumbo frames. The use of jumbo frames really helped when transferring large amounts of data but did nothing for normal or small sized frames. It was also a feature that required network infrastructure changes, e.g. changes to switches and routers, to support jumbo frames.
Jumbo frames also required the system stacks to change. This took some time to get all the issues worked out, but jumbo frames do work well for certain environments, such as single-subnet LANs or clusters.

Next came the more interesting offload of checksumming for TCP and IP. The IP offload didn't help much, as it is only a checksum across the twenty bytes of the IP header. However, the TCP checksum offload really did show performance increases and is widely used today. This came with little change to the stack to support it; the stack interface was designed with the flexibility for a feature like this. Kudos to the developers that worked on the stack back then.

NAPI was introduced by Jamal Hadi Salim, Robert Olsson, et al. at this time. The e1000 driver was one of the first drivers to support NAPI, and it is still used as an example of how a driver should support NAPI. At first the development team was unconvinced that NAPI would give us much of a benefit in the general test case; the performance benefits were only expected for some edge cases. As NAPI and our driver matured, however, NAPI has shown itself to be a great performance booster in almost all cases. This will be shown in the performance data presented later in this paper.

Some of the last features to be added were TCP Segmentation Offload (TSO) and UDP fragment checksums. TSO took work from the stack maintainers as well as the e1000 development team to get implemented. This work continues, as all the issues around using it have not yet been resolved. There was even a rewrite of the implementation which is currently under test (Dave Miller's TSO rewrite). The UDP
fragment checksum feature is another that required no change in the stack. It is, however, little used due to the lack of use of UDP checksumming.

The use of PCI Express has also helped to reduce the bottleneck seen with the PCI bus. The significantly larger data bandwidth of PCIe helps overcome limitations due to latencies and overheads compared to PCI/PCI-X buses. This will continue to get better as devices support more lanes on the PCI Express bus, further reducing bandwidth bottlenecks.

There is a new initiative called Intel® I/O Acceleration Technology (I/OAT) which achieves the benefits of TCP Offload Engines (TOE) without any of the associated disadvantages. An analysis of where the packet processing cycles are spent was performed, and features were designed to help accelerate packet processing. These features will be showing up over the next six to nine months. They include Receive Side Scaling (RSS), Packet Split, and Chipset DMA. Please see the [Leech/Grover] paper, Accelerating Network Receive Processing: Intel® I/O Acceleration Technology, presented here at the symposium.

RSS is a feature which identifies TCP flows and passes this information to the driver via a hash value. This allows packets associated with a particular flow to be placed onto a certain queue for processing. The feature also includes multiple receive queues, which are used to distribute the packet processing onto multiple CPUs. The packet split feature separates the protocol headers in a packet from the payload data and places each into different buffers. This allows the payload data buffers to be page-aligned and the protocol headers to be placed into small buffers which can easily be cached to prevent cache thrash. All of these features are designed to reduce or eliminate the need for TOE, mainly because all of the I/OAT features will scale with processor and chipset technologies.
Performance

As stated above, the definition of performance varies depending on the user, and there are many different ways to test and measure network driver performance. There are basically two elements of performance that need to be looked at: throughput and CPU utilization. In the case of small packet performance, where packet latency is important, the packet rate measured in packets per second is used as a third type of measurement, since throughput does a poor job of quantifying performance in this case.

One of the problems that exists regarding performance measurement is which tools should be used. Since there is no consistent open source tool, we use a closed source, expensive tool. This is mostly a demand from our customers, who want to be able to measure and compare the performance of the Intel hardware against other vendors on different operating systems. This tool, IxChariot by IXIA (other brands and names may be claimed as the property of others), is used for this reason. It does a good job of measuring throughput with many different types of traffic and loads, but it still does not do a good job of measuring CPU utilization. It also has the advantage that there are endpoints for a lot of different OSes, which gives you the ability to compare the performance of different OSes using the same system and hardware. It would be nice to have an open source tool which could do the same thing. This is discussed in the section "Where do we go from here."

There is an open source tool which can be used to test small packet performance: the packet generator, or pktgen, a kernel module which is part of the Linux kernel. The tool is very useful for sending lots of packets with set timings. It is the tool of choice for
anyone testing routing performance and routing configurations.

All of the data for this section was collected using Chariot on the same platform to reduce the number of variables, except as noted. The test platform specifications are:

Blade server
Dual 2.8GHz Pentium® 4 Xeon™ CPUs, 512KB cache
1GB RAM
Hyperthreading disabled
Intel® 80546EB LAN-on-motherboard (PCI-X bus)
Competition network interface card in a PCI-X slot

The client platform specifications are:

Dell PowerEdge® 1550/1266
Dual 1266MHz Pentium® III CPUs, 512KB cache
1GB RAM
Red Hat Enterprise Linux 3 with SMP kernel
Intel® PRO/1000 adapters

Comparison of Different Driver Versions

Driver performance is compared for a number of different e1000 driver versions on the same OS version and the same hardware. The difference in performance seen in Figure 1 was due to the NAPI bug that the Linux community found. It turns out that the bug was there for a long time and nobody noticed it. The bug was causing the driver to exit NAPI mode back into interrupt mode fairly often instead of staying in NAPI mode. Once corrected, the number of interrupts taken was greatly reduced, as it should be when using NAPI.

Comparison of Different Frame Sizes versus the Competition

Frame size has a lot to do with performance. Figure 2 shows the performance based on frame size against the competition. As the chart shows, frame size has a lot to do with the total throughput that can be reached as well as the CPU utilization required. The frame sizes used were normal 1500-byte frames, 256-byte frames, and 9KB jumbo frames. NOTE: The competition could not accept a 256-byte MTU, so 512 bytes was used for the small-packet performance numbers.
Comparison of OS Versions

Figure 3 shows the performance comparison between OS versions, including some different options for a specific kernel version. There was no particular reason that version was picked other than that it was the latest at the time of the tests. As can be seen from the chart in Figure 3, the 2.4 kernels performed better overall for pure throughput. This means that there is more improvement to be had in the 2.6 kernel for network performance. There is already new work on the TSO code within the stack, which may have improved performance, as the TSO code has been known to hurt overall network throughput. The 2.4 kernels do not have TSO, which could account for at least some of the performance difference.
Figure 1: Different Driver Version Comparison

Figure 2: Frame Size Performance Against the Competition

Figure 3: OS Version Performance

Results of Tuning NAPI Parameters

Initial testing showed that with default NAPI settings, many packets were being dropped on receive due to lack of buffers. It also showed that TSO was only rarely being used by the stack to transmit. It was also discovered that reducing the driver's weight setting from the default of 64 would eliminate the problem of dropped packets. Further reduction of the weight value, even to very small values, would continue to increase throughput. This is shown in Figure 4. The explanation for the dropped packets is simple: because the weight is smaller, the driver iterates through its packet receive loop (in e1000_clean_rx_irq) fewer times, and hence writes the Receive Descriptor Tail register more often. This notifies the hardware that descriptors are available more often and eliminates the dropped packets.

It would be obvious to conclude that the increase in throughput can be explained by the dropped packets, but this turns out not to be the case. Indeed, one can eliminate dropped packets by reducing the weight to 32, but the real increase in throughput doesn't come until you reduce it further, to 16. The answer appears to be latency. With the higher weights, the NAPI polling loop runs longer, which prevents the stack from running its own timers. With lower weights, the stack runs more often and processes packets more often.

We also found two situations where NAPI doesn't do very well compared to normal interrupt mode: 1) when the NAPI poll time is too fast (less than the time it takes to get a packet off the wire), and 2) when the processor is very fast and the I/O bus is relatively slow. In both of these cases the driver keeps entering NAPI mode, then dropping back to interrupt mode since it looks like there is no work to do. This is a bad situation to get into, as the driver has to take a very high number of interrupts to get the work done. Both of these situations need to be avoided, possibly with a different NAPI tuning parameter to set a minimum poll time. It could even be calculated and used dynamically over time.

Where the Community Helped

The Linux community has been very helpful over the years with fixes to correct errors or enhance performance. Most recently, Robert Olsson discovered the NAPI bug discussed earlier. This is just one of countless fixes that have come in over the years to make the driver faster and more stable. Thanks to all who have helped this effort.
Another area of performance that was helped by the Linux community was e1000 small packet performance. There were many comments and discussions on netdev that helped get the driver to perform better with small packets. Again, some of the key ideas came from Robert Olsson and the work he has done on packet routing.

Figure 4: NAPI Tuning Performance Results (dropped packets were seen at weights 48 and 64)

We also added different hardware features over the years to improve small packet performance. Our new RSS feature should help this as well, since the hardware will be better able to scale with the number of processors in the system. It is important to note that e1000 benefitted a lot from interaction with the open source community.

Where do we go from here

There are a number of different things that the community could help with. A test tool which can be used to measure performance across OS versions is needed. This would help in comparing performance under different OSes, different network controllers, and even different versions of the same driver. The tool needs to be able to use all packet sizes and OS or driver features. Another issue that should be addressed is NAPI tuning, as pointed out above. There are cases where NAPI actually hurts performance but, with the correct tuning, works much better. A third area is support for the new I/OAT features, which give most if not all of the same benefits as TOE without the limitations and drawbacks. There are some kernel changes that need to be implemented to support features like this, and we would like the Linux community to be involved in that work.

Conclusions

More work needs to be done to improve network performance on the 2.6 kernel. This won't happen overnight but will be a continuing process. It will get better with work from all of us. Also, work should continue to make NAPI work better in all cases. If it's in your business or personal interest to have better network performance, then it's up to you to help make it better. Thanks to all who have helped make everything perform better. Let's keep up the good work.
References

[Leech/Grover] Accelerating Network Receive Processing: Intel® I/O Acceleration Technology; Ottawa Linux Symposium, 2005.
Proceedings of the Linux Symposium, Volume Two, July 20th–23rd, 2005, Ottawa, Ontario, Canada
Conference Organizers

Andrew J. Hutton, Steamballoon, Inc.
C. Craig Ross, Linux Symposium
Stephanie Donovan, Linux Symposium

Review Committee

Gerrit Huizenga, IBM
Matthew Wilcox, HP
Dirk Hohndel, Intel
Val Henson, Sun Microsystems
Jamal Hadi Salim, Znyx
Matt Domsch, Dell
Andrew Hutton, Steamballoon, Inc.

Proceedings Formatting Team

John W. Lockhart, Red Hat, Inc.

Authors retain copyright to all submitted papers, but have granted unlimited redistribution rights to all as a condition of submission.
Intel Data Direct I/O Technology (Intel DDIO): A Primer >
Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
I3: Maximizing Packet Capture Performance. Andrew Brown
I3: Maximizing Packet Capture Performance Andrew Brown Agenda Why do captures drop packets, how can you tell? Software considerations Hardware considerations Potential hardware improvements Test configurations/parameters
Introduction to Intel Ethernet Flow Director and Memcached Performance
White Paper Intel Ethernet Flow Director Introduction to Intel Ethernet Flow Director and Memcached Performance Problem Statement Bob Metcalfe, then at Xerox PARC, invented Ethernet in 1973, over forty
Operating Systems Design 16. Networking: Sockets
Operating Systems Design 16. Networking: Sockets Paul Krzyzanowski [email protected] 1 Sockets IP lets us send data between machines TCP & UDP are transport layer protocols Contain port number to identify
Whitepaper. Implementing High-Throughput and Low-Latency 10 Gb Ethernet for Virtualized Data Centers
Implementing High-Throughput and Low-Latency 10 Gb Ethernet for Virtualized Data Centers Implementing High-Throughput and Low-Latency 10 Gb Ethernet for Virtualized Data Centers Introduction Adoption of
Gigabit Ethernet Design
Gigabit Ethernet Design Laura Jeanne Knapp Network Consultant 1-919-254-8801 [email protected] www.lauraknapp.com Tom Hadley Network Consultant 1-919-301-3052 [email protected] HSEdes_ 010 ed and
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are
Optimizing Network Virtualization in Xen
Optimizing Network Virtualization in Xen Aravind Menon EPFL, Lausanne [email protected] Alan L. Cox Rice University, Houston [email protected] Willy Zwaenepoel EPFL, Lausanne [email protected]
Post-production Video Editing Solution Guide with Microsoft SMB 3 File Serving AssuredSAN 4000
Post-production Video Editing Solution Guide with Microsoft SMB 3 File Serving AssuredSAN 4000 Dot Hill Systems introduction 1 INTRODUCTION Dot Hill Systems offers high performance network storage products
HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring
CESNET Technical Report 2/2014 HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring VIKTOR PUš, LUKÁš KEKELY, MARTIN ŠPINLER, VÁCLAV HUMMEL, JAN PALIČKA Received 3. 10. 2014 Abstract
Architectural Breakdown of End-to-End Latency in a TCP/IP Network
19th International Symposium on Computer Architecture and High Performance Computing Architectural Breakdown of End-to-End Latency in a TCP/IP Network Steen Larsen, Parthasarathy Sarangam, Ram Huggahalli
Switch Fabric Implementation Using Shared Memory
Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based
IxChariot Virtualization Performance Test Plan
WHITE PAPER IxChariot Virtualization Performance Test Plan Test Methodologies The following test plan gives a brief overview of the trend toward virtualization, and how IxChariot can be used to validate
Performance of optimized software implementation of the iscsi protocol
Performance of optimized software implementation of the iscsi protocol Fujita Tomonori and Ogawara Masanori NTT Network Innovation Laboratories 1-1 Hikarinooka Yokosuka-Shi Kanagawa, Japan Email: [email protected],
Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers
WHITE PAPER FUJITSU PRIMERGY AND PRIMEPOWER SERVERS Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers CHALLENGE Replace a Fujitsu PRIMEPOWER 2500 partition with a lower cost solution that
Broadcom Ethernet Network Controller Enhanced Virtualization Functionality
White Paper Broadcom Ethernet Network Controller Enhanced Virtualization Functionality Advancements in VMware virtualization technology coupled with the increasing processing capability of hardware platforms
Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University
Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University Napatech - Sharkfest 2009 1 Presentation Overview About Napatech
An Analysis of 8 Gigabit Fibre Channel & 10 Gigabit iscsi in Terms of Performance, CPU Utilization & Power Consumption
An Analysis of 8 Gigabit Fibre Channel & 1 Gigabit iscsi in Terms of Performance, CPU Utilization & Power Consumption An Analysis of 8 Gigabit Fibre Channel & 1 Gigabit iscsi 1 Key Findings Third I/O found
IP SAN BEST PRACTICES
IP SAN BEST PRACTICES PowerVault MD3000i Storage Array www.dell.com/md3000i TABLE OF CONTENTS Table of Contents INTRODUCTION... 3 OVERVIEW ISCSI... 3 IP SAN DESIGN... 4 BEST PRACTICE - IMPLEMENTATION...
Quantifying TCP Performance for IPv6 in Linux- Based Server Operating Systems
Cyber Journals: Multidisciplinary Journals in Science and Technology, Journal of Selected Areas in Telecommunications (JSAT), November Edition, 2013 Volume 3, Issue 11 Quantifying TCP Performance for IPv6
Dell PowerVault MD Series Storage Arrays: IP SAN Best Practices
Dell PowerVault MD Series Storage Arrays: IP SAN Best Practices A Dell Technical White Paper Dell Symantec THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND
A way towards Lower Latency and Jitter
A way towards Lower Latency and Jitter Jesse Brandeburg [email protected] Intel Ethernet BIO Jesse Brandeburg A senior Linux developer in the Intel LAN Access Division,
Application Server Platform Architecture. IEI Application Server Platform for Communication Appliance
IEI Architecture IEI Benefits Architecture Traditional Architecture Direct in-and-out air flow - direction air flow prevents raiser cards or expansion cards from blocking the air circulation High-bandwidth
CHAPTER 6 NETWORK DESIGN
CHAPTER 6 NETWORK DESIGN Chapter Summary This chapter starts the next section of the book, which focuses on how we design networks. We usually design networks in six network architecture components: Local
Design considerations for efficient network applications with Intel multi-core processor-based systems on Linux*
White Paper Joseph Gasparakis Performance Products Division, Embedded & Communications Group, Intel Corporation Peter P Waskiewicz, Jr. LAN Access Division, Data Center Group, Intel Corporation Design
Pump Up Your Network Server Performance with HP- UX
Pump Up Your Network Server Performance with HP- UX Paul Comstock Network Performance Architect Hewlett-Packard 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject
PE10G2T Dual Port Fiber 10 Gigabit Ethernet TOE PCI Express Server Adapter Broadcom based
PE10G2T Dual Port Fiber 10 Gigabit Ethernet TOE PCI Express Server Adapter Broadcom based Description Silicom 10Gigabit TOE PCI Express server adapters are network interface cards that contain multiple
Accelerating Network Virtualization Overlays with QLogic Intelligent Ethernet Adapters
Enterprise Strategy Group Getting to the bigger truth. ESG Lab Review Accelerating Network Virtualization Overlays with QLogic Intelligent Ethernet Adapters Date: June 2016 Author: Jack Poller, Senior
pktgen the linux packet generator
pktgen the linux packet generator Robert Olsson Uppsala Universitet & SLU [email protected] Abstract pktgen is a high-performance testing tool included in the Linux kernel. pktgen is currently the
Using Fuzzy Logic Control to Provide Intelligent Traffic Management Service for High-Speed Networks ABSTRACT:
Using Fuzzy Logic Control to Provide Intelligent Traffic Management Service for High-Speed Networks ABSTRACT: In view of the fast-growing Internet traffic, this paper propose a distributed traffic management
LSI MegaRAID FastPath Performance Evaluation in a Web Server Environment
LSI MegaRAID FastPath Performance Evaluation in a Web Server Environment Evaluation report prepared under contract with LSI Corporation Introduction Interest in solid-state storage (SSS) is high, and IT
Gigabit Ethernet Packet Capture. User s Guide
Gigabit Ethernet Packet Capture User s Guide Copyrights Copyright 2008 CACE Technologies, Inc. All rights reserved. This document may not, in whole or part, be: copied; photocopied; reproduced; translated;
NFS SERVER WITH 10 GIGABIT ETHERNET NETWORKS
NFS SERVER WITH 1 GIGABIT ETHERNET NETWORKS A Dell Technical White Paper DEPLOYING 1GigE NETWORK FOR HIGH PERFORMANCE CLUSTERS By Li Ou Massive Scale-Out Systems delltechcenter.com Deploying NFS Server
Microsoft Windows Server 2003 vs. Linux Competitive File Server Performance Comparison
April 2003 1001 Aviation Parkway, Suite 400 Morrisville, NC 27560 919-380-2800 Fax 919-380-2899 320 B Lakeside Drive Foster City, CA 94404 6-513-8000 Fax 6-513-8099 www.veritest.com [email protected] Microsoft
VXLAN: Scaling Data Center Capacity. White Paper
VXLAN: Scaling Data Center Capacity White Paper Virtual Extensible LAN (VXLAN) Overview This document provides an overview of how VXLAN works. It also provides criteria to help determine when and where
Fibre Channel over Ethernet in the Data Center: An Introduction
Fibre Channel over Ethernet in the Data Center: An Introduction Introduction Fibre Channel over Ethernet (FCoE) is a newly proposed standard that is being developed by INCITS T11. The FCoE protocol specification
Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
