Evaluating the Suitability of Server Network Cards for Software Routers
Maziar Manesh, Katerina Argyraki, Mihai Dobrescu, Norbert Egi, Kevin Fall, Gianluca Iannaccone, Eddie Kohler, Sylvia Ratnasamy
EPFL, UCLA, Intel Labs Berkeley, Lancaster Univ.

1. INTRODUCTION

The advent of multicore CPUs has led to renewed interest in software routers built from commodity PC hardware [6, 5, 8, 7, 3, 4]. The typical approach to scaling network processing in these systems is to distribute packets, or rather flows of packets, across multiple cores that process them in parallel. However, the traffic arriving (departing) on an incoming (outgoing) link at a router is inherently serial, and hence we need a mechanism that appropriately demultiplexes (multiplexes) the traffic between a serial link and a set of cores. That is, multiple cores can parallelize the processing of a traffic stream, but to fully exploit the parallelism due to multiple cores we must first be able to parallelize the delivery of packets to and from the cores. Moreover, this parallelization must be achieved in a manner that is: (i) efficient, ensuring that the splitting/merging of traffic is not the bottleneck along a packet's processing path, and (ii) well balanced, such that the input processing load can be spread across the available cores for a range of input traffic workloads (e.g., diverse flow counts, flow sizes, packet processing applications, and so forth). If either of these requirements is not met, the parallelism due to multiple cores might well be moot. In other words, achieving parallelism in packet delivery is critical for any software routing system that exploits multiple cores.

Recent efforts point to modern server network interface cards (NICs) as offering the required mux/demux capability through new hardware classification features [5, 8, 3, 7]. In a nutshell, these NICs offer the option of classifying packets that arrive at a port into one of multiple hardware queues on the NIC, and this set of queues can then be partitioned across multiple cores; each core thus processes the subset of traffic that arrives at its assigned queues. (We discuss current NIC architectures and features in greater detail in Section 3.) Effectively, this NIC-level classification capability serves to parallelize the path from a NIC to the cores, and vice versa. Building on this observation, recent efforts leverage multi-queue NICs in their prototype systems [5, 8, 3, 6] and show that enabling NIC-level classification significantly improves a server's packet processing capability ([5] reports a 3.3x increase in forwarding throughput with NIC-level classification). However, there has been little evaluation to stress test the extent to which modern NICs and their classification capabilities match the requirements of software routing, from the standpoint of both performance and functionality.
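Requirement (ii) above is easy to see with a small simulation. The Python sketch below is our illustration, not the paper's; the toy classifier, the queue count, and the two workloads are arbitrary assumptions. It pins each flow to one of eight queues and compares the per-queue packet load under a uniform and a skewed flow-size distribution.

```python
from collections import Counter

def queue_for(flow_id: int, num_queues: int) -> int:
    """Toy stand-in for a NIC's hash-based classifier: same flow -> same queue."""
    return hash(flow_id) % num_queues

def per_queue_packets(flow_sizes: dict, num_queues: int = 8) -> Counter:
    """Total packets per queue when every flow is pinned to a single queue."""
    load = Counter()
    for flow_id, packets in flow_sizes.items():
        load[queue_for(flow_id, num_queues)] += packets
    return load

# Uniform workload: 10,000 equal-sized flows spread evenly over 8 queues.
uniform = {f: 100 for f in range(10_000)}
# Skewed workload: the same flows, but five of them carry most of the packets.
skewed = {f: (100_000 if f < 5 else 10) for f in range(10_000)}

for name, workload in (("uniform", uniform), ("skewed", skewed)):
    load = per_queue_packets(workload)
    print(f"{name:8s} per-queue packets: max={max(load.values())} min={min(load.values())}")
```

Even though each queue holds the same number of flows in both cases, the skewed workload leaves some queues (and their cores) nearly idle while a few cores absorb almost all of the packets; this is precisely the balance concern behind requirement (ii).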
For example, as we discuss in the following section, there has been little evaluation of the performance impact of scaling NIC-based classification to large numbers of queues (i.e., a high mux/demux fanout), or of whether current classification options can balance load under varied input traffic workloads. It is perhaps worth emphasizing that high-speed packet processing is not (yet!) a common server application, and hence testing server NICs under routing workloads is not, to our knowledge, on the routine checklist for NIC designers; as such, it is not obvious that server NICs would fare well when scrutinized through the lens of a router designer.

This paper takes a first step towards such an evaluation. We focus on the latest generation of widely-used server NICs and experimentally compare their performance characteristics to those of an ideal parallel NIC and a serial-only NIC. We show that although commodity NICs do improve on serial-only NICs (with 3x higher throughput on typical workloads), they still lag an ideal parallel NIC (achieving 30% lower throughput). We find similar gaps in the classification features these NICs offer. We thus conclude with recommendations for NIC modifications that we believe would improve their suitability for software routers.
The remainder of this paper is organized as follows: in Section 2, we define the problem of parallelizing packet delivery in software routers and the goals for an ideal parallel NIC. Section 3 explains the methodology for our experimental evaluation, which we present in Section 4. We conclude in Section 5.

2. PROBLEM DEFINITION

The high-level question we aim to address in this paper is quite simple: what is the most efficient way to split R bps of incoming traffic across n cores? Despite its simplicity, addressing this question represents a key challenge for the feasibility of high-speed software routers. In fact, spreading the traffic load across multiple cores is typically unavoidable, since a single core may not be able to handle the line rate of one port. To date, the best reported per-core forwarding rate falls short of 10 Gbps in the case of basic IPv4 routing (approx. 5 Gbps/core with 64B packets according to [8]). Given that the current trend in processor architectures is to increase the total number of cores per processor rather than single-core performance, it is clear that traffic must be split across multiple cores to handle rates of 10 Gbps or above, or any kind of more advanced packet processing.

Once we split traffic across cores, we still need to specify what properties make a design efficient. To a first approximation, efficiency can be characterized along two dimensions: performance and control. The ideal performance is such that if one core is capable of processing X bps, then n cores should handle nX bps. In reality, there are many reasons why this ideal scaling may not be achieved in general (e.g., contention for memory, caches, or shared data structures). However, in this paper we are interested in understanding the overhead, if any, due just to muxing/demuxing traffic between the network interface and the cores. As we will see in Section 4, the challenge is in the careful design of packet processing experiments that allow us to isolate the impact of the NIC on overall system performance.

By control we mean the ability to define which core receives which subset of packets. This is important in the context of IP forwarding, for example, to avoid introducing packet reordering. Other forms of fine-grained control over the manner in which traffic is split may include assigning priorities to traffic streams or making sure that a given core processes the traffic of a specific customer.

In this paper we investigate two approaches available today to split traffic: the first relies on hardware support, in the form of independent queues and a packet classification engine on the network interface; the second is instead implemented in software, with a single core dedicated to the task of splitting traffic across the other cores.
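To make the contrast between the two approaches concrete, here is a rough Python sketch (ours, not the paper's implementation; threads stand in for cores, and the round-robin policy is an arbitrary assumption). In the software-only case a dedicated dispatcher core runs the `dispatch` loop; with multi-queue hardware, the NIC's classification engine performs the equivalent demultiplexing before packets ever reach a core.

```python
import threading
from itertools import cycle
from queue import Queue

def dispatch(serial_input, worker_queues):
    """Software splitting: one core spends its cycles moving packets from the
    single serial stream into one queue per worker core."""
    rr = cycle(worker_queues)            # simple round-robin split (an assumption)
    for packet in serial_input:
        next(rr).put(packet)             # this per-packet work is pure overhead

def worker(my_queue, process):
    """Each worker only touches its own queue, so workers need no locking among
    themselves -- the same isolation that per-core hardware queues provide."""
    while (packet := my_queue.get()) is not None:
        process(packet)

queues = [Queue() for _ in range(3)]     # one queue per worker "core"
workers = [threading.Thread(target=worker, args=(q, print)) for q in queues]
for t in workers:
    t.start()
dispatch(range(9), queues)               # nine dummy "packets"
for q in queues:
    q.put(None)                          # sentinel: tell each worker to stop
for t in workers:
    t.join()
```

Section 4 measures exactly this trade-off: the software-only configuration burns cores on the dispatch and merge loops, while the multi-queue configuration lets the NIC do that work.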
3. OVERVIEW OF CURRENT NETWORK INTERFACES

To evaluate and understand the performance of current off-the-shelf server NICs, we consider a simplified view of the server components. Figure 1 shows a high-level view of the components that participate in packet processing. Current high-end Intel Ethernet NICs (all results and discussions in this paper refer to the Intel 10Gbps 82599EB controller [2], codename "Niantic") use a single controller that can handle two 10Gbps Ethernet ports. The NIC communicates with the rest of the system via the I/O Hub, which terminates the PCIe lanes (8 lanes are required for this type of network card). Finally, the I/O Hub transfers the packets to memory using Intel QuickPath Interconnect links (or similar technologies from other processor manufacturers, e.g., HyperTransport in AMD systems).

Figure 1: Server components.

To avoid contention when multiple cores access the same 10Gbps port, server NICs can give the illusion of a dedicated 10Gbps port per core using multiple hardware queues. This way, each core accesses its hardware queue independently (i.e., with no need for any synchronization with other cores), while the NIC controller is in charge of classifying packets and placing them in one of the many queues. Today's NICs provide a small set of classification algorithms on the receive side [2]:

- hash-based (such as Receive Side Scaling, RSS), where a hash function is applied to the header fields of incoming packets and its output is used to select one of the hardware queues. This is used to make sure each queue receives an equal portion of the incoming traffic.

- address-based (such as VMDq), where each queue is associated with a different Ethernet address and the NIC accepts packets destined for any of these addresses. This classification is commonly used in conjunction with virtualization, to give each guest VM the abstraction of a dedicated interface.

- flow-based (e.g., Flow Director), where each hardware queue is associated with a flow that can be defined using any sequence of bytes in the packet header. This allows for a very flexible flow definition and can support several thousands of concurrent flows.

For the purposes of this paper we will look only at the first classification method (hash-based), where the hash is computed on the classical 5-tuple made of protocol, source and destination IP addresses, and port numbers if present (this is the same approach followed in [5] and [8]). The reason behind this decision is that the address-based and flow-based classification algorithms are not well suited to routing workloads, as they both limit the total number of flows that are allowed in the system.
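To illustrate the hash-based scheme we rely on, here is a minimal Python sketch (ours; the 82599 actually uses a Toeplitz hash with a configurable key, which we do not reproduce) that maps a 5-tuple to a receive queue. The property that matters for routing is visible directly in the code: all packets of a flow hash to the same queue, so per-flow ordering is preserved, while different flows spread across queues.

```python
import hashlib

def rss_like_queue(src_ip: str, dst_ip: str, proto: int,
                   src_port: int, dst_port: int, num_queues: int) -> int:
    """Map a 5-tuple to a receive queue index.

    A stand-in for the NIC's hash: deterministic in the 5-tuple, so every
    packet of a flow lands in the same queue (no intra-flow reordering),
    while distinct flows are spread across the available queues.
    """
    key = f"{proto}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % num_queues

# Two packets of the same TCP flow -> same queue; a different flow may land elsewhere.
q1 = rss_like_queue("10.0.0.1", "10.0.0.2", 6, 12345, 80, num_queues=8)
q2 = rss_like_queue("10.0.0.1", "10.0.0.2", 6, 12345, 80, num_queues=8)
q3 = rss_like_queue("10.0.0.3", "10.0.0.2", 6, 23456, 80, num_queues=8)
assert q1 == q2
print("flow A queue:", q1, " flow B queue:", q3)
```

As with real RSS, nothing prevents two heavy flows from colliding on the same queue, which is why Section 4.3 argues for the ability to change the hash function at runtime.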
On the transmit side, the NIC is in charge of merging the traffic from the several hardware queues into the FIFO buffer that feeds the Ethernet transmitter. The hardware queues are served in a round-robin fashion, with the possibility of rate limiting each queue independently or assigning traffic priorities following the IEEE 802.1p standard [1].

4. EVALUATION

In order to understand how off-the-shelf network cards can support the requirements of high-speed software routers, we focus on two questions:

1. How well does the packet forwarding performance scale with the number of cores?

2. How well does the packet forwarding performance scale with the number of queues?

Addressing the first question is akin to a classical black-box approach to system performance evaluation: it measures the NIC and system performance as well as the overhead introduced by the software. The second question instead attempts to isolate the impact of multi-queue support in the NIC from that of the other system components. Our intent there is to directly measure the performance degradation due to multiplexing traffic across cores.

4.1 Experimental setup

For our study, we chose an Intel Xeon 5560 (Nehalem) server with two sockets, each with four 2.8GHz cores and an 8MB L3 cache. The system uses one 10Gbps NIC with two ports, installed in a PCIe 2.0 x8 slot. Our server runs Linux with Click [9] in polling mode, i.e., the CPUs poll for incoming packets rather than being interrupted. The input traffic is made of 64-byte back-to-back packets (the worst-case traffic scenario) and is generated by a separate 8-core server. We consider only one type of simple packet processing, which we call minimal forwarding: traffic arriving at one of the NIC ports is always forwarded to the other physical port of the same NIC. We chose this simple test because we are interested in measuring the raw performance of the network card without introducing any side effects due to the packet processing software. With this setup, our primary performance metric is the maximum forwarding rate we achieve, reported in packets per second (pps).

4.2 Forwarding Performance

To address the first question, we performed a set of experiments using a single 10GbE port. These experiments involve increasing both the number of CPU cores dedicated to packet processing and the number of hardware queues allocated on the NIC. In all experiments the number of cores is equal to the number of queues, and each core has a dedicated receive and transmit queue pair.
This guarantees that there is no contention across cores over shared memory data structures. This setup is the one commonly recommended when using modern server NICs.

We then compare the performance achieved using multiple hardware queues against two natural alternatives: (i) a purely software solution ("serial NIC"), which assumes that the NIC has no multiple-queue support; and (ii) an "ideal NIC" scenario, where each core has completely dedicated access to the 10GbE port (i.e., no traffic splitting/merging is necessary). The latter is an idealized view of the system, as it assumes there is no overhead in running an application on a multicore system compared to a single-core server.

The serial NIC does away with the use of the NIC hardware queues and performs traffic splitting and merging in software. In all such experiments, one core is in charge of the receive path, one core is in charge of the transmit path, and the remaining cores perform the actual packet processing. In the one-core experiments, the same core handles both the receive and the transmit side. The intent here is to understand the role that hardware queues in modern NICs play in high-speed software routers. We call this set of experiments "serial NIC" given that the NIC behaves as a simple FIFO packet buffer.

The ideal NIC scenario, where no hardware or software overhead for splitting traffic is assumed, is drawn for reference purposes only. The performance of the ideal NIC is a computed, not measured, value: we take the maximum forwarding rate the system can reach with just one core and multiply that rate by the number of cores, until we reach the line rate of a 10Gbps Ethernet port, i.e., approximately 14.9 Mpps. (Note that the maximum packet rate of a 10Gbps Ethernet port is not 19.5 Mpps, i.e., 10x10^9/(8 x 64), because the standard requires 20 bytes of spacing between back-to-back packets.)
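Both the line-rate figure and the ideal-NIC curve are simple arithmetic, reproduced in the short sketch below (ours). The 20 bytes of per-packet spacing (preamble plus inter-frame gap) are taken from the note above; the 3 Mpps single-core rate is a hypothetical placeholder, not a measurement from this paper.

```python
LINE_RATE_BPS = 10e9     # 10 Gbps Ethernet
FRAME_BYTES = 64         # minimum-size packets used in our experiments
SPACING_BYTES = 20       # per-packet preamble + inter-frame gap (the "20 byte spacing")

# Maximum packet rate on the wire: ~14.88 Mpps, i.e., the ~14.9 Mpps quoted above.
line_rate_pps = LINE_RATE_BPS / (8 * (FRAME_BYTES + SPACING_BYTES))
print(f"line rate at 64B: {line_rate_pps / 1e6:.2f} Mpps")

def ideal_nic_pps(cores: int, single_core_pps: float) -> float:
    """Computed 'ideal NIC' curve: perfect scaling with core count, capped at line rate."""
    return min(cores * single_core_pps, line_rate_pps)

# Hypothetical single-core rate of 3 Mpps (a placeholder, not a measured value):
print([round(ideal_nic_pps(n, 3e6) / 1e6, 2) for n in range(1, 9)])
```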
Figure 2 shows the aggregate forwarding rate, in millions of packets per second, with an increasing number of cores. Figure 3 plots the same data normalized by the number of cores.

Figure 2: Aggregate forwarding rate with increasing numbers of cores and queues (throughput in Mpps vs. number of cores; current NIC, serial NIC, and ideal NIC, one port each).

Figure 3: Per-core forwarding rate with increasing numbers of cores and queues (throughput in Mpps vs. number of cores; current NIC, serial NIC, and ideal NIC, one port each).

From the figures we can derive two conclusions. On the one hand, multiple hardware queue support in the NICs is a clear improvement over software-only solutions: the performance of current NICs is more than 3x higher than that of software-only serial NICs. On the other hand, the overhead of current NICs in handling multiple queues is significant: the per-core performance of current NICs is 30% lower than what an ideal NIC would predict.

Yet, these experiments do not allow us to identify the source of the overhead. It could be in the NIC itself or in other system resources (e.g., memory, PCI bridge, etc.) that are shared across the cores. In order to isolate the impact of the NIC alone, we should measure the packet forwarding performance when all system components are kept fixed (i.e., number of cores, PCIe lanes, controllers, etc.) and the only change is in the number of hardware queues. Specifically, this means that we would ideally use only one controller and one PCIe slot (8 lanes) while varying the mapping between hardware queues and physical 10Gbps Ethernet ports (e.g., 8 queues on 1 port, 4 queues on 2 ports, 2 queues on 4 ports, etc.). Unfortunately, current NIC hardware does not provide enough flexibility to explore the entire parameter space: the Niantic controller can handle only two physical 10Gbps ports, which allows us to experiment only with 8 queues or 4 queues per port (so that the number of cores remains constant). To overcome this limitation and still be able to understand the impact of a growing number of hardware queues, we compare the case of using one physical port with Q queues (where Q ranges from 2 to 8) to the case of using two physical ports with Q/2 hardware queues per port. In both cases the number of cores is constant and equal to Q, and only one Ethernet controller is used. We then measure the performance degradation (if any) of using a larger number of queues per physical port: if hardware queues presented no overhead, there should be no performance degradation between using two ports with Q/2 queues per port and using one port with Q queues.

Figure 4: Comparing the aggregate forwarding rate when varying the number of hardware queues per port (throughput in Mpps vs. total number of queues; 1 port vs. 2 ports). The number of cores is equal to the total number of queues, so that one core always has one and only one dedicated hardware queue.
Figure 5: Ratio of the forwarding rates when varying the number of hardware queues per port (degradation from two ports to one port, in %, vs. total number of queues). The number of cores is equal to the total number of queues, so that one core always has one and only one dedicated hardware queue.

Figures 4 and 5 show the forwarding performance when varying the number of hardware queues per port (Q = 2, 4, 6, 8). The graphs clearly indicate that a smaller number of queues leads to better performance. The forwarding performance scales almost perfectly across multiple cores when using two ports: 6.3 Mpps with 2 cores, 23.2 Mpps with 8 cores. Using a single 10G port (but the same controller), the forwarding rate peaks at 11.7 Mpps with 4 queues (and 4 cores) and then degrades slightly (by less than 1 Mpps) as the number of queues grows. We can derive a few conclusions from these two figures:

- The NIC controller, the cores, and the PCIe 8-lane slot can handle 23.2 Mpps, well above the line rate of a 10Gbps interface. The performance bottleneck that we observed in Figure 2 is therefore not within any of those components.

- Each 10Gbps physical port cannot run at the maximum packet rate. The maximum rate we have been able to measure in any experiment is 11.7 Mpps, below the 14.9 Mpps maximum packet rate of a 10Gbps Ethernet port. Two physical ports can run at about 23.4 Mpps (i.e., 2x 11.7 Mpps).

- There is an overhead in adding more hardware queues, as highlighted by Figure 5 for 2 and 4 hardware queues. However, this overhead is relatively small, leading to approximately a 10% loss in forwarding rate.

In summary, we have been able to measure the overhead of multiple queues and to isolate where the bottleneck is in the system: between the NIC controller and the physical port.

At this time we do not have access to the detailed design of the connection between the NIC controller and the PHY, but we conjecture that the handling of the FIFO buffer that accommodates the packets right off the wire could be the reason for the loss in performance. It is important to note that this could also be a deliberate design decision, given that the NIC under study is designed for server platforms: its design is evaluated by how efficiently it can terminate TCP or UDP network streams. In our experiments we have always used a worst-case workload (64-byte, back-to-back packets, with little or no payload). That is a traditional benchmark for routing, but of very little interest for traditional server applications. Reaching 11.7 Mpps means that, at an average packet size of 86 bytes, this system could fully saturate a 10Gbps Ethernet port. However, following the same rationale, it is foreseeable that packet forwarding may become a workload of interest for NIC designers as software routers become a reality in the telecommunications industry.
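The 86-byte figure follows from the same wire-format arithmetic as before; the short check below (ours, reusing the 20-byte per-packet spacing assumption) back-solves the average frame size at which 11.7 Mpps exactly fills a 10Gbps port.

```python
LINE_RATE_BPS = 10e9      # 10 Gbps Ethernet
SPACING_BYTES = 20        # per-packet preamble + inter-frame gap (assumption carried over)
measured_pps = 11.7e6     # peak single-port rate reported above

# Frame size at which the measured packet rate exactly fills the wire:
frame_bytes = LINE_RATE_BPS / (8 * measured_pps) - SPACING_BYTES
print(f"saturating frame size: {frame_bytes:.1f} bytes")   # ~86.8 bytes
```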
4.3 Functionality

We now focus on whether commodity NICs are well suited for software routing at the functional level. We start by summarizing the functionality NICs support today, and then introduce a wishlist of new features that we believe would further improve the suitability of these NICs for packet processing.

NIC functionality today. Beyond multiple hardware queues, current NICs implement a plethora of mechanisms that are specifically designed to offload some of the most basic packet processing operations from the cores. As we said before, however, most of these mechanisms are intended for server applications, and as such focus on TCP stream processing (e.g., TCP checksum offload, TCP receive-side coalescing, TCP timers, etc.) or on virtualization (e.g., VT-d, virtual machine device queues, etc.). A few features that are directly applicable to software routers include: (i) handling a large number of hardware queues (up to 128) to spread traffic across many cores; (ii) flow-level filters (hash-based, or exact-match for up to 8K flows) that can be used to limit packet reordering; (iii) VLAN support; (iv) rate limiting and priorities (compliant with IEEE 802.1p); and (v) IPsec encryption and packet encapsulation.

What follows is a simple list of additional features that are not currently present in server NICs, but that we believe could represent a useful initial feature set for a next-generation NIC where packet forwarding and routing are first-class design concerns.

Runtime configuration. One of the major drawbacks of most of today's NIC features is that they cannot be modified quickly while the NIC is processing incoming traffic; many features even require a NIC reset in order to be reconfigured. There are, however, many network scenarios where it would be desirable to change the configuration at runtime. For example:

- If the current hash function leads to a poor balance of flows across cores, one would like to change the function (perhaps even just switching among a pre-configured set of hash functions). This could even be done automatically by the NIC controller when it realizes that one hardware queue is full or nearly full.

- If the traffic load is low enough that a smaller number of cores could handle all of the packet processing needs, it would be useful to reduce the number of hardware queues. This could lead to a more energy-efficient design, as one could immediately turn off the cores that were assigned to the queues that are removed.
More general classification. Current NICs support classification on fixed fields (such as port numbers and source/destination IP addresses) or via a list of exact-match rules on the Ethernet address or on a packet offset (but then limited to a small number of flows). For new protocols or applications, it might be useful to give the programmer more fine-grained (or even, to a certain extent, programmable) control over the classification. For example, one could envision packet classification techniques designed so that one core only receives packets destined to a subset of the prefixes in the routing table. This way, the route lookup operation could be optimized by splitting the routing table across cores to improve cache locality; this is of particular importance given the current trend towards non-uniform memory architectures (NUMA).
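As a concrete, purely hypothetical illustration of this last idea, the Python sketch below partitions a toy routing table across cores and classifies each destination address to the core owning the longest matching prefix; the prefixes, the core mapping, and the helper names are all invented for the example. A NIC would have to perform the longest-prefix-match step in hardware, which is precisely the kind of classification current server NICs do not offer.

```python
import ipaddress

# Toy routing table split across cores: each core owns a disjoint set of prefixes,
# so it only ever needs its own slice of the table in cache (the NUMA/cache-locality goal).
PREFIX_TO_CORE = {
    ipaddress.ip_network("10.0.0.0/8"): 0,
    ipaddress.ip_network("172.16.0.0/12"): 1,
    ipaddress.ip_network("192.168.0.0/16"): 2,
    ipaddress.ip_network("0.0.0.0/0"): 3,       # default route owned by core 3
}

def core_for(dst: str) -> int:
    """Longest-prefix match over the partitioned table, returning the owning core."""
    addr = ipaddress.ip_address(dst)
    best = max((p for p in PREFIX_TO_CORE if addr in p),
               key=lambda p: p.prefixlen)
    return PREFIX_TO_CORE[best]

print(core_for("10.1.2.3"))      # -> 0
print(core_for("192.168.7.9"))   # -> 2
print(core_for("8.8.8.8"))       # -> 3 (default route)
```

A hash over the 5-tuple cannot express this mapping, since the queue choice here depends on the result of a lookup rather than on a hash of the header fields.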
Above we have very briefly listed some of the features that would benefit software routers if present in off-the-shelf NICs. Future work includes exploring efficient hardware implementations for these features and the appropriate abstractions or programming model by which NICs expose the functionality such features offer. The intent is to find the minimal set of features that could benefit software routers without requiring radical changes to current NIC design practices.

5. CONCLUSIONS

We presented a study of the performance of modern server NICs in the presence of packet forwarding workloads. We highlighted the challenges involved in isolating the contribution of the different system components to the overall packet processing performance. Our investigation has shown the advantage that multi-queue NICs present compared to traditional NICs. We have also shown limitations in the design of current NICs, both in the form of the maximum rate per port that can be achieved and in the performance degradation (though limited) when increasing the number of hardware queues that receive traffic. In future work, we plan to investigate the feasibility and value of more advanced NIC features. Our goal would be to identify new features that benefit software router workloads without incurring significantly increased manufacturing costs, thereby preserving the high-volume, off-the-shelf nature of current server NICs.

6. REFERENCES

[1] Intel 82599 10 GbE Controller Datasheet. datashts/82599 datasheet.pdf.
[2] Intel 82599EB 10 Gigabit Ethernet Controller.
[3] R. Bolla and R. Bruschi. PC-based Software Routers: High Performance and Application Service Support. In Proceedings of the ACM SIGCOMM Workshop on Programmable Routers for Extensible Services of Tomorrow (PRESTO), Seattle, WA, USA, August 2008.
[4] R. Bolla, R. Bruschi, G. Lamanna, and A. Ranieri. DROP: An Open-Source Project towards Distributed SW Router Architectures. In IEEE GLOBECOM, pages 1-6.
[5] M. Dobrescu, N. Egi, K. Argyraki, B.-G. Chun, K. Fall, G. Iannaccone, A. Knies, M. Manesh, and S. Ratnasamy. RouteBricks: Exploiting Parallelism to Scale Software Routers. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2009.
[6] N. Egi, A. Greenhalgh, M. Handley, M. Hoerdt, F. Huici, and L. Mathy. Fairness Issues in Software Virtual Routers. In Proceedings of the ACM SIGCOMM Workshop on Programmable Routers for Extensible Services of Tomorrow (PRESTO), Seattle, WA, USA, August 2008.
[7] N. Egi, A. Greenhalgh, M. Handley, M. Hoerdt, F. Huici, and L. Mathy. Towards High Performance Virtual Routers on Commodity Hardware. In Proceedings of the ACM International Conference on Emerging Networking Experiments and Technologies (CoNEXT), Spain, December 2008.
[8] S. Han, K. Jang, K. Park, and S. Moon. PacketShader: A GPU-accelerated Software Router. In Proceedings of the ACM SIGCOMM Conference, 2010.
[9] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click Modular Router. ACM Transactions on Computer Systems, 18(3):263-297, August 2000.