Linux Software Router: Data Plane Optimization and Performance Evaluation


JOURNAL OF NETWORKS, VOL. 2, NO. 3, JUNE 2007

Raffaele Bolla and Roberto Bruschi
DIST - Department of Communications, Computer and Systems Science, University of Genoa, Via Opera Pia 13, Genoa, Italy
{raffaele.bolla, roberto.bruschi}@unige.it

Abstract - Recent technological advances provide an excellent opportunity to achieve truly effective results in the field of open Internet devices, also known as Open Routers or ORs. Even though some initiatives have been undertaken over the last few years to investigate ORs and related topics, other extensive areas still require additional investigation. In this contribution we report the results of the in-depth optimization and testing carried out on a PC Open Router architecture based on Linux software and COTS hardware. The main focus of this paper is the evaluation of the forwarding performance of different Linux-based OR software architectures. This analysis was performed with both external (throughput and latencies) and internal (profiling) measurements. In particular, for the external measurements, a set of RFC 2544 compliant tests is also proposed and analyzed.

Index Terms - Linux Router; Open Router; RFC 2544; IP forwarding.

I. INTRODUCTION

Internet technology has been developed in an open environment: all Internet-related protocols, architectures and structures are publicly created and described. For this reason, in principle, everyone can easily develop an Internet device (e.g., a router). On the contrary, and to a certain extent quite surprisingly, most professional devices are developed in an extremely closed manner: it is very difficult to acquire details about their internal operations and to perform anything more complex than a parametrical configuration. From a general viewpoint this is not very strange, since it can be seen as a clear attempt to protect the industrial investment. However, the experimental nature of the Internet and its diffusion in many contexts might sometimes suggest a different approach. Such a need is even more evident within the scientific community, which often runs into various problems when carrying out experiments, testbeds and trials to evaluate new functionalities and protocols. Today, recent technological advances provide an opportunity to do something truly effective in the field of open Internet devices, sometimes called Open Routers (ORs). Such an opportunity arises from the use of Open Source Operating Systems (OSs) and COTS/PC components. The attractiveness of the OR solution can be summarized as: multi-vendor availability, low cost, and continuous updating/evolution of the basic parts. As far as performance is concerned, the PC architecture is general-purpose, which means that, in principle, it cannot attain the same performance level as custom, high-end network devices, which often use dedicated HW elements to handle and to parallelize the most critical operations. Nevertheless, the performance gap might not be so large and, in any case, it is more than justified by the cost difference. Our activities, carried out within the framework of the BORA-BORA project [1], are geared to facilitate this kind of investigation by reporting the results of an extensive optimization and testing operation carried out on an OR architecture based on Linux software. We focused our attention mainly on packet forwarding functionalities.
Our main objective was the performance evaluation of an optimized OR, carried out with both external (throughput and latencies) and internal (profiling) measurements. In this regard, we identified a high-end reference PC-based hardware architecture and the Linux 2.6 kernel for the software data plane. Subsequently, we optimized this OR structure, defined a test environment and finally developed a complete series of tests with an accurate evaluation of the software modules' role in defining the performance limits. With regard to the state of the art of OR devices, some initiatives have been undertaken over the last few years to develop and investigate ORs and related topics. In the software area, one of the most important initiatives is the Click Modular Router project [2], which proposes an effective data plane solution. In the control plane area two important projects can be cited: Zebra [3] and XORP [4]. Besides these custom developments, some standard Open Source OSs can also provide very effective support for an OR project; the most relevant OSs in this sense are Linux [5][6] and FreeBSD [7]. Other activities focus on hardware: [8] and [9] propose a router architecture based on a PC cluster, while [10] reports some performance results (on packet transmission and reception) obtained with a PC Linux-based testbed. Some evaluations have also been carried out on network boards (see, for example, [11]). Other interesting projects involving Linux-based ORs can be found in [12] and [13], where Bianco et al. report some significant performance results. In [14] a performance analysis of an OR architecture enhanced with FPGA line cards, which allow direct NIC-to-NIC packet forwarding, is introduced. [15] describes the Intel I/OAT, a technology that enables DMA engines to improve network reception and transmission by offloading some low-level operations from the CPU.

In [16] the virtualization of a multiservice OR architecture is discussed: the authors propose multiple forwarding chains virtualized with Xen. Finally, in [17], we proposed an in-depth study of the IP lookup mechanism included in the Linux kernel.

The paper is organized as follows. The software and hardware details of the proposed OR architecture are reported in Sections II and III, respectively, while Section IV contains a description of the performance tuning and optimization techniques. The benchmarking scenario and the performance results are reported in Sections V and VI, respectively. Conclusions are presented in Section VII.

II. LINUX OR SOFTWARE ARCHITECTURE

The OR architecture has to provide many different types of functionalities: from those directly involved in the packet forwarding process to the ones needed for control functionalities, dynamic configuration and monitoring. As outlined in [5], [18] and [19], all the forwarding functions are implemented inside the Linux kernel, while most of the control and monitoring operations (the signaling protocols, such as routing protocols, control protocols, etc.) are daemons/applications running in user mode. As in the older kernel versions, the Linux networking architecture is basically based on an interrupt mechanism: network boards signal the kernel upon packet reception or transmission through HW interrupts. Each HW interrupt is served as soon as possible by a handling routine, which suspends the operations currently being processed by the CPU; until it completes, the handling routine cannot be preempted by anything, not even by other interrupt handlers. Thus, with the clear purpose of keeping the system reactive, the interrupt handlers are designed to be very short, while all the time-consuming tasks are performed afterwards by the so-called Software Interrupts (SoftIRQs). This is the well-known "top half / bottom half" IRQ routine division implemented in the Linux kernel [18]. SoftIRQs are not real interrupts, but rather a form of kernel activity that can be scheduled for later execution. They differ from HW IRQs mainly in that a SoftIRQ is scheduled for execution by another kernel activity, such as an HW IRQ routine, and has to wait until it is called by the scheduler; SoftIRQs can be interrupted only by HW IRQ routines. The NET_TX_SOFTIRQ and the NET_RX_SOFTIRQ are two of the most important SoftIRQs in the Linux kernel and the backbone of the entire networking architecture, since they manage the packet transmission and reception operations, respectively. In detail, the forwarding process is triggered by an HW IRQ generated by a network device, which signals the reception or the transmission of packets. The corresponding routine then performs some fast checks and schedules the correct SoftIRQ, which is activated by the kernel scheduler as soon as possible. When the SoftIRQ is finally executed, it performs all the packet forwarding operations.
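To make the top half / bottom half split concrete, the following is a minimal, illustrative C sketch in the style of a 2.6.13-era network driver: the hardware interrupt handler only checks and masks the board, then defers all packet processing to SoftIRQ context through the old-style NAPI scheduling call. The driver-specific helpers (my_nic_irq_pending(), my_nic_disable_irqs()) are hypothetical; only request_irq(), netif_rx_schedule() and the handler signature reflect the real 2.6.13 API.

```c
#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Illustrative top half: do the minimum amount of work and defer the
 * packet elaboration to NET_RX_SOFTIRQ via the NAPI scheduling call. */
static irqreturn_t my_nic_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = dev_id;

	if (!my_nic_irq_pending(dev))   /* hypothetical register check      */
		return IRQ_NONE;        /* shared line, not our interrupt   */

	my_nic_disable_irqs(dev);       /* hypothetical: mask board IRQs    */

	/* Put the device on the per-CPU poll list and raise NET_RX_SOFTIRQ;
	 * the actual forwarding work is done later in the poll callback
	 * invoked by net_rx_action(). */
	netif_rx_schedule(dev);

	return IRQ_HANDLED;
}

/* Registration, e.g. in the driver's open() method:
 *     request_irq(dev->irq, my_nic_interrupt, SA_SHIRQ, dev->name, dev);
 */
```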
As shown in Figure 1, which reports a scheme of the Linux source code involved in the forwarding process, the operations computed during the SoftIRQs can be organized in a chain of three different modules: a reception API that handles packet reception (the NAPI, which, in greater detail, also includes part of the HW interrupt handler), a module that carries out the IP-layer elaboration and, finally, a transmission API that manages the forwarding operations towards the egress network interfaces. In particular, the reception and the transmission APIs are the lowest-level modules, and are activated by both HW IRQ routines and scheduled SoftIRQs; they handle the network interfaces and perform some layer 2 functionalities. The NAPI [20] was introduced in recent kernel versions and has been explicitly created to increase the scalability of the reception process. It handles network interface requests with an interrupt moderation mechanism, through which it is possible to adaptively switch from a classical interrupt-based management of the network interfaces to a polling one. In greater detail, this is accomplished by inserting the identifier of the board generating the IRQ into a special list, called the poll list, during the HW IRQ routine, scheduling a reception SoftIRQ, and disabling the HW IRQs for that device. When the SoftIRQ is activated, the kernel polls all the devices whose identifier is included in the poll list, and a maximum of "quota" packets is served per device. If the board buffer (Rx Ring) is emptied, the identifier is removed from the poll list and the HW IRQs of the device are re-enabled; otherwise, its HW IRQs are left disabled, the identifier remains on the poll list and another SoftIRQ is scheduled. While this mechanism behaves like a pure interrupt mechanism in the presence of a low ingress rate (i.e., there is more or less one HW IRQ per packet), when the traffic increases, the probability of emptying the Rx Ring, and thus of re-enabling the HW IRQs, decreases more and more, and the NAPI starts working like a polling mechanism. For each packet received during the NAPI processing, a descriptor called skbuff [21] is immediately allocated. In particular, as shown in Figure 1, to avoid unnecessary memory transfer operations, the packets are left in the memory locations used by the DMA engines of the ingress network interfaces, and each subsequent operation is performed through the skbuffs. These descriptors essentially consist of pointers to the key fields of the headers contained in the associated packets, and are used for all the layer 2 and 3 operations. A packet is elaborated within the same NET_RX SoftIRQ until it is enqueued in an egress device buffer, called Qdisc. Each time a NET_TX SoftIRQ is activated or a new packet is enqueued, the Qdisc buffer is served.
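The reception behavior described above maps onto the old-style (pre-napi_struct) poll callback that 2.6.13 drivers register as dev->poll. The sketch below is a simplified example of such a callback: it serves at most "quota" packets per round, hands each one to the upper layers through netif_receive_skb(), and removes the device from the poll list (re-enabling its IRQs) only when the Rx Ring is empty. The my_nic_* helpers are hypothetical; the budget/quota handling and the netif_rx_complete() call follow the real 2.6.13 convention.

```c
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>

/* Illustrative 2.6.13-style poll callback, registered as dev->poll. */
static int my_nic_poll(struct net_device *dev, int *budget)
{
	int limit = min(*budget, dev->quota);  /* at most "quota" pkts/round */
	int done = 0;
	struct sk_buff *skb;

	while (done < limit &&
	       (skb = my_nic_fetch_from_rx_ring(dev)) != NULL) {
		/* The payload already sits in the buffer filled by the NIC
		 * DMA engine: only the skbuff descriptor is set up here,
		 * no packet copy is performed. */
		skb->protocol = eth_type_trans(skb, dev);
		netif_receive_skb(skb);  /* IP processing, routing decision,
		                            enqueue on the egress Qdisc */
		done++;
	}

	*budget -= done;
	dev->quota -= done;

	if (my_nic_rx_ring_empty(dev)) {       /* hypothetical helper      */
		netif_rx_complete(dev);        /* leave the poll list      */
		my_nic_enable_irqs(dev);       /* hypothetical: unmask IRQs */
		return 0;                      /* no more work pending     */
	}
	return 1;                              /* stay on the poll list    */
}
```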

[Figure 1. Detailed scheme of the forwarding code in the 2.6 Linux kernel: the NAPI reception chain (HW interrupt handler, net_rx_action, e1000_clean_rx_irq, e1000_alloc_rx_buffers, netif_receive_skb), the IP processing chain (ip_rcv, ip_rcv_finish, ip_route_input, ip_forward, ip_output, ip_finish_output, with the Netfilter hooks), and the TX API (dev_queue_xmit, the root Qdisc, qdisc_restart, hard_start_xmit/e1000_xmit_frame, net_tx_action, e1000_clean_tx_irq), together with the per-device Rx/Tx Rings, the per-CPU poll list and the DMA engines.]

When a packet is dequeued from the Qdisc buffer, it is placed on the Tx Ring of the egress device. After the board successfully transmits one or more packets, it generates an HW IRQ, whose routine schedules a NET_TX SoftIRQ. The Tx Ring is periodically cleaned of all the descriptors of transmitted packets, which are de-allocated, and is refilled with the packets coming from the Qdisc buffer. Another interesting characteristic of the 2.6 kernels (introduced to reduce the performance deterioration due to CPU concurrency) is the Symmetric Multi-Processor (SMP) support, which may assign the management of each network interface to a single CPU for both the transmission and reception functionalities.

III. HARDWARE ARCHITECTURE

The Linux OS supports many different hardware architectures, but only a small portion of them can be effectively used to obtain high OR performance. In particular, we must take into account that, during networking operations, the PC internal data path relies on a centralized I/O structure consisting of the I/O bus, the memory channel (used by the DMA engines to transfer data between the network interfaces and the RAM) and the Front Side Bus (FSB) (used by the CPU, through the memory channel, to access the RAM during packet elaboration). The selection criteria for the hardware elements have therefore been: very fast internal busses, RAM with very low access times, and CPUs with high integer computational power (packet processing does not generally require any floating point operations). In order to understand how the hardware architecture affects overall system performance, we selected two different architectures that represent the current state of the art of server architectures and the state of the art of about three years ago, respectively. As the old HW architecture, we chose a system based on the Supermicro X5DL8-GG mainboard: it supports a dual-Xeon system with a dual memory channel and a 64-bit PCI-X bus at 133 MHz. The Xeon processors (32-bit and mono-core) we utilized have a 2.4 GHz clock and a 512 KB cache. For the new OR architecture we used a Supermicro X7DBE mainboard, equipped with both PCI Express and PCI-X busses, and with an Intel Xeon 5050 (a dual-core, 64-bit processor). Network interfaces are another critical element, since they can heavily affect PC router performance. As reported in [11], the network adapters on the market offer different performance levels and configurability. With this in mind, we selected two types of adapters with different features and speeds: a high-performance and configurable Gigabit Ethernet interface, namely the Intel PRO/1000, equipped with either a PCI-X controller (XT version) or a PCI Express one (PT version) [22]; and a D-Link DFE-580TX [23], a network card equipped with four Fast Ethernet interfaces and a PCI 2.1 controller.
IV. SOFTWARE PERFORMANCE TUNING

The entire networking architecture of the Linux kernel is quite complex and has numerous aspects and parameters that can be tuned for system optimization. In particular, since the OS has been developed to act as a network host (i.e., a workstation, a server, etc.), it is natively tuned for general-purpose network end-node usage.

In this last case, packets are not fully processed inside the kernel space, but are usually delivered from the network interfaces to applications in user space, and vice versa. When the Linux kernel is used in an OR architecture, it generally works in a different manner, and should be specifically tuned and customized to obtain the maximum packet forwarding performance. As reported in [19] and [25], where a more detailed description of the adopted tuning actions can be found, this optimization is very important for obtaining the maximum performance. Some of the optimal parameter values can be identified by logical considerations, but most of them have to be determined empirically, since their optimal values cannot be easily derived from the software structure and since they also depend on the hardware components. We therefore carried out our tuning first by identifying the critical elements on which to operate, and then by finding the most convenient values through both logical considerations and experimental measurements. As far as the adopted tuning settings are concerned, we used the e1000 driver [24], configured with both the Rx and Tx ring buffers set to 256 descriptors, while the Rx interrupt generation was not limited. The Qdisc size for all the adapters was dimensioned to 2,000 descriptors, while the scheduler clock frequency was fixed to 1000 Hz. Moreover, the 2.6.13 kernel images used to obtain the numerical results in Section VI include two structural patches that we created to test and/or optimize kernel functionalities; these patches are described in the following.

A. Skbuff Recycling patch

We studied and developed a new version of the skbuff Recycling patch, originally proposed by R. Olsson [26] for the e1000 driver. In particular, the new version has been stabilized for the 2.6.13 kernel and extended to the sundance driver. This patch intercepts the skbuff descriptors of transmitted packets before they are de-allocated, and reuses them for new incoming packets. As shown in [19], this architectural change significantly reduces the computational weight of the memory management operations, thus attaining a maximum throughput considerably higher than that of standard kernels.

B. Performance Counter patch

To further analyze the OR's internal behavior, we introduced a set of counters in the kernel source code in order to understand how many times a certain procedure is called, or how many packets are processed at a time. Specifically, we introduced the following counters: IRQ, the number of interrupt handlers generated by a network card; Tx/Rx IRQ, the number of Tx/Rx IRQ routines per device; Tx/Rx SoftIRQ, the number of Tx/Rx software IRQ routines; Qdiscrun and Qdiscpkt, the number of times the output buffer (Qdisc) is served and the number of packets served each time; Pollrun and Pollpkt, the number of times the Rx ring of a device is served and the number of packets served each time; Tx/Rx clean, the number of times the Tx/Rx cleaning procedures of the driver are activated. The values of all these counters have been mapped into the Linux proc file system.
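As an illustration of how such counters can be exported, the following is a minimal, self-contained sketch (not the authors' actual patch) of a kernel-side counter exposed through the proc file system with the 2.6-era create_proc_read_entry() interface. The counter name forw_pollrun and the increment site are hypothetical; in the real patch the increments sit directly inside the networking code paths being instrumented.

```c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>

/* Hypothetical counter: incremented each time an Rx ring is polled
 * (the real patch instruments several such points in the kernel). */
static unsigned long pollrun_count;

static inline void count_pollrun(void)
{
	pollrun_count++;
}

/* read_proc callback: copy the current counter value into the page
 * buffer provided by the proc file system. */
static int pollrun_read_proc(char *page, char **start, off_t off,
			     int count, int *eof, void *data)
{
	*eof = 1;
	return sprintf(page, "%lu\n", pollrun_count);
}

static int __init forw_counters_init(void)
{
	/* Shows up as /proc/forw_pollrun (name chosen for the example). */
	create_proc_read_entry("forw_pollrun", 0444, NULL,
			       pollrun_read_proc, NULL);
	return 0;
}
module_init(forw_counters_init);
```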
V. BENCHMARKING SCENARIO

To benchmark the OR forwarding performance, we used a professional device, the Agilent N2X Router Tester [27], which can provide throughput and latency measurements with high availability and accuracy (the minimum guaranteed timestamp resolution is 1 ns). Moreover, with two dual-port Gigabit Ethernet cards and one 16-port Fast Ethernet card, we can analyze the OR behavior with a large number of Fast and Gigabit Ethernet interfaces. To better support the performance analysis and to identify the OR bottlenecks, we also performed some internal measurements using specific software tools (called profilers) placed inside the OR, which trace the percentage of CPU utilization of each software module running on the node. The problem is that, with many of these profilers, the computational effort they require perturbs the system performance, thus making the results scarcely meaningful. We verified with many different tests that one of the best is Oprofile [28], an open source tool that continuously monitors the system dynamics with frequent and quite regular sampling of the CPU hardware registers. Oprofile evaluates in depth the CPU utilization of each software application and of each single kernel function running in the system, with very low computational overhead. With regard to the benchmarking scenario, we decided to start by defining a reasonable set of test setups (with an increasing level of complexity) and, for each selected setup, to apply some of the tests defined in RFC 2544 [29]. In particular, we chose to perform these activities by using both a core and an edge router configuration: the former consists of a few high-speed (Gigabit Ethernet) network interfaces, while the latter utilizes a high-speed gateway interface and a large number of Fast Ethernet cards which collect traffic from the access networks. More specifically, we performed our tests by using the following setups (see Figure 2): 1) Setup A: a single mono-directional flow crosses the OR from one Gigabit port to another; 2) Setup B: two full-duplex flows cross the OR, each one using a different pair of Gigabit ports; 3) Setup C: a full-meshed (and full-duplex) traffic matrix applied to 4 Gigabit Ethernet ports; 4) Setup D: a full-meshed (and full-duplex) traffic matrix applied to 1 Gigabit Ethernet port and 12 Fast Ethernet interfaces.

In greater detail, each OR forwarding benchmarking session essentially consists of three test sets, namely:
a) Throughput and latency: this test set is performed by using constant bit rate (CBR) traffic flows, consisting of fixed-size datagrams, to obtain: a) the maximum effective throughput (in Kpackets/s and as a percentage of the theoretical value) versus the IP datagram size; b) the average, maximum and minimum latencies versus the IP datagram size.
b) Back-to-back: these tests are carried out by using bursty traffic flows and by changing both the burst dimension (i.e., the number of packets comprising the burst) and the datagram size. The main results of this kind of test are: a) the zero-loss burst length versus the IP datagram size; b) the average, maximum and minimum latencies versus the size of the IP datagrams comprising the burst (the zero-loss burst length is the maximum number of packets, transmitted with minimum inter-frame gaps, that the System Under Test (SUT) can handle without any loss).
c) Loss Rate: this kind of test is carried out by using CBR traffic flows with different offered loads and IP datagram sizes; the results can be summarized as the throughput versus both the offered load and the IP datagram size.
Note that all these tests have been performed by using different IP datagram sizes (i.e., 40, 64, 128, 256, 512, 1024 and 1500 bytes) and both CBR and bursty traffic flows.

[Figure 2. Benchmarking setups A, B, C and D.]

VI. NUMERICAL RESULTS

A selection of the experimental results is reported in this section. In particular, the results of the benchmarking setups shown in Figure 2 are reported in Subsections A, B, C and D. In all these cases, the tests were performed with the old hardware architecture described in Section III (i.e., 32-bit Xeon and PCI-X bus). With regard to the software architecture, we decided to compare different Linux kernel configurations and the Click Modular Router. In particular, we used the following versions of the Linux kernel: the single-processor optimized kernel (a version based on the standard one with single-processor support, which includes the descriptor recycling patch) and the dual-processor standard kernel (a standard NAPI kernel version similar to the previous one, but with SMP support). Note that we decided not to take into account the SMP versions of either the optimized Linux kernel or the Click Modular Router, since they lack a minimum acceptable level of stability. Subsection E summarizes the results obtained in the previous tests by showing the maximum performance for each benchmarking setup. Finally, the performance of the two hardware architectures described in Section III is reported in Subsection F, in order to evaluate how HW evolution affects forwarding performance.

A. Setup A numerical results

In the first benchmarking session, we performed the RFC 2544 tests by using setup A (see Figure 2) with both the single-processor optimized kernel and Click. As we can observe in Figs. 3, 4 and 5, which report the numerical results of the throughput and latency tests, both software architectures cannot achieve the maximum theoretical throughput in the presence of small datagram sizes. As demonstrated by the profiling measurements reported in Fig. 6, obtained with the single-processor optimized kernel and 64-byte datagrams, this effect is clearly caused by the computational CPU capacity, which limits the maximum forwarding rate of the Linux kernel to about 700 Kpackets/s (about 40% of the full Gigabit speed).
In fact, even though the CPU idle time goes to zero at about 40% of the full load, the CPU occupancies of all the most important function sets keep adapting their contributions up to about 700 Kpackets/s; beyond this point, their percentage contributions to the CPU utilization remain almost constant.

[Figure 3. Throughput and latency test, testbed A: effective throughput results for the single-processor optimized kernel and Click.]
[Figure 4. Throughput and latency test, testbed A: minimum and maximum latencies for both the single-processor optimized kernel and Click.]

More precisely, the profiling results in Fig. 6 show that the computational weight of the memory management operations (like sk_buff allocations and de-allocations) is substantially limited, thanks to the descriptor recycling patch, to less than 25%. In other works of ours, such as [19], we have shown that this patch can be used to save a CPU time share of about 20%.
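To give an idea of how the recycling technique works, the following is a conceptual C sketch, not the authors' actual patch: instead of freeing the sk_buff of every transmitted frame and allocating a fresh one for every received frame, the driver keeps a bounded per-device recycle queue and draws Rx buffers from it first. The queue name, the bound and the helper functions are hypothetical; only the sk_buff queue primitives belong to the real kernel API, and a real implementation must also reset the descriptor state before reuse.

```c
#include <linux/skbuff.h>

#define RECYCLE_MAX 256		/* arbitrary bound chosen for the example */

/* Hypothetical per-device recycle list; in a real driver this would live
 * in the driver's private structure. */
static struct sk_buff_head recycle_list;

/* TX-completion side: keep the descriptor instead of freeing it. */
static void recycle_tx_skb(struct sk_buff *skb)
{
	if (skb_queue_len(&recycle_list) < RECYCLE_MAX) {
		/* A real patch also resets the descriptor state (data
		 * pointers, checksum flags, etc.) so it can be reused. */
		skb_queue_head(&recycle_list, skb);
	} else {
		dev_kfree_skb_any(skb);	/* list full: normal free path */
	}
}

/* RX-refill side: try the recycle list first, then the allocator. */
static struct sk_buff *get_rx_skb(unsigned int size)
{
	struct sk_buff *skb = skb_dequeue(&recycle_list);

	if (!skb)
		skb = dev_alloc_skb(size);	/* standard allocation */
	return skb;
}

/* Somewhere in the driver initialization path:
 *     skb_queue_head_init(&recycle_list);
 */
```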

[Figure 5. Throughput and latency test, testbed A: average latencies for both the single-processor optimized kernel and Click.]
[Figure 6. Profiling results (CPU utilization share of idle, scheduler, memory, IP processing, NAPI, Tx API, IRQ, Ethernet processing and Oprofile itself) of the optimized Linux kernel obtained with testbed setup A.]
[Figure 7. Number of IRQ routines, polls and Rx SoftIRQs (second y-axis) for the Rx board with the skbuff recycling patched kernel, in the presence of an incoming traffic flow with only one IP source address.]
[Figure 8. Number of IRQ routines for the Tx board, and of Tx Ring cleanings performed by the Tx SoftIRQ ("func") and by the Rx SoftIRQ ("wake"), for the skbuff recycling patched kernel, in the presence of an incoming traffic flow with only one IP source address. The second y-axis refers to "wake".]

The behavior of the IRQ management operations may appear rather strange: their CPU utilization level decreases as the input rate increases. There are mainly two reasons for this behavior, both related to the packet grouping effect in the Tx and in the Rx API: when the ingress packet rate rises, the NAPI tends to moderate the IRQ rate by operating more like a polling than an interrupt mechanism (hence a first reduction in the number of interrupts), while the Tx API, under the same conditions, can better exploit the packet grouping mechanism by sending more packets at a time (and so the number of interrupts confirming successful transmissions decreases). When the IRQ weight becomes zero, the OR reaches the saturation point and operates like a pure polling mechanism. With regard to all the other operation sets (i.e., IP and Ethernet processing, NAPI and Tx API), their behavior is clearly bound to the number of forwarded packets: the weight of almost all these classes increases linearly up to the saturation point, and subsequently remains more or less constant. This analysis is also confirmed by the performance counters reported in Figs. 7 and 8, in which both the Tx and Rx boards reduce their IRQ generation rates, while the kernel passes from polling the Rx Ring about twice per received packet to about 0.22 times. The number of Rx SoftIRQs per received packet also decreases as the offered traffic load rises. As far as the transmission dynamics are concerned, Fig. 8 shows very low function occurrences: the Tx IRQ routines decrease their occurrences up to saturation, while the "wake" function, which represents the number of times that the Tx Ring is cleaned and the Qdisc buffer is served during an Rx SoftIRQ, exhibits a mirror-like behavior. This occurs because, when the OR reaches saturation, all the Tx functionalities are activated when the Rx SoftIRQ starts.

[Figure 9. Back-to-back test, testbed A: maximum zero-loss burst lengths.]
[Table I. Back-to-back test, testbed A: minimum, average and maximum latency values (in us) versus packet length for both the single-processor optimized kernel and Click.]

Similar considerations can also be made for the Click modular router: the performance limitations in the presence of short datagrams continue to be caused by a computational bottleneck, but Click's simple packet receive API, based on a pure polling mechanism, improves the throughput performance by lowering the weight of the IRQ management and Rx API functions.
For the same reasons, as shown in Figs. 4 and 5, the receive mechanism included in Click introduces higher packet latencies. Consistently with the previous results, the back-to-back tests, reported in Fig. 9 and Table I, also demonstrate that both the optimized Linux kernel and Click continue to be affected by small datagram sizes.

In fact, while with 256-byte or larger datagrams the measured zero-loss burst length is quite close to the maximum burst length used in the tests, it appears to be heavily limited in the presence of 40-, 64- and, only as far as the Linux kernel is concerned, 128-byte packets. An exception is the 128-byte case, in which the computational bottleneck starts to affect the NAPI while the forwarding rate continues to be very close to the theoretical one. The Linux kernel provides better support for bursty traffic than Click: its zero-loss burst lengths are longer and the associated latencies are smaller. The loss rate test results are reported in Fig. 10.

[Figure 10. Loss Rate test, testbed A: maximum throughput for 40-, 64- and 128-byte datagrams.]

B. Setup B numerical results

In the second benchmarking session we analyzed the performance achieved by the optimized single-processor Linux kernel, the standard SMP Linux kernel and the Click modular router with testbed setup B (see Fig. 2). Fig. 11 reports the maximum effective throughput in terms of forwarded packets per second for a single router interface. From this figure it is clear that, in the presence of short packets, the performance level of all three software architectures is not close to the theoretical one. More specifically, while the best throughput values are achieved by Click, the SMP kernel seems to provide better forwarding rates than the optimized single-processor kernel. In fact, as outlined in [25], if no explicit CPU-interface bindings are present, the SMP kernel processes the received packets using, if possible, the same CPU for the entire packet elaboration, and attempts to dynamically distribute the computational load among the CPUs.

[Figure 11. Throughput and latency test, testbed setup B: effective throughput for the optimized kernel, Click and the SMP kernel.]

Thus, in this particular setup, the computational load sharing tends to manage the two interfaces to which a traffic pair is applied with a single fixed CPU, fully processing each received packet with only one CPU and thus avoiding any memory concurrency problem. Figs. 12 and 13 report the minimum, average and maximum latency values versus the datagram size for all three software architectures. In particular, we note that both Linux kernels, which in this case provide very similar results, ensure lower minimum latencies than Click; Click, instead, provides better average and maximum latency values for short datagrams.

[Figure 12. Throughput and latency test, testbed B: minimum and maximum latencies.]
[Figure 13. Throughput and latency test, testbed B: average latencies.]
[Figure 14. Back-to-back test, testbed B: maximum zero-loss burst lengths.]

The back-to-back results, reported in Fig. 14 and Table II, show that the performance of all the analyzed architectures is nearly comparable in terms of zero-loss burst length, while, as far as latencies are concerned, the Linux kernels provide better values. By analyzing Fig. 15, which reports the loss rate results, we note how the performance values obtained with Click and the SMP kernel are better, especially for small datagrams, than those obtained with the optimized single-processor kernel. Moreover, Fig. 15 also shows that none of the three OR software architectures achieves the full Gigabit/s speed even for large datagrams, with a maximum forwarding rate of about 650 Mbps per interface.
To improve the readability of these results, in Fig. 15 and in all the following loss rate plots we report only the OR behavior with the minimum and maximum datagram sizes, since these represent the performance lower and upper bounds, respectively.

8 JOURNAL OF NETWORKS, VOL. 2, NO. 3, JUNE TABLE II. BACK-TO-BACK TEST, TESTBED B: LATENCY VALUES FOR ALL THREE SOFTWARE ARCHITECTURES. Optimized Kernel SMP Kernel Pkt Length [Byte] Min Average Max Min Average Max Min Average Max B 15B 4B 15B SMP 4B SMP 15B Figure 15. Loss Rate test, testbed B: maximum throughput versus both offered load and IP datagram sizes. C. Setup C numerical results In this benchmarking session, the three software architectures were tested in the presence of four Gigabit Ethernet interfaces with a full-meshed traffic matrix (Fig. 2). By analyzing the maximum effective throughput values in Fig. 16, we note that appears to achieve a better performance level with respect to the Linux kernels while, unlike the previous case, the single processor kernel provides maximum forwarding rates larger than the SMP version with small packets. In fact, the SMP kernel tries to share the computational load of the incoming traffic among the CPUs, resulting in an almost static assignment of each CPU to two specific network interfaces. Since, in the presence of a fullmeshed traffic matrix, about half of the forwarded packets cross the OR between two interfaces managed by different CPUs, this decreases performance due to memory concurrency problems [19]. Figs. 17 and 18 show the minimum, the maximum and the average latency values obtained during this test set. In observing the last results, we note how the SMP kernel, in the presence of short-sized datagrams, continues to undergo memory concurrency problems which lowers OR performance while considerably increasing both the average and the maximum latency values. By analyzing Fig. 19 and Table III, which report the back-to-back test results, we note that all three OR architectures achieve a similar zero-loss burst length, while reaches very high average and maximum latencies with respect to the single-processor and SMP kernels when small packets are used. The loss-rate results in Fig. 2 highlight the performance decay of the SMP kernel, while a fairly similar behavior is achieved by the other two architectures. Moreover, as in the previous benchmarking session, the maximum forwarding rate for each Gigabit network interface is limited to about 6/65 Mbps SMP Figure 16. Throughput and latencies test, setup C: effective throughput results min max min max SMP min SMP max Figure 17. Throughput and latencies test, testbed C: minimum and maximum latencies avg avg SMP avg Figure 18. Throughput and latencies test, results for testbed C: average latencies SMP Figure 19. Back-to-back test, testbed C: maximum zero loss burst lengths Burst Length [pkt] 4B 15B 4B 15B SMP 4B SMP 15B Figure 2. Loss Rate test, testbed C: maximum throughput versus both offered load and IP datagram sizes 27 ACADEMY PUBLISHER

9 14 JOURNAL OF NETWORKS, VOL. 2, NO. 3, JUNE 27 TABLE III. BACK-TO-BACK TEST, TESTBED C: LATENCY VALUES FOR THE SINGLE-PROCESSOR OPTIMIZED KERNEL, THE CLICK MODULAR ROUTER AND THE SMP KERNEL optimized Kernel SMP Kernel Pkt Length [Byte] Min Average Max Min Average Max Min Average Max D. Setup D numerical results In the last benchmarking session, we applied setup D, which provides a full-meshed traffic matrix between one Gigabit Ethernet and 12 Fast Ethernet interfaces, to the single-processor Linux kernel and to the SMP version. We did not use in this last test since, at the moment and for this software architecture, there are no drivers with polling support for the D-Link interfaces. By analyzing the throughput and latency results in Figs. 21, 22 and 23, we note how, in the presence of a high number of interfaces and a full-meshed traffic matrix, the performance of the SMP kernel version drops significantly: the maximum measured value for the effective throughput is limited to about 24 packets/s and the corresponding latencies would appear to be much higher with respect to those obtained with the single processor kernel. However, the single processor kernel also does not support the maximum theoretical rate: it achieves 1% of full speed in the presence of short-sized datagrams and about 75% for high datagram sizes. 8 7 SMP Figure 21. Throughput and latencies test, setup D: effective throughput results for both Linux kernels min max SMP min SMP max Figure 22. Throughput and latencies test, results for testbed D: minimum and maximum latencies for both Linux kernels. To better understand why the OR does not attain fullspeed with such a high number of interfaces, we decided to perform several profiling tests. In particular, these tests were carried out using two simple traffic matrices: the first (Fig. 24) consists of 12 CBR flows that cross the OR from the Fast Ethernet interfaces to the Gigabit one, while the second (Fig. 25) still consists of 12 CBR flows that cross the OR in the opposite direction (e.g., from the Gigabit to the Fast Ethernet interfaces). These simple traffic matrices allow us to separately analyze the reception and transmission operations avg SMP avg Figure 23. Throughput and latencies test, testbed D: average latencies for both the Linux kernels. CPU Percentage [%] Offered Load [Kpackets/s] idle scheduler memory IP processing NAPI Tx API IRQ Eth processing oprofile control Figure 24. Profiling results obtained by using 12 CBR flows that cross the OR from the Fast Ethernet interfaces to the Gigabit one. CPU Percentage [%] Offered Load [Kpackets/s] idle scheduler memory IP processing NAPI Tx API IRQ Eth processing oprofile control Figure 25. Profiling results obtained by using 12 CBR flows that cross the OR from a Gigabit interface to 12 FastEthernet ones. Thus, Figs. 24 and 25 report the profiling results corresponding to the two traffic matrices. The internal measurements shown in Fig. 24 highlight that fact that the CPUs are overloaded by the very high computational load of the IRQ and TX API management operations. This is due to the fact that during the transmission process each interface must signal the state of both the transmitting packets and the transmission ring to the associated driver instance through interrupts. More specifically, and again referring to Fig. 24, we note that IRQ CPU occupancy decreases by up to 3% of the offered load, and afterwards, while the OR reaches saturation, it remains constantly at about 5% of the computational resources. 
The initial decreasing behavior is due to the fact that by increasing the offered load traffic, the OR can better exploit packet grouping effects. Instead, the constant behavior is due to the fact that the OR manages the same packet quantity. Referring to Fig. 27 ACADEMY PUBLISHER

10 JOURNAL OF NETWORKS, VOL. 2, NO. 3, JUNE , we note how the presence of traffic incoming from many interfaces increases the computational weights of both the IRQ and the memory management operations. The decreasing behavior of the IRQ management computational weight is not due, as in the previous case, to the packet grouping effect, but to the typical NAPI structure that passes from an IRQ based mechanism to a polling one. The high memory management values can be explained quite simply by the fact that the recycling patch is not operating with the Fast Ethernet driver. Burst Length [pkt] SMP Figure 26. Back-to-back test, testbed D: maximum zero loss burst lengths. TABLE IV. BACK-TO-BACK TEST, TESTBED D: LATENCY VALUES FOR THE SINGLE-PROCESSOR OPTIMIZED KERNEL AND THE SMP KERNEL. Optimized Kernel SMP Kernel Pkt Length [Byte] Min Average Max Min Average Max B 15B 8 SMP 4B SMP 15B Figure 27. Loss Rate test, testbed D: maximum throughput versus both offered load and IP datagram sizes. The back-to-back results, reported in Fig. 26 and Table IV, show a very particular behavior: in fact, even if the single processor kernel can achieve longer zero-loss burst lengths than the SMP kernel, the latter appears to ensure lower minimum, average and maximum latency values. In the end, Fig. 27 reports the loss rate test results, which, compatible with the previous results, show that a single processor kernel can sustain a higher forwarding throughput than the SMP version. E. Maximum Performance In order to effectively synthesize and improve the evaluation of the proposed performance results, we report in Figs. 28 and 29 the aggregated 2 maximum values for each testbed of, respectively, the effective throughput and the maximum throughput (obtained in the loss rate test). By analyzing Fig. 28, we note that in the presence of more network interfaces, the OR generates values higher than 1 Gbps and, in particular, that it reaches maximum values equal to 1.6 Gbps with testbed D. We can also point out that the maximum effective throughput of setups B and C are almost the same: in fact, these very similar testbeds have only one difference (i.e., the traffic matrix), which has an effect only on the performance level of the SMP kernel, but practically no effect on the behaviors of the single processor kernel and. Effective Throughput [Mbps] 18 setup A 16 setup B 14 setup C setup D Figure 28. Maximum effective throughput values obtained in the implemented testbeds. Throughput [Mbps] setup A setup B setup C setup D Figure 29. Maximum throughput values obtained in the implemented testbeds. The aggregated maximum throughput values, as reported in Fig. 29, are obviously higher than the ones in Fig. 28. This highlights the fact that the maximum forwarding rates sustainable by the OR are achieved in setups B and C with 2.5 Gbps. Moreover, while in setup A the maximum theoretical rate is achieved for packet sizes larger than 128, in all the other setups the maximum throughput values are not much higher than half the theoretical ones. F. Hardware Architecture Impact In the final benchmarking session, we decided to compare the performance of the two hardware architectures introduced in Section III, which represent the current and the state-of-the-art of server architectures four years ago. The benchmarking scenario is the one used in testbed A (with reference to Fig. 2), while the selected software architecture is the single processor optimized kernel. 
2 In this case, "aggregated" refers to the sum of the forwarding rates of all the OR network interfaces.

It is clear that the purpose of these tests was to understand how the continuous evolution of COTS hardware affects overall OR performance. Therefore, Figs. 30, 31 and 32 report the results of the effective throughput tests for the old architecture (i.e., 32-bit Xeon) and the new one (i.e., 64-bit Xeon) equipped with both PCI-X and PCI-Express busses. The loss rate results are shown in Fig. 33.

[Figure 30. Throughput and latency test, setup A with the old HW architecture and the new one equipped with PCI-X and PCI-Express busses: effective throughput results for the single-processor optimized kernel. Note that the x-axis is in logarithmic scale.]
[Figure 31. Throughput and latency test, testbed A with the old HW architecture and the new one equipped with PCI-X and PCI-Express busses: minimum and maximum latencies for the single-processor optimized kernel.]
[Figure 32. Throughput and latency test, testbed A with the old HW architecture and the new one equipped with PCI-X and PCI-Express busses: average latencies for the single-processor optimized kernel.]

By observing the comparisons in Figs. 30 and 31, it is clear that the new architecture generally provides better performance than the old one: more specifically, while using the new architecture with the PCI-X bus only slightly improves performance, when PCI Express is used the OR effective throughput reaches an impressive 88% with 40-byte packets, and achieves the maximum theoretical rate for all the other packet sizes. All this is clearly due to the high efficiency of the PCI Express bus: with this I/O bus, DMA transfers occur with a very low control overhead (since it behaves like a leased line), which probably leads to lighter accesses to the RAM and, consequently, to benefits in terms of memory access by the CPU. In other words, this large performance enhancement is caused by more effective memory access by the CPU, thanks to the features of the PCI Express DMA.

[Figure 33. Loss Rate test, testbed A, for the old HW architecture and the new one equipped with PCI-X and PCI-Express busses: maximum throughput versus both offered load and IP datagram size.]

VII. CONCLUSIONS

In this contribution we report the results of the in-depth optimization and testing carried out on a PC Open Router architecture based on Linux software and, more specifically, on the Linux 2.6 kernel. We have presented a performance evaluation, in some common working environments, of three different data plane architectures, namely the optimized Linux 2.6 kernel, the Click Modular Router and the SMP Linux 2.6 kernel, with both external (throughput and latencies) and internal (profiling) measurements. The external measurements were performed in an RFC 2544 [29] compliant manner by using professional devices [27]. Two hardware architectures were also tested and compared for the purpose of understanding how the evolution of COTS hardware may affect performance. The experimental results show that the optimized version of the Linux kernel, with a suitable hardware architecture, can achieve performance levels high enough to effectively support several Gigabit interfaces. The results obtained show that the OR can reach very interesting performance levels, attaining aggregated forwarding rates of about 2.5 Gbps with relatively low latencies.
REFERENCES
[1] Building Open Router Architectures Based On Router Aggregation project (BORA-BORA), project homepage.
[2] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek, "The Click modular router", ACM Transactions on Computer Systems, 18(3), Aug. 2000, pp. 263-297.
[3] GNU Zebra.
[4] M. Handley, O. Hodson, E. Kohler, "XORP: an open platform for network research", ACM SIGCOMM Computer Communication Review, Vol. 33, Issue 1, Jan. 2003.
[5] S. Radhakrishnan, "Linux - Advanced networking overview".
[6] M. Rio et al., "A map of the networking code in Linux kernel 2.4.20", Technical Report DataTAG-2004-1, FP5/IST DataTAG Project, Mar. 2004.
[7] FreeBSD.

[8] B. Chen and R. Morris, "Flexible Control of Parallelism in a Multiprocessor PC Router", Proc. of the 2001 USENIX Annual Technical Conference (USENIX '01), Boston, USA, June 2001.
[9] C. Duret, F. Rischette, J. Lattmann, V. Laspreses, P. Van Heuven, S. Van den Berghe, P. Demeester, "High Router Flexibility and Performance by Combining Dedicated Lookup Hardware (IFT), off the Shelf Switches and Linux", Proc. of the 2nd International IFIP-TC6 Networking Conference, Pisa, Italy, May 2002, LNCS 2345, E. Gregori et al. (Eds.), Springer-Verlag, 2002.
[10] A. Barczyk, A. Carbone, J. P. Dufey, D. Galli, B. Jost, U. Marconi, N. Neufeld, G. Peco, V. Vagnoni, "Reliability of datagram transmission on Gigabit Ethernet at full link load", LHCb technical note, LHCb 2004-030 DAQ, Mar. 2004.
[11] P. Gray, A. Betz, "Performance Evaluation of Copper-Based Gigabit Ethernet Interfaces", Proc. of the 27th Annual IEEE Conference on Local Computer Networks (LCN'02), Tampa, Florida, November 2002.
[12] A. Bianco, R. Birke, D. Bolognesi, J. M. Finochietto, G. Galante, M. Mellia, M.L.N.P.P. Prashant, F. Neri, "Click vs. Linux: Two Efficient Open-Source IP Network Stacks for Software Routers", Proc. of the 2005 IEEE Workshop on High Performance Switching and Routing (HPSR 2005), Hong Kong, May 2005.
[13] A. Bianco, J. M. Finochietto, G. Galante, M. Mellia, F. Neri, "Open-Source PC-Based Software Routers: a Viable Approach to High-Performance Packet Switching", Proc. of the 3rd International Workshop on QoS in Multiservice IP Networks (QoS-IP 2005), Catania, Italy, Feb. 2005.
[14] A. Bianco, R. Birke, G. Botto, M. Chiaberge, J. Finochietto, G. Galante, M. Mellia, F. Neri, M. Petracca, "Boosting the Performance of PC-based Software Routers with FPGA-enhanced Network Interface Cards", Proc. of the 2006 IEEE Workshop on High Performance Switching and Routing (HPSR 2006), Poznan, Poland, June 2006.
[15] A. Grover, C. Leech, "Accelerating Network Receive Processing: Intel I/O Acceleration Technology", Proc. of the 2005 Linux Symposium, Ottawa, Ontario, Canada, Jul. 2005, vol. 1.
[16] R. McIlroy, J. Sventek, "Resource Virtualization of Network Routers", Proc. of the 2006 IEEE Workshop on High Performance Switching and Routing (HPSR 2006), Poznan, Poland, June 2006.
[17] R. Bolla, R. Bruschi, "The IP Lookup Mechanism in a Linux Software Router: Performance Evaluation and Optimizations", Proc. of the 2007 IEEE Workshop on High Performance Switching and Routing (HPSR 2007), New York, USA.
[18] K. Wehrle, F. Pählke, H. Ritter, D. Müller, M. Bechler, "The Linux Networking Architecture: Design and Implementation of Network Protocols in the Linux Kernel", Pearson Prentice Hall, Upper Saddle River, NJ, USA, 2004.
[19] R. Bolla, R. Bruschi, "A high-end Linux based Open Router for IP QoS networks: tuning and performance analysis with internal (profiling) and external measurement tools of the packet forwarding capabilities", Proc. of the 3rd International Workshop on Internet Performance, Simulation, Monitoring and Measurements (IPS-MoMe 2005), Warsaw, Poland, Mar. 2005.
[20] J. H. Salim, R. Olsson, A. Kuznetsov, "Beyond Softnet", Proc. of the 5th Annual Linux Showcase & Conference, Oakland, California, USA, Nov. 2001.
[21] A. Cox, "Network Buffers and Memory Management", Linux Journal, Oct. 1996.
[22] The Intel PRO/1000 XT Server Adapter.
[23] The D-Link DFE-580TX quad Fast Ethernet network adapter.
[24] J. A. Ronciak, J. Brandeburg, G. Venkatesan, M. Williams, "Networking Driver Performance and Measurement - e1000, A Case Study", Proc. of the 2005 Linux Symposium, Ottawa, Ontario, Canada, July 2005, vol. 2.
[25] R. Bolla, R. Bruschi, "IP forwarding Performance Analysis in the presence of Control Plane Functionalities in a PC-based Open Router", Proc. of the 2005 Tyrrhenian International Workshop on Digital Communications (TIWDC 2005), Sorrento, Italy, June 2005; also in F. Davoli, S. Palazzo, S. Zappatore, Eds., "Distributed Cooperative Laboratories: Networking, Instrumentation, and Measurements", Springer, Norwell, MA, 2006.
[26] The descriptor recycling patch, ftp://robur.slu.se/pub/Linux/net-development/skb_recycling/.
[27] The Agilent N2X Router Tester, comms.agilent.com/n2x/products/.
[28] Oprofile.
[29] Request for Comments 2544 (RFC 2544), org/rfcs/rfc2544.html.

Raffaele Bolla was born in Savona, Italy. He received his Master of Science degree in Electronic Engineering from the University of Genoa in 1989 and his Ph.D. degree in Telecommunications from the Department of Communications, Computer and Systems Science (DIST) of the same university in 1994. From 1996 to 2004 he worked as a researcher at DIST, where, since 2004, he has been an Associate Professor and teaches a course in Telecommunication Networks and Telematics. His current research interests focus on resource allocation, Call Admission Control and routing in multi-service IP networks, as well as Multiple Access Control, resource allocation and routing in both cellular and ad hoc wireless networks. He has authored or co-authored over 100 scientific publications in international journals and conference proceedings, and has been the Principal Investigator of many projects in the field of telecommunication networks.

Roberto Bruschi was born in Genoa, Italy. He received his Master of Science degree in Telecommunication Engineering from the University of Genoa in 2002 and his Ph.D. in Electronic Engineering from the same university in 2006. He is presently working with the Telematics and Telecommunication Networks Lab (TNT) in the Department of Communications, Computer and Systems Science (DIST) at the University of Genoa, and he is also a member of CNIT, the Italian inter-university Consortium for Telecommunications. Roberto is an active member of various Italian research projects in the networking area, such as BORA-BORA, FAMOUS, TANGO and EURO. He has co-authored over 10 papers in international conferences and journals. His main interests include Linux software routers, network processors, TCP and network modeling, VPN design, P2P modeling, bandwidth allocation, admission control and routing in multi-service QoS IP/MPLS networks.


More information

Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan gilligan@vyatta.com

Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan gilligan@vyatta.com Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan gilligan@vyatta.com Outline About Vyatta: Open source project, and software product Areas we re working on or interested in

More information

OpenFlow Switching: Data Plane Performance

OpenFlow Switching: Data Plane Performance This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE ICC 21 proceedings OpenFlow : Data Plane Performance Andrea Bianco,

More information

Open-source routing at 10Gb/s

Open-source routing at 10Gb/s Open-source routing at Gb/s Olof Hagsand, Robert Olsson and Bengt Gördén, Royal Institute of Technology (KTH), Sweden Email: {olofh, gorden}@kth.se Uppsala University, Uppsala, Sweden Email: robert.olsson@its.uu.se

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring

HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring CESNET Technical Report 2/2014 HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring VIKTOR PUš, LUKÁš KEKELY, MARTIN ŠPINLER, VÁCLAV HUMMEL, JAN PALIČKA Received 3. 10. 2014 Abstract

More information

How To Improve Performance On A Linux Based Router

How To Improve Performance On A Linux Based Router Linux Based Router Over 10GE LAN Cheng Cui, Chui-hui Chiu, and Lin Xue Department of Computer Science Louisiana State University, LA USA Abstract High speed routing with 10Gbps link speed is still very

More information

The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links. Filippo Costa on behalf of the ALICE DAQ group

The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links. Filippo Costa on behalf of the ALICE DAQ group The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links Filippo Costa on behalf of the ALICE DAQ group DATE software 2 DATE (ALICE Data Acquisition and Test Environment) ALICE is a

More information

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based

More information

Router Architectures

Router Architectures Router Architectures An overview of router architectures. Introduction What is a Packet Switch? Basic Architectural Components Some Example Packet Switches The Evolution of IP Routers 2 1 Router Components

More information

Lustre Networking BY PETER J. BRAAM

Lustre Networking BY PETER J. BRAAM Lustre Networking BY PETER J. BRAAM A WHITE PAPER FROM CLUSTER FILE SYSTEMS, INC. APRIL 2007 Audience Architects of HPC clusters Abstract This paper provides architects of HPC clusters with information

More information

PCI Express* Ethernet Networking

PCI Express* Ethernet Networking White Paper Intel PRO Network Adapters Network Performance Network Connectivity Express* Ethernet Networking Express*, a new third-generation input/output (I/O) standard, allows enhanced Ethernet network

More information

VMWARE WHITE PAPER 1

VMWARE WHITE PAPER 1 1 VMWARE WHITE PAPER Introduction This paper outlines the considerations that affect network throughput. The paper examines the applications deployed on top of a virtual infrastructure and discusses the

More information

High-Performance IP Service Node with Layer 4 to 7 Packet Processing Features

High-Performance IP Service Node with Layer 4 to 7 Packet Processing Features UDC 621.395.31:681.3 High-Performance IP Service Node with Layer 4 to 7 Packet Processing Features VTsuneo Katsuyama VAkira Hakata VMasafumi Katoh VAkira Takeyama (Manuscript received February 27, 2001)

More information

Packet Capture in 10-Gigabit Ethernet Environments Using Contemporary Commodity Hardware

Packet Capture in 10-Gigabit Ethernet Environments Using Contemporary Commodity Hardware Packet Capture in 1-Gigabit Ethernet Environments Using Contemporary Commodity Hardware Fabian Schneider Jörg Wallerich Anja Feldmann {fabian,joerg,anja}@net.t-labs.tu-berlin.de Technische Universtität

More information

Necessary Functions of a Multi-Stage Software Router

Necessary Functions of a Multi-Stage Software Router SNMP Management in a Distributed Software Router Architecture Andrea Bianco, Robert Birke, Fikru Getachew Debele, Luca Giraudo Dip. di Elettronica, Politecnico di Torino, Italy, Email: {last name}@tlc.polito.it

More information

High-performance vswitch of the user, by the user, for the user

High-performance vswitch of the user, by the user, for the user A bird in cloud High-performance vswitch of the user, by the user, for the user Yoshihiro Nakajima, Wataru Ishida, Tomonori Fujita, Takahashi Hirokazu, Tomoya Hibi, Hitoshi Matsutahi, Katsuhiro Shimano

More information

High-Speed TCP Performance Characterization under Various Operating Systems

High-Speed TCP Performance Characterization under Various Operating Systems High-Speed TCP Performance Characterization under Various Operating Systems Y. Iwanaga, K. Kumazoe, D. Cavendish, M.Tsuru and Y. Oie Kyushu Institute of Technology 68-4, Kawazu, Iizuka-shi, Fukuoka, 82-852,

More information

Wire-speed Packet Capture and Transmission

Wire-speed Packet Capture and Transmission Wire-speed Packet Capture and Transmission Luca Deri Packet Capture: Open Issues Monitoring low speed (100 Mbit) networks is already possible using commodity hardware and tools based on libpcap.

More information

Enabling Linux* Network Support of Hardware Multiqueue Devices

Enabling Linux* Network Support of Hardware Multiqueue Devices Enabling Linux* Network Support of Hardware Multiqueue Devices Zhu Yi Intel Corp. yi.zhu@intel.com Peter P. Waskiewicz, Jr. Intel Corp. peter.p.waskiewicz.jr@intel.com Abstract In the Linux kernel network

More information

TCP/IP Jumbo Frames Network Performance Evaluation on A Testbed Infrastructure

TCP/IP Jumbo Frames Network Performance Evaluation on A Testbed Infrastructure I.J. Wireless and Microwave Technologies, 2012, 6, 29-36 Published Online December 2012 in MECS (http://www.mecs-press.net) DOI: 10.5815/ijwmt.2012.06.05 Available online at http://www.mecs-press.net/ijwmt

More information

Receive Descriptor Recycling for Small Packet High Speed Ethernet Traffic

Receive Descriptor Recycling for Small Packet High Speed Ethernet Traffic IEEE MELECON 2006, May 6-9, Benalmádena (Málaga), Spain Receive Descriptor Recycling for Small Packet High Speed Ethernet Traffic Cedric Walravens Department of Electrical Engineering - ESAT Katholieke

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

Quantifying TCP Performance for IPv6 in Linux- Based Server Operating Systems

Quantifying TCP Performance for IPv6 in Linux- Based Server Operating Systems Cyber Journals: Multidisciplinary Journals in Science and Technology, Journal of Selected Areas in Telecommunications (JSAT), November Edition, 2013 Volume 3, Issue 11 Quantifying TCP Performance for IPv6

More information

High-Density Network Flow Monitoring

High-Density Network Flow Monitoring Petr Velan petr.velan@cesnet.cz High-Density Network Flow Monitoring IM2015 12 May 2015, Ottawa Motivation What is high-density flow monitoring? Monitor high traffic in as little rack units as possible

More information

How To Test A Microsoft Vxworks Vx Works 2.2.2 (Vxworks) And Vxwork 2.4.2-2.4 (Vkworks) (Powerpc) (Vzworks)

How To Test A Microsoft Vxworks Vx Works 2.2.2 (Vxworks) And Vxwork 2.4.2-2.4 (Vkworks) (Powerpc) (Vzworks) DSS NETWORKS, INC. The Gigabit Experts GigMAC PMC/PMC-X and PCI/PCI-X Cards GigPMCX-Switch Cards GigPCI-Express Switch Cards GigCPCI-3U Card Family Release Notes OEM Developer Kit and Drivers Document

More information

The Performance Analysis of Linux Networking Packet Receiving

The Performance Analysis of Linux Networking Packet Receiving The Performance Analysis of Linux Networking Packet Receiving Wenji Wu, Matt Crawford Fermilab CHEP 2006 wenji@fnal.gov, crawdad@fnal.gov Topics Background Problems Linux Packet Receiving Process NIC &

More information

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking Roberto Bonafiglia, Ivano Cerrato, Francesco Ciaccia, Mario Nemirovsky, Fulvio Risso Politecnico di Torino,

More information

DROP: An Open-Source Project towards Distributed SW Router Architectures

DROP: An Open-Source Project towards Distributed SW Router Architectures 1 DROP: An Open-Source Project towards Distributed SW Router Architectures Raffaele Bolla, Member, IEEE, Roberto Bruschi, Guerino Lamanna and Andrea Ranieri Department of Communications, Computer and Systems

More information

Accelerating High-Speed Networking with Intel I/O Acceleration Technology

Accelerating High-Speed Networking with Intel I/O Acceleration Technology White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing

More information

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance M. Rangarajan, A. Bohra, K. Banerjee, E.V. Carrera, R. Bianchini, L. Iftode, W. Zwaenepoel. Presented

More information

Boosting Data Transfer with TCP Offload Engine Technology

Boosting Data Transfer with TCP Offload Engine Technology Boosting Data Transfer with TCP Offload Engine Technology on Ninth-Generation Dell PowerEdge Servers TCP/IP Offload Engine () technology makes its debut in the ninth generation of Dell PowerEdge servers,

More information

Virtualised MikroTik

Virtualised MikroTik Virtualised MikroTik MikroTik in a Virtualised Hardware Environment Speaker: Tom Smyth CTO Wireless Connect Ltd. Event: MUM Krackow Feb 2008 http://wirelessconnect.eu/ Copyright 2008 1 Objectives Understand

More information

Influence of Load Balancing on Quality of Real Time Data Transmission*

Influence of Load Balancing on Quality of Real Time Data Transmission* SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol. 6, No. 3, December 2009, 515-524 UDK: 004.738.2 Influence of Load Balancing on Quality of Real Time Data Transmission* Nataša Maksić 1,a, Petar Knežević 2,

More information

A NOVEL RESOURCE EFFICIENT DMMS APPROACH

A NOVEL RESOURCE EFFICIENT DMMS APPROACH A NOVEL RESOURCE EFFICIENT DMMS APPROACH FOR NETWORK MONITORING AND CONTROLLING FUNCTIONS Golam R. Khan 1, Sharmistha Khan 2, Dhadesugoor R. Vaman 3, and Suxia Cui 4 Department of Electrical and Computer

More information

Where IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine

Where IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine Where IT perceptions are reality Test Report OCe14000 Performance Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine Document # TEST2014001 v9, October 2014 Copyright 2014 IT Brand

More information

Gigabit Ethernet. Abstract. 1. Introduction. 2. Benefits of Gigabit Ethernet

Gigabit Ethernet. Abstract. 1. Introduction. 2. Benefits of Gigabit Ethernet Table of Contents Abstract... 2 1. Introduction... 2 2. Benefits of Gigabit Ethernet... 2 2.1 Easy Migration to Higher Performance Levels... 3 2.2 Decreased Overall Costs Over Time... 3 2.3 Supports for

More information

Accelerate In-Line Packet Processing Using Fast Queue

Accelerate In-Line Packet Processing Using Fast Queue Accelerate In-Line Packet Processing Using Fast Queue Chun-Ying Huang 1, Chi-Ming Chen 1, Shu-Ping Yu 1, Sheng-Yao Hsu 1, and Chih-Hung Lin 1 Department of Computer Science and Engineering, National Taiwan

More information

Introduction to PCI Express Positioning Information

Introduction to PCI Express Positioning Information Introduction to PCI Express Positioning Information Main PCI Express is the latest development in PCI to support adapters and devices. The technology is aimed at multiple market segments, meaning that

More information

Open Flow Controller and Switch Datasheet

Open Flow Controller and Switch Datasheet Open Flow Controller and Switch Datasheet California State University Chico Alan Braithwaite Spring 2013 Block Diagram Figure 1. High Level Block Diagram The project will consist of a network development

More information

Distributed applications monitoring at system and network level

Distributed applications monitoring at system and network level Distributed applications monitoring at system and network level Monarc Collaboration 1 Abstract Most of the distributed applications are presently based on architectural models that don t involve real-time

More information

The Bus (PCI and PCI-Express)

The Bus (PCI and PCI-Express) 4 Jan, 2008 The Bus (PCI and PCI-Express) The CPU, memory, disks, and all the other devices in a computer have to be able to communicate and exchange data. The technology that connects them is called the

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

4 Internet QoS Management

4 Internet QoS Management 4 Internet QoS Management Rolf Stadler School of Electrical Engineering KTH Royal Institute of Technology stadler@ee.kth.se September 2008 Overview Network Management Performance Mgt QoS Mgt Resource Control

More information

Chapter 5 Cubix XP4 Blade Server

Chapter 5 Cubix XP4 Blade Server Chapter 5 Cubix XP4 Blade Server Introduction Cubix designed the XP4 Blade Server to fit inside a BladeStation enclosure. The Blade Server features one or two Intel Pentium 4 Xeon processors, the Intel

More information

Putting it on the NIC: A Case Study on application offloading to a Network Interface Card (NIC)

Putting it on the NIC: A Case Study on application offloading to a Network Interface Card (NIC) This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE CCNC 2006 proceedings. Putting it on the NIC: A Case Study on application

More information

D1.2 Network Load Balancing

D1.2 Network Load Balancing D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June ronald.vanderpol@sara.nl,freek.dijkstra@sara.nl,

More information

Effects of Filler Traffic In IP Networks. Adam Feldman April 5, 2001 Master s Project

Effects of Filler Traffic In IP Networks. Adam Feldman April 5, 2001 Master s Project Effects of Filler Traffic In IP Networks Adam Feldman April 5, 2001 Master s Project Abstract On the Internet, there is a well-documented requirement that much more bandwidth be available than is used

More information

Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow

Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow International Journal of Soft Computing and Engineering (IJSCE) Performance Analysis of AQM Schemes in Wired and Wireless Networks based on TCP flow Abdullah Al Masud, Hossain Md. Shamim, Amina Akhter

More information

Architecture of distributed network processors: specifics of application in information security systems

Architecture of distributed network processors: specifics of application in information security systems Architecture of distributed network processors: specifics of application in information security systems V.Zaborovsky, Politechnical University, Sait-Petersburg, Russia vlad@neva.ru 1. Introduction Modern

More information

Challenges in high speed packet processing

Challenges in high speed packet processing Challenges in high speed packet processing Denis Salopek University of Zagreb, Faculty of Electrical Engineering and Computing, Croatia denis.salopek@fer.hr Abstract With billions of packets traveling

More information

Overlapping Data Transfer With Application Execution on Clusters

Overlapping Data Transfer With Application Execution on Clusters Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer

More information

VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing

VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing Journal of Information & Computational Science 9: 5 (2012) 1273 1280 Available at http://www.joics.com VON/K: A Fast Virtual Overlay Network Embedded in KVM Hypervisor for High Performance Computing Yuan

More information

Network Performance Optimisation and Load Balancing. Wulf Thannhaeuser

Network Performance Optimisation and Load Balancing. Wulf Thannhaeuser Network Performance Optimisation and Load Balancing Wulf Thannhaeuser 1 Network Performance Optimisation 2 Network Optimisation: Where? Fixed latency 4.0 µs Variable latency

More information

Monitoring high-speed networks using ntop. Luca Deri <deri@ntop.org>

Monitoring high-speed networks using ntop. Luca Deri <deri@ntop.org> Monitoring high-speed networks using ntop Luca Deri 1 Project History Started in 1997 as monitoring application for the Univ. of Pisa 1998: First public release v 0.4 (GPL2) 1999-2002:

More information

Performance Analysis of Large Receive Offload in a Xen Virtualized System

Performance Analysis of Large Receive Offload in a Xen Virtualized System Performance Analysis of Large Receive Offload in a Virtualized System Hitoshi Oi and Fumio Nakajima The University of Aizu, Aizu Wakamatsu, JAPAN {oi,f.nkjm}@oslab.biz Abstract System-level virtualization

More information

Parallel Firewalls on General-Purpose Graphics Processing Units

Parallel Firewalls on General-Purpose Graphics Processing Units Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering

More information

PE310G4BPi40-T Quad port Copper 10 Gigabit Ethernet PCI Express Bypass Server Intel based

PE310G4BPi40-T Quad port Copper 10 Gigabit Ethernet PCI Express Bypass Server Intel based PE310G4BPi40-T Quad port Copper 10 Gigabit Ethernet PCI Express Bypass Server Intel based Description Silicom s quad port Copper 10 Gigabit Ethernet Bypass server adapter is a PCI-Express X8 network interface

More information

Computer Organization & Architecture Lecture #19

Computer Organization & Architecture Lecture #19 Computer Organization & Architecture Lecture #19 Input/Output The computer system s I/O architecture is its interface to the outside world. This architecture is designed to provide a systematic means of

More information

Building High-Performance iscsi SAN Configurations. An Alacritech and McDATA Technical Note

Building High-Performance iscsi SAN Configurations. An Alacritech and McDATA Technical Note Building High-Performance iscsi SAN Configurations An Alacritech and McDATA Technical Note Building High-Performance iscsi SAN Configurations An Alacritech and McDATA Technical Note Internet SCSI (iscsi)

More information

Tyche: An efficient Ethernet-based protocol for converged networked storage

Tyche: An efficient Ethernet-based protocol for converged networked storage Tyche: An efficient Ethernet-based protocol for converged networked storage Pilar González-Férez and Angelos Bilas 30 th International Conference on Massive Storage Systems and Technology MSST 2014 June

More information

Comparison of Web Server Architectures: a Measurement Study

Comparison of Web Server Architectures: a Measurement Study Comparison of Web Server Architectures: a Measurement Study Enrico Gregori, IIT-CNR, enrico.gregori@iit.cnr.it Joint work with Marina Buzzi, Marco Conti and Davide Pagnin Workshop Qualità del Servizio

More information

TCP Offload Engines. As network interconnect speeds advance to Gigabit. Introduction to

TCP Offload Engines. As network interconnect speeds advance to Gigabit. Introduction to Introduction to TCP Offload Engines By implementing a TCP Offload Engine (TOE) in high-speed computing environments, administrators can help relieve network bottlenecks and improve application performance.

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308

Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308 Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308 Laura Knapp WW Business Consultant Laurak@aesclever.com Applied Expert Systems, Inc. 2011 1 Background

More information

Cisco Integrated Services Routers Performance Overview

Cisco Integrated Services Routers Performance Overview Integrated Services Routers Performance Overview What You Will Learn The Integrated Services Routers Generation 2 (ISR G2) provide a robust platform for delivering WAN services, unified communications,

More information

基 於 SDN 與 可 程 式 化 硬 體 架 構 之 雲 端 網 路 系 統 交 換 器

基 於 SDN 與 可 程 式 化 硬 體 架 構 之 雲 端 網 路 系 統 交 換 器 基 於 SDN 與 可 程 式 化 硬 體 架 構 之 雲 端 網 路 系 統 交 換 器 楊 竹 星 教 授 國 立 成 功 大 學 電 機 工 程 學 系 Outline Introduction OpenFlow NetFPGA OpenFlow Switch on NetFPGA Development Cases Conclusion 2 Introduction With the proposal

More information

EVALUATING THE NETWORKING PERFORMANCE OF LINUX-BASED HOME ROUTER PLATFORMS FOR MULTIMEDIA SERVICES. Ingo Kofler, Robert Kuschnig, Hermann Hellwagner

EVALUATING THE NETWORKING PERFORMANCE OF LINUX-BASED HOME ROUTER PLATFORMS FOR MULTIMEDIA SERVICES. Ingo Kofler, Robert Kuschnig, Hermann Hellwagner EVALUATING THE NETWORKING PERFORMANCE OF LINUX-BASED HOME ROUTER PLATFORMS FOR MULTIMEDIA SERVICES Ingo Kofler, Robert Kuschnig, Hermann Hellwagner Institute of Information Technology (ITEC) Alpen-Adria-Universität

More information

FlexPath Network Processor

FlexPath Network Processor FlexPath Network Processor Rainer Ohlendorf Thomas Wild Andreas Herkersdorf Prof. Dr. Andreas Herkersdorf Arcisstraße 21 80290 München http://www.lis.ei.tum.de Agenda FlexPath Introduction Work Packages

More information

Presentation of Diagnosing performance overheads in the Xen virtual machine environment

Presentation of Diagnosing performance overheads in the Xen virtual machine environment Presentation of Diagnosing performance overheads in the Xen virtual machine environment September 26, 2005 Framework Using to fix the Network Anomaly Xen Network Performance Test Using Outline 1 Introduction

More information

Software Datapath Acceleration for Stateless Packet Processing

Software Datapath Acceleration for Stateless Packet Processing June 22, 2010 Software Datapath Acceleration for Stateless Packet Processing FTF-NET-F0817 Ravi Malhotra Software Architect Reg. U.S. Pat. & Tm. Off. BeeKit, BeeStack, CoreNet, the Energy Efficient Solutions

More information

Leased Line + Remote Dial-in connectivity

Leased Line + Remote Dial-in connectivity Leased Line + Remote Dial-in connectivity Client: One of the TELCO offices in a Southern state. The customer wanted to establish WAN Connectivity between central location and 10 remote locations. The customer

More information

pco.interface GigE & USB Installation Guide

pco.interface GigE & USB Installation Guide pco.interface GigE & USB Installation Guide In this manual you find installation instructions for the GigE Vision and USB2.0 interface on Microsoft Windows platforms. Target Audience: This camera is designed

More information

Datacenter Operating Systems

Datacenter Operating Systems Datacenter Operating Systems CSE451 Simon Peter With thanks to Timothy Roscoe (ETH Zurich) Autumn 2015 This Lecture What s a datacenter Why datacenters Types of datacenters Hyperscale datacenters Major

More information

Boosting the Performance of PC-based Software Routers with FPGA-enhanced Network Interface Cards

Boosting the Performance of PC-based Software Routers with FPGA-enhanced Network Interface Cards Boosting the Performance of PC-based Software Routers with FPGA-enhanced Network Interface Cards Andrea Bianco, Robert Birke, Gianluca Botto, Marcello Chiaberge, Jorge M. Finochietto, Giulio Galante, Marco

More information

Performance Modeling and Analysis of a Database Server with Write-Heavy Workload

Performance Modeling and Analysis of a Database Server with Write-Heavy Workload Performance Modeling and Analysis of a Database Server with Write-Heavy Workload Manfred Dellkrantz, Maria Kihl 2, and Anders Robertsson Department of Automatic Control, Lund University 2 Department of

More information

Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University

Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University Napatech - Sharkfest 2009 1 Presentation Overview About Napatech

More information

Windows Server Performance Monitoring

Windows Server Performance Monitoring Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

More information

Enabling Technologies for Distributed and Cloud Computing

Enabling Technologies for Distributed and Cloud Computing Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading

More information

Configuring IPS High Bandwidth Using EtherChannel Load Balancing

Configuring IPS High Bandwidth Using EtherChannel Load Balancing Configuring IPS High Bandwidth Using EtherChannel Load Balancing This guide helps you to understand and deploy the high bandwidth features available with IPS v5.1 when used in conjunction with the EtherChannel

More information

The proliferation of the raw processing

The proliferation of the raw processing TECHNOLOGY CONNECTED Advances with System Area Network Speeds Data Transfer between Servers with A new network switch technology is targeted to answer the phenomenal demands on intercommunication transfer

More information