High Speed Network Traffic Analysis with Commodity Multi-core Systems
Francesco Fusco
IBM Research - Zurich / ETH Zurich
[email protected]

Luca Deri
ntop
[email protected]

ABSTRACT

Multi-core systems are the current dominant trend in computer processors. However, kernel network layers often do not fully exploit multi-core architectures. This is due to issues such as legacy code, resource competition on the RX queues of network interfaces, and unnecessary memory copies between OS layers. As a result, packet capture, the core operation of every network monitoring application, may even suffer performance penalties when adapted to multi-core architectures. This work presents common pitfalls of network monitoring applications on multi-core systems and proposes solutions to these issues. We describe the design and implementation of a novel multi-core aware packet capture kernel module that enables monitoring applications to scale with the number of cores. We show that high packet capture performance can be achieved on modern commodity hardware.

Categories and Subject Descriptors
D.4.4 [Operating Systems]: Communications Management; C.2.3 [Network Operations]: Network monitoring

General Terms
Measurement, Performance

Keywords
Linux kernel, network packet capture, multi-core systems

1. INTRODUCTION

The heterogeneity of Internet-based services and advances in interconnection technologies have raised the demand for advanced passive monitoring applications. In particular, analyzing high-speed networks by means of software applications running on commodity off-the-shelf hardware presents major performance challenges.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$

Researchers have demonstrated that packet capture, the cornerstone of the majority of passive monitoring applications, can be substantially improved by enhancing general purpose operating systems for traffic analysis [11, 12, 26]. These results are encouraging because today's commodity hardware offers features and performance that just a few years ago were only provided by costly custom hardware designs. Modern network interface cards offer multiple TX/RX queues and advanced hardware mechanisms able to balance traffic across queues. Desktop-class machines are becoming advanced multi-core, and even multi-processor, parallel architectures capable of executing multiple threads at the same time. Unfortunately, packet capture technologies do not properly exploit this increased parallelism and, as we show in our experiments, packet capture performance may even decrease when monitoring applications instantiate several packet capture threads or when multi-queue adapters are used. This is due to three major reasons: a) resource competition of threads on the network interfaces' RX queues, b) unnecessary packet copies, and c) improper scheduling and interrupt balancing. In this work, we mitigate the above issues by introducing a novel packet capture technology designed to exploit the parallelism offered by modern architectures and network interface cards, and we evaluate its performance using hardware traffic generators.
The evaluation shows that, thanks to our technology, a commodity server can process more than 4 Gbps per physical processor, which is more than four times the rate we can achieve on the same hardware with previous-generation packet capture technologies. Our work makes several important contributions:

- We successfully exploit the traffic balancing features offered by modern network adapters and make each RX queue visible to monitoring applications by means of virtual capture devices. To the best of our knowledge, this work describes the first packet capture technology specifically tailored for modern multi-queue adapters.

- We propose a solution that substantially simplifies the development of highly scalable multi-threaded traffic analysis applications, and we released it under an open-source license. Since compatibility with the popular libpcap [5] library is preserved, we believe it can smooth the transition toward efficient parallel packet processing.

- We minimize the memory bandwidth footprint by reducing the per-packet cost to a single packet copy, and optimize cache hierarchy utilization by combining lock-less buffers with optimal scheduling settings.
2. MOTIVATION AND SCOPE OF WORK

Modern multi-core-aware network adapters are logically partitioned into several RX/TX queues, and packets are flow-balanced across queues using hardware-based facilities such as RSS (Receive-Side Scaling), part of the Intel I/O Acceleration Technology (I/OAT) [17, 18]. By splitting a single RX queue into several smaller queues, the load, both in terms of packets and interrupts, can be balanced across cores to improve the overall performance (a simplified sketch of this flow-to-queue mapping is given after Figure 2 below). Modern network interface cards (NICs) support static or even dynamically configurable [13] balancing policies. The number of available queues depends on the NIC chipset and is limited by the number of available system cores (for example, on a quad-core machine we can have up to four queues per port).

However, in most operating systems, packets are fetched using packet polling techniques [23, 25] that were designed in the pre-multi-core age, when network adapters were equipped with a single RX queue. From the operating system's point of view, there is no difference between a legacy 100 Mbit card and a modern 10 Gbit card, as the driver hides all card, media and network speed details. As shown in Figure 1, device drivers must merge all queues into one, as used to happen with legacy adapters featuring a single queue. This design limitation causes a major performance bottleneck: even if a user-space application spawns several threads to consume packets, they all have to compete for receiving packets from the same socket. Competition is costly, as semaphores or similar techniques have to be used to serialize this work instead of carrying it out in parallel.

Even though multi-core architectures, such as the one depicted in Figure 2, are equipped with cache levels dynamically shared among different cores within a CPU, integrated memory controllers and multi-channel memories, memory bandwidth has been identified as a limiting factor for the scalability of current and future multi-core processors [7, 24]. In fact, technology projections suggest that off-chip memory bandwidth is going to increase slowly compared to the desired growth in the number of cores. This memory wall problem is a serious issue for memory-intensive applications such as traffic analysis software tailored for high-speed networks. Reducing the memory bandwidth consumption by minimizing the number of packet copies is a key requirement for exploiting parallel architectures.

To reduce the number of packet copies, most packet capture technologies [11] use memory-mapping-based zero-copy techniques (instead of standard system calls) to carry packets from the kernel level to user space. The packet journey inside the kernel starts at the NIC driver layer, where incoming packets are copied into a temporary memory area, the socket buffer [8, 21], that holds the packet until it gets processed by the networking stack. In network monitoring, since packets are often received on dedicated adapters not used for routing or management, socket buffer allocations and deallocations are unnecessary, and zero-copy could start directly at the driver layer rather than only at the networking layer.

Figure 1: Design limitation in network monitoring architectures.

Figure 2: Commodity parallel architecture. Memory bandwidth can be wasted when cache hierarchies are poorly exploited.
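To make the balancing idea concrete, the following minimal C sketch shows how a flow identifier derived from the packet 5-tuple can be mapped to one of N RX queues through an indirection table. It is an illustration of the principle only, not the adapter's actual RSS implementation: real hardware uses a Toeplitz hash with a secret key, and the hash function, table size and field layout below are placeholders.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flow descriptor: the fields RSS typically hashes. */
struct flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Placeholder hash (FNV-1a); real adapters use a keyed Toeplitz hash. */
static uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u;
    const uint8_t *p = (const uint8_t *)f;
    for (size_t i = 0; i < sizeof(*f); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Map a flow to one of n_queues RX queues via an indirection table,
 * mimicking how RSS spreads flows across queues without CPU intervention. */
static unsigned rx_queue_for_flow(const struct flow *f,
                                  const uint8_t *indirection_table,
                                  unsigned table_size, unsigned n_queues)
{
    uint32_t h = flow_hash(f);
    return indirection_table[h % table_size] % n_queues;
}

int main(void)
{
    uint8_t table[128];
    for (unsigned i = 0; i < 128; i++)
        table[i] = (uint8_t)i;            /* identity indirection table */

    struct flow f = { 0x0a000001u, 0x0a000002u, 1234, 80 };
    printf("flow mapped to RX queue %u\n",
           rx_queue_for_flow(&f, table, 128, 4));
    return 0;
}
```

Because all packets of a flow hash to the same queue, per-queue processing keeps each flow on a single core and preserves intra-flow packet ordering while spreading independent flows across cores.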
Improperly balancing interrupt requests (IRQs) may lead to the excessive cache-miss phenomenon usually referred to as cache thrashing. To avoid this problem, the interrupt request handler and the capture thread that consumes the packet must be executed on the same processor (to share the L3 cache) or, with Hyper-Threaded processors, on the same core. Unfortunately, most operating systems uniformly balance interrupt requests across cores and schedule threads without considering architectural differences between cores. In practice, this is a common cause of packet loss. Modern operating systems allow users to tune the IRQ balancing strategy and to override the scheduling policy by means of CPU affinity manipulation [20] (a minimal tuning sketch is shown after the summary list below). Unfortunately, since current operating systems do not deliver queue identifiers up to user space, applications do not have enough information to properly set the CPU affinity.

In summary, we identified two main issues that prevent parallelism from being exploited:

- Multi-threaded applications that want to concurrently consume packets coming from the same socket compete for a single resource. This prevents multi-queue adapters from being fully exploited.

- Unnecessary packet copies, improper scheduling and interrupt balancing cause sub-optimal memory bandwidth utilization.
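As a concrete illustration of the tuning knobs mentioned above, the sketch below pins the calling thread to a chosen CPU with sched_setaffinity() and steers a given IRQ to the same CPU by writing a hexadecimal CPU mask to /proc/irq/<irq>/smp_affinity. The IRQ number (42) and CPU index are placeholders; on a real system they must be taken from /proc/interrupts and the machine topology, and writing the mask requires root privileges.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Steer interrupt 'irq' to CPU 'cpu' by writing a hex CPU mask to
 * /proc/irq/<irq>/smp_affinity. Mask form shown works for the first
 * 32 CPUs only; larger systems use a comma-separated mask. */
static int set_irq_affinity(int irq, int cpu)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%x\n", 1u << cpu);
    fclose(f);
    return 0;
}

/* Pin the calling thread to CPU 'cpu' so that packets received by that
 * CPU's RX queue are also consumed there, preserving cache locality. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

int main(void)
{
    int cpu = 0, irq = 42;   /* placeholders: pick real values per system */
    set_irq_affinity(irq, cpu);
    pin_to_cpu(cpu);
    /* ... run the capture loop on this CPU ... */
    return 0;
}
```

On many distributions the irqbalance daemon periodically rewrites these masks, so it is usually disabled when IRQs are pinned by hand.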
The following section describes a packet capture architecture that addresses the identified limitations.

3. TOWARDS MULTI-CORE MONITORING ARCHITECTURES

We designed a high-performance packet capture technology able to exploit multi-queue adapters and modern multi-core processors. We achieve high performance by introducing virtual capture devices, multi-threaded polling and zero-copy mechanisms. Linux is used as the target operating system, as it represents the de-facto reference platform for the evaluation of novel packet capture technologies. However, the exploited concepts are general and can also be adapted to other operating systems.

Our technology natively supports multiple queues and exposes them to users as virtual capture devices (see Figure 3). Virtual packet capture devices allow applications to be easily split into several independent threads of execution, each receiving and analyzing a portion of the traffic. Monitoring applications can either bind to a physical device (e.g., eth1) to receive packets from all RX queues, or to a virtual device (e.g., eth1@2) to consume packets from a specific queue only (a minimal example of binding to a virtual device through the libpcap compatibility layer is shown after Figure 3). The RSS hardware facility is responsible for balancing the traffic across RX queues, with no CPU intervention.

The concept of a virtual capture device has been implemented in PF_RING [11], a kernel-level network layer designed to improve Linux packet capture performance. It also provides an extensible mechanism for analyzing packets at the kernel level. PF_RING provides a zero-copy mechanism based on memory mapping to transfer packets from kernel space to user space without using expensive system calls (such as read()). However, since it sits on top of the standard network interface card drivers, it is affected by the same problems identified in the previous section. In particular, for each incoming packet a temporary memory area, called a socket buffer [8, 21], is allocated by the network driver, and the packet is then copied into the PF_RING ring buffer, which is memory-mapped to user space.

TNAPI drivers: To avoid the aforementioned issue, PF_RING features a zero-copy ring buffer for each RX queue and supports a new NIC driver model optimized for packet capture applications called TNAPI (Threaded NAPI, where NAPI is the driver model that introduced polling in the Linux kernel). TNAPI drivers, when used with PF_RING, completely avoid socket buffer allocations. In particular, packets are copied directly from the RX queue to the associated PF_RING ring buffer for user-space delivery. This process does not require any memory allocation because both the RX queue and the corresponding PF_RING ring are allocated statically. Moreover, since PF_RING ring buffers are memory-mapped to user space, moving packets from the RX queue ring to user space requires a single packet copy. In this way, the driver does not deliver packets to the legacy networking stack, so the kernel overhead is completely avoided. If desired, users can configure the driver to push packets into the standard networking stack as well, but this configuration is not recommended as it causes a substantial performance drop, since packets have to cross the legacy networking stack layers.

Figure 3: Multi-queue aware packet capture design. Each capture thread fetches packets from a single Virtual Capture Device.
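The following sketch illustrates how an existing libpcap-based application could bind to a single RX queue through the virtual device naming convention described above. It assumes that the libpcap port mentioned in this section accepts the queue-suffixed device name (e.g., eth1@2); the device name, snapshot length and timeout are illustrative values, not prescribed by the technology.

```c
#include <pcap.h>
#include <stdio.h>

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];

    /* "eth1@2": queue 2 of eth1, following the paper's virtual-device
     * naming; "eth1" alone would deliver packets from all RX queues. */
    pcap_t *p = pcap_open_live("eth1@2", 128 /* snaplen */,
                               1 /* promiscuous */, 500 /* ms timeout */,
                               errbuf);
    if (!p) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return 1;
    }

    struct pcap_pkthdr *hdr;
    const u_char *data;
    if (pcap_next_ex(p, &hdr, &data) == 1)
        printf("captured %u bytes from queue 2\n", hdr->caplen);

    pcap_close(p);
    return 0;
}
```

Opening "eth1" instead would merge all queues again and reintroduce the single-socket competition discussed in Section 2.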
Instead of relying on the standard kernel polling mechanisms to fetch packets from each queue, TNAPI features in-driver multi-threaded packet polling. TNAPI drivers spawn one polling thread for each RX queue (see Figure 3). Each polling thread fetches incoming packets from the corresponding RX queue and passes them to PF_RING. Inside PF_RING, packet processing involves packet parsing and, depending on the configuration, may include packet filtering using the popular BPF filters [5] or even more complex application-level filtering mechanisms [16]. Kernel-level packet processing is performed by the polling threads in parallel.

TNAPI drivers spawn polling threads and bind them to a specific core by means of CPU affinity manipulation. In this way, the entire traffic coming from a single RX queue is always handled by the same core at the kernel level. The obvious advantage is the increased cache locality for poller threads. However, there is another significant gain related to interrupt mitigation. Modern network cards and their respective drivers do not raise an interrupt for every packet under high-rate traffic conditions. Instead, drivers disable interrupts and switch to polling mode in such situations. If the traffic is not properly balanced across the queues, or simply if the traffic is bursty, we can expect to have some busy queues working in polling mode and other queues generating interrupts. By binding each polling thread to the same core where interrupts for its queue are received, we prevent threads polling busy queues from being interrupted by other queues processing low-rate incoming traffic.

The architecture depicted in Figure 3 and implemented in TNAPI solves the single resource competition problem identified in the previous section. Users can instantiate one packet consumer thread at the user-space level for each virtual packet capture device (RX queue). Having a single packet consumer per virtual packet capture device does not require any locking primitive such as semaphores that, as a side effect, invalidate processor caches. In fact, for each RX queue the polling thread at the kernel level and the packet consumer thread at the user-space level exchange packets through a lock-less Single Reader Single Writer (SRSW) buffer (a minimal sketch of such a buffer follows below).
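The sketch below illustrates the idea behind a lock-less SRSW ring: with exactly one producer (the kernel poller) and one consumer (the user-space thread), the head and tail indices each have a single writer and can be updated without semaphores. This is a user-space illustration of the principle, not the PF_RING implementation; the slot count and payload type are placeholders.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 4096            /* placeholder capacity (power of two) */

struct srsw_ring {
    _Atomic size_t head;           /* written only by the producer */
    _Atomic size_t tail;           /* written only by the consumer */
    void *slot[RING_SLOTS];
};

/* Producer side (one writer): enqueue a packet pointer, fail if full. */
static bool srsw_put(struct srsw_ring *r, void *pkt)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;              /* ring full */
    r->slot[head % RING_SLOTS] = pkt;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (one reader): dequeue a packet pointer, fail if empty. */
static bool srsw_get(struct srsw_ring *r, void **pkt)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;              /* ring empty */
    *pkt = r->slot[tail % RING_SLOTS];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Because each index has a single writer, no compare-and-swap or lock is needed; the acquire/release pairs are enough to make the slot contents visible to the other side.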
In order to avoid cache invalidation due to improper scheduling, users can manipulate the CPU affinity to make sure that both threads are executed on cores or Hyper-Threads sharing cache levels. In this way, multi-core architectures can be fully exploited by leveraging high-bandwidth, low-latency inter-core communication. We decided not to impose specific affinity settings for the consumer threads, meaning that the user-level packet capture library does not set the affinity. Users are responsible for performing fine-grained tuning of the CPU affinity depending on how CPU-intensive the traffic analysis task is. This is straightforward and under Linux requires a single function call (see pthread_setaffinity_np()). It is worth noting that such fine-grained tuning of the system is simply not feasible if queue information is not exported up to user space.

Compatibility and Development Issues: Our packet capture technology comes with a set of kernel modules and a user-space library called libpfring. A detailed description of the API can be found in the user guide [1]. For compatibility reasons we also ported the popular libpcap [5] library on top of our packet capture technology. In this way, existing monitoring applications can easily be ported onto it. As of today, we have implemented packet-capture-optimized drivers for popular multi-queue Intel 1 and 10 Gbit adapters (82575/6 and 82598/9 chips).

4. EVALUATION

We evaluated the work using two different parallel architectures belonging to different market segments (low-end and high-end), equipped with the same Intel multi-queue network card. Details of the platforms are listed in Table 1. An IXIA 400 [4] traffic generator was used to inject the network traffic for the experiments. For 10 Gbit traffic generation, several IXIA-generated 1 Gbit streams were merged into a 10 Gbit link using an HP ProCurve switch. In order to exercise balancing across RX queues, the IXIA was configured to generate 64-byte TCP packets (minimum packet size) originating from a single IP address towards a rotating set of 4096 IP destination addresses. With 64-byte packets, a full Gbit link can carry up to 1.48 million packets per second (Mpps).

Table 1: Evaluation platforms

                 low-end               high-end
motherboard      Supermicro PSDBE      Supermicro X8DTL-iF
CPU              Core2Duo 1.86 GHz     2x Xeon
cores            2                     8
Hyper-Threads    0                     8
RAM              4 GB                  4 GB
NIC              Intel ET (1 Gbps)     Intel ET (1 Gbps)

To perform the performance measurements we used pfcount, a simple traffic monitoring application that counts the number of captured packets. Depending on the configuration, pfcount spawns multiple packet capture threads per network interface and can even capture concurrently from multiple network devices, including virtual capture devices (a sketch of spawning one pinned capture thread per virtual device is shown below). In all tests we enabled multiple queues in the drivers and modified the driver's code so that queue information is propagated up to PF_RING; this driver does not spawn any poller thread at the kernel level and does not avoid socket buffer allocation. We call this driver MQ (multi-queue), and TNAPI the one described in Section 3.
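The sketch below shows, under stated assumptions, how a pfcount-like application could spawn one capture thread per virtual device and pin each thread with pthread_setaffinity_np(), the single call mentioned above. The device names (eth1@0, eth1@1), the CPU mapping and the use of the libpcap compatibility layer are illustrative; error handling is reduced to a minimum.

```c
#define _GNU_SOURCE
#include <pcap.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

struct worker {
    const char *device;     /* virtual capture device, e.g. "eth1@0" */
    int cpu;                /* processing unit the thread is pinned to */
    unsigned long count;    /* packets seen by this thread */
};

static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                      const u_char *bytes)
{
    ((struct worker *)user)->count++;
    (void)h; (void)bytes;
}

static void *capture(void *arg)
{
    struct worker *w = arg;
    char errbuf[PCAP_ERRBUF_SIZE];

    /* Pin this capture thread, ideally to the same core (or a sibling
     * Hyper-Thread) as the kernel poller serving this queue. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(w->cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    pcap_t *p = pcap_open_live(w->device, 128, 1, 500, errbuf);
    if (!p) {
        fprintf(stderr, "%s: %s\n", w->device, errbuf);
        return NULL;
    }
    pcap_loop(p, -1, on_packet, (u_char *)w);
    pcap_close(p);
    return NULL;
}

int main(void)
{
    /* One consumer per RX queue; the queue-to-CPU mapping is a placeholder. */
    struct worker w[2] = { { "eth1@0", 0, 0 }, { "eth1@1", 1, 0 } };
    pthread_t t[2];

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, capture, &w[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```

Pinning consumers next to their pollers is exactly the kind of fine-grained tuning that requires queue identifiers to be exported to user space.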
Table 2: Packet capture performance (kpps) at 1 Gbps with different two-thread configurations.

                         Reference   Setup A   Setup B   Setup C
Platform                 SQ          SQ        MQ        TNAPI
Threads (user/kernel)    1/0         2/0       2/0       1/1
low-end                  721         640       610       1264
high-end                 1326        1100      708       1488

Comparing Different Approaches: As a first test, we evaluated the packet capture performance of a multi-threaded packet capture application with and without multiple queues enabled. To do so, we measured the maximum loss-free rate when pfcount uses three different two-threaded setups:

- Setup A: multiple queues are disabled, so the capture threads read packets from the same interface (single queue, SQ). Threads are synchronized using a r/w semaphore. This setup corresponds to the default Linux configuration shown in Figure 1.

- Setup B: two queues are enabled (MQ) and two capture threads consume packets from them. No synchronization is needed.

- Setup C: there is one capture thread at the user level and one polling thread at the kernel level (TNAPI).

Table 2 shows the performance results of the multi-threaded setups and, as a reference point, of the single-threaded application. The test confirmed the issues described in Section 2. When pfcount spawns two threads at the user level, packet capture performance is actually worse than in the single-threaded case. This is expected in both setups A and B. In setup A, the cause of the drop compared to the single-threaded setup is cache invalidation due to locking (the semaphore), whereas in setup B the cause is the round-robin IRQ balancing. On the other hand, our approach of combining a kernel thread with a thread at the user level (setup C) is indeed effective and allows the low-end platform to almost double its single-thread performance. Moreover, the high-end machine can capture 1 Gbps (1488 kpps) with no loss.

CPU Affinity and Scalability: We now turn our attention to evaluating our solution at higher packet rates on the high-end platform. We are interested in understanding whether, by properly setting the CPU affinity, it is possible to effectively partition the computing resources and therefore increase the maximum loss-free rate.

2 NICs: To test the packet capture technology with more traffic, we plugged another Intel ET NIC into the high-end system and injected, with the IXIA traffic generator, 1.488 Mpps on each interface (wire rate at 1 Gbit with 64-byte packets). We want to see whether it is possible to handle two full Gbit links with only two cores and two queues per NIC. To do so, we set the CPU affinity to make sure that, for every NIC, the two polling threads at the kernel level are executed on different Hyper-Threads belonging to the same core (e.g., processing units 0 and 8 in Figure 4 belong to Core 0 of the first physical processor; under Linux, /proc/cpuinfo lists the available processing units and reports for each of them the core identifier and the physical CPU, and processing units sharing the same physical CPU and core identifier are Hyper-Threads; a small sketch of discovering Hyper-Thread siblings programmatically is shown below). We use the pfcount application to spawn capture threads and we perform measurements with three configurations.
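As a complement to inspecting /proc/cpuinfo by hand, the sketch below reads the sysfs topology files that recent Linux kernels expose under /sys/devices/system/cpu/ to list, for each processing unit, the siblings sharing its physical core. It is an illustrative helper under the assumption that these files are available, not part of the capture technology described in this paper.

```c
#include <stdio.h>

/* Print, for each processing unit found, the sibling Hyper-Threads that
 * share its physical core, as reported by the Linux sysfs topology. */
int main(void)
{
    for (int cpu = 0; ; cpu++) {
        char path[128], siblings[64];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                       /* no more processing units */
        if (fgets(siblings, sizeof(siblings), f)) {
            /* e.g. "0,8" on the dual Xeon of Figure 4: units 0 and 8
             * are Hyper-Threads of the same core. */
            printf("cpu%d shares a core with: %s", cpu, siblings);
        }
        fclose(f);
    }
    return 0;
}
```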
Figure 4: Core mapping on Linux with the dual Xeon. Hyper-Threads on the same core (e.g., 0 and 8) share the L2 cache.

Table 3: Packet capture performance (kpps) when capturing concurrently from two 1 Gbit links.

Test   Capture threads affinity      Polling threads affinity      NIC1 kpps   NIC2 kpps
1      not set                       not set
2      NIC1@0 on 0, NIC1@1 on 8,     NIC1@0 on 0, NIC1@1 on 8,
       NIC2@0 on 2, NIC2@1 on 10     NIC2@0 on 2, NIC2@1 on 10
3      NIC1 on 0,8; NIC2 on 2,10     NIC1@0 on 0, NIC1@1 on 8,     1488        1488
                                     NIC2@0 on 2, NIC2@1 on 10

First, we measure the packet capture rate when one capture thread and one polling thread per queue are spawned (8 threads in total) without setting the CPU affinity (Test 1). Then (Test 2), we repeat the test binding each capture thread to the same Hyper-Thread where the polling thread for that queue is executed (e.g., for queue NIC1@0 both the polling and the capture thread run on Hyper-Thread 0). Finally, in Test 3, we reduce the number of capture threads to one per interface; for each NIC, the capture thread and the polling threads associated with that interface run on the same core.

Table 3 reports the maximum loss-free rate when capturing from the two NICs simultaneously using the configurations described above. As shown by Test 1, without properly tuning the system by means of CPU affinity, our test platform is not capable of capturing at wire rate from two adapters simultaneously. Tests 2 and 3 show that the performance can be substantially improved by setting the affinity, and wire rate is achieved. In fact, by using a single capture thread for each interface (Test 3), all incoming packets are captured with no loss (1488 kpps per NIC).

In principle, we would expect to achieve wire rate with the configuration of Test 2 rather than the one used in Test 3. However, splitting the load over two RX queues means that the capture threads are idle most of the time, at least on high-end processors such as the Xeons we used and with a dummy application that only counts packets. As a consequence, capture threads must call poll() very often, as they have no packets to process and therefore go to sleep until a new packet arrives; this may lead to packet losses. Since system calls are slow, it is better to keep capture threads busy so that poll() calls are reduced. The best way of doing so is to capture from two RX queues, in order to increase the number of incoming packets per capture thread. It is worth noting that, since real monitoring applications are more complex than pfcount, the configuration used for Test 2 may provide better performance in practice.

4 NICs: We plugged two extra NICs into the system to check whether it was possible to reach wire rate with 4 NICs at the same time (4 Gbps of aggregate bandwidth with minimum-sized packets). The third and fourth NIC were configured using the same tuning parameters as in Test 3 and the measurements were repeated. The system can capture 4 Gbps of traffic per physical processor without losing any packet. Due to the lack of additional NICs at the traffic generator, we could not evaluate the performance beyond 4 Gbps with synthetic streams of minimum-size packets, which represent the worst-case scenario for a packet capture technology.
However, preliminary tests conducted on a 10 Gbit production network (where the average packet size was close to 300 bytes and the used bandwidth around 6 Gbps) confirmed that this setup is effective in practice.

The conclusion of the validation is that, when the CPU affinity is properly tuned, our packet capture technology allows:

- the packet capture rate to scale linearly with the number of NICs;

- multi-core computers to be partitioned processor-by-processor, meaning that the load on each processor does not affect the load on the other processors.

5. RELATED WORK

The industry has followed three paths for accelerating network monitoring applications by means of specialized hardware while keeping the flexibility of software. Smart traffic balancers, such as cPacket [2], are special-purpose devices used to filter and balance the traffic according to rules, so that multiple monitoring stations each receive and analyze a portion of the traffic. Programmable network cards [6] are massively parallel architectures on a network card. They are suitable for accelerating both packet capture and traffic analysis, since monitoring software written in C can be compiled for that special-purpose architecture and run on the card rather than on the main host. Unfortunately, porting applications to those expensive devices is not trivial. Capture accelerators [3] completely offload monitoring workstations from the packet capture task, leaving more CPU cycles for analysis. The card is responsible for copying the traffic directly into the address space of the monitoring application, and thus the operating system is completely bypassed. Degioanni and Varenni [10] show that first-generation packet capture accelerators are not able to exploit the parallelism of multi-processor architectures and propose the adoption of a software scheduler to increase scalability. The scalability issue has been addressed by modern capture accelerators that provide facilities to balance the traffic among several threads of execution. The balancing policy is implemented by their firmware and is not meant to be changed at run-time, as reconfiguration takes seconds if not minutes.

The work described in [19] highlights the effects of cache coherence protocols on the processing of network traffic in multi-processor architectures. Papadogiannakis et al. [22] show how to preserve cache locality and improve traffic analysis performance by means of traffic reordering.
Multi-core architectures and multi-queue adapters have been exploited to increase the forwarding performance of software routers [14, 15]. Dashtbozorgi et al. [9] propose a traffic analysis architecture for exploiting multi-core processors. Their work is orthogonal to ours, as they do not tackle the problem of enhancing packet capture through the exploitation of parallelism.

Several research efforts show that packet capture can be substantially improved by customizing general-purpose operating systems. nCap [12] is a driver that maps the card memory into user space, so that packets can be fetched from user space without any kernel intervention. The work described in [26] proposes the adoption of large buffers containing a long queue of packets to amortize the cost of system calls under Windows. PF_RING [11] reduces the number of packet copies, and thus increases packet capture performance, by introducing a memory-mapped channel to carry packets from the kernel to user space.

6. OPEN ISSUES AND FUTURE WORK

This work represents a first step toward our goal of exploiting the parallelism of modern multi-core architectures for packet analysis. There are several important steps we intend to address in future work. The first is to introduce a software layer capable of automatically tuning the CPU affinity settings, which is crucial for achieving high performance; currently, choosing the correct CPU affinity settings is not a straightforward process for non-expert users. In addition, one of the basic assumptions of our technology is that the hardware-based balancing mechanism (RSS in our case) is capable of evenly distributing the incoming traffic among cores. This is often, but not always, true in practice. In the future, we plan to exploit mainstream network adapters supporting hardware-based and dynamically configurable balancing policies [13] to implement an adaptive, hardware-assisted software packet scheduler able to dynamically distribute the workload among cores.

7. CONCLUSIONS

This paper highlighted several challenges in using multi-core systems for network monitoring applications: resource competition of threads on network buffer queues, unnecessary packet copies, and interrupt and scheduling imbalances. We proposed a novel approach to overcome the existing limitations and showed solutions for exploiting multi-core processors and multi-queue adapters for network monitoring. The validation has demonstrated that by using TNAPI it is possible to capture packets very efficiently at both 1 and 10 Gbit. Our results therefore present the first software-only solution to show promise towards offering scalability with respect to the number of processors for packet capture applications.

8. ACKNOWLEDGEMENT

The authors would like to thank J. Gasparakis and P. Waskiewicz Jr from Intel for the insightful discussions about 10 Gbit on multi-core systems, as well as M. Vlachos and X. Dimitropoulos for their suggestions while writing this paper.

9. CODE AVAILABILITY

This work is distributed under the GNU GPL license and is available at no cost from the ntop home page.

REFERENCES

[1] PF_RING User Guide. userguide.pdf.
[2] cPacket Networks - complete packet inspection on a chip.
[3] Endace Ltd.
[4] Ixia - leader in converged IP testing. Homepage.
[5] libpcap. Homepage.
[6] A. Agarwal. The Tile processor: A 64-core multicore for embedded processing. Proc. of HPEC Workshop.
[7] K. Asanovic et al. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS, EECS Department, University of California, Berkeley, Dec.
[8] A. Cox. Network buffers and memory management. Linux Journal, Issue 30, 1996.
[9] M. Dashtbozorgi and M. Abdollahi Azgomi. A scalable multi-core aware software architecture for high-performance network monitoring. In SIN '09: Proc. of the 2nd Int. Conference on Security of Information and Networks, New York, NY, USA. ACM.
[10] L. Degioanni and G. Varenni. Introducing scalability in network measurement: toward 10 Gbps with commodity hardware. In IMC '04: Proc. of the 4th ACM SIGCOMM Conference on Internet Measurement, New York, NY, USA. ACM.
[11] L. Deri. Improving passive packet capture: beyond device polling. Proc. of SANE.
[12] L. Deri. ncap: Wire-speed packet capture and transmission. In E2EMON '05: Proc. of the Workshop on End-to-End Monitoring Techniques and Services, pages 47-55, Washington, DC, USA. IEEE Computer Society.
[13] L. Deri, J. Gasparakis, P. Waskiewicz Jr, and F. Fusco. Wire-Speed Hardware-Assisted Traffic Filtering with Mainstream Network Adapters. In NEMA '10: Proc. of the First Int. Workshop on Network Embedded Management and Applications, to appear.
[14] N. Egi, A. Greenhalgh, M. Handley, M. Hoerdt, F. Huici, L. Mathy, and P. Papadimitriou. A platform for high performance and flexible virtual routers on commodity hardware. SIGCOMM Comput. Commun. Rev., 40(1).
[15] N. Egi, A. Greenhalgh, M. Handley, G. Iannaccone, M. Manesh, L. Mathy, and S. Ratnasamy. Improved forwarding architecture and resource management for multi-core software routers. In NPC '09: Proc. of the 2009 Sixth IFIP Int. Conference on Network and Parallel Computing, Washington, DC, USA. IEEE Computer Society.
[16] F. Fusco, F. Huici, L. Deri, S. Niccolini, and T. Ewald. Enabling high-speed and extensible real-time communications monitoring. In IM '09: Proc. of the 11th IFIP/IEEE Int. Symposium on Integrated Network Management, Piscataway, NJ, USA. IEEE Press.
[17] Intel. Accelerating high-speed networking with Intel I/O Acceleration Technology. White Paper.
[18] Intel. Intelligent queuing technologies for virtualization. White Paper.
[19] A. Kumar and R. Huggahalli. Impact of cache coherence protocols on the processing of network traffic. In MICRO '07: Proc. of the 40th Annual IEEE/ACM Int. Symposium on Microarchitecture, Washington, DC, USA. IEEE Computer Society.
[20] R. Love. CPU affinity. Linux Journal, Issue 111, July 2003.
[21] B. Milekic. Network buffer allocation in the FreeBSD operating system. Proc. of BSDCan, 2004.
[22] A. Papadogiannakis, D. Antoniades, M. Polychronakis, and E. P. Markatos. Improving the performance of passive network monitoring applications using locality buffering. In MASCOTS '07: Proc. of the Int. Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, Washington, DC, USA. IEEE Computer Society.
[23] L. Rizzo. Device polling support for FreeBSD. BSDConEurope Conference, 2001.
[24] B. M. Rogers, A. Krishna, G. B. Bell, K. Vu, X. Jiang, and Y. Solihin. Scaling the bandwidth wall: challenges in and avenues for CMP scaling. SIGARCH Comput. Archit. News, 37(3).
[25] J. H. Salim, R. Olsson, and A. Kuznetsov. Beyond softnet. In ALS '01: Proc. of the 5th Annual Linux Showcase & Conference, pages 18-18, Berkeley, CA, USA. USENIX Association.
[26] M. Smith and D. Loguinov. Enabling high-performance internet-wide measurements on Windows. In PAM '10: Proc. of the Passive and Active Measurement Conference, Zurich, Switzerland, 2010.