Performance of Optimized Software Implementation of the iSCSI Protocol


Fujita Tomonori and Ogawara Masanori
NTT Network Innovation Laboratories
1-1 Hikarinooka, Yokosuka-shi, Kanagawa, Japan
The authors' current affiliation is NTT Cyber Solutions Laboratories.

Abstract

The advent of IP-based storage networking has brought specialized network adapters that directly support TCP/IP and storage protocols onto the market, with the aim of obtaining performance comparable to that of specialized high-performance storage networking architectures. This paper describes an efficient software implementation of the iSCSI protocol on a commodity networking infrastructure. Though several studies have compared the performance of specialized network adapters and commodity network adapters, our iSCSI implementation eliminates data copying overhead, unlike the straightforward iSCSI implementations used in previous studies. To achieve this, we modified a general-purpose operating system by using techniques studied for improving TCP performance in the literature and features that commodity Gigabit Ethernet adapters support. We also quantified their effects. Our microbenchmarks show that, compared with a straightforward iSCSI driver that does not use these techniques, the iSCSI driver with these optimizations reduces CPU utilization from 39.4% to 30.8% when writing with an I/O size of 64 KB. However, when reading, any performance gain is negated by the high cost of operations on the virtual memory system.

I. INTRODUCTION

To cope with the rapidly growing volume of data, many companies have started to adopt new storage architectures, in which servers and storage devices are interconnected by specialized high-speed links such as Fibre Channel. The new storage architectures, called storage networking architectures, have superseded traditional storage architectures, in which servers and storage devices are directly connected by system buses. Storage networking architectures make it easier to expand, distribute and administer storage than the traditional storage architectures do.

With advances in Ethernet technology, a more advanced storage networking technology, the iSCSI protocol [1], is on the horizon. It encapsulates a block-level storage protocol, the SCSI protocol, in the TCP/IP protocol, and it carries packets over IP networks. The iSCSI protocol has some advantages over other storage networking architectures: it is based on two well-known technologies, the SCSI protocol and the TCP/IP protocol, which have been used for many years. Integrating storage networking with mainstream data communications is possible by using Ethernet networks. Furthermore, the large number of engineers who are familiar with these technologies may provide economic and management advantages.

Though the iSCSI protocol is fully compatible with commodity networking infrastructures, specialized network adapters, such as TCP Offload Engine (TOE) adapters and Host Bus Adapters (HBAs), exist on the market. Vendors claim that these adapters can achieve performance comparable to that of existing specialized storage networking architectures by offloading the overheads of IP-based storage networking. A TOE adapter offloads the TCP/IP protocol overhead from the host CPU by directly supporting the TCP/IP protocol functions. Furthermore, an HBA adapter offloads the TCP/IP protocol and data copying overheads by directly supporting the TCP/IP protocol and storage protocol functions. Several studies have compared the performance of specialized network adapters and commodity Gigabit Ethernet adapters.

This paper provides an analysis of how high a level of performance the iSCSI protocol can provide without specialized hardware. We made changes to the Linux kernel to get the best performance with commodity networking infrastructures by using techniques studied for improving TCP performance in the literature. We also used three features that commodity Gigabit Ethernet adapters support, i.e., checksum offloading, jumbo frames, and TCP segmentation offloading (TSO). Unlike the straightforward iSCSI implementations used in previous studies, ours avoids copying between buffers in the virtual memory (VM) system and those in the network subsystem.

This paper focuses on the implementation of an initiator, i.e., a client of a SCSI interface that issues SCSI commands to request services from a target, typically a disk array, that executes SCSI commands. However, most of the discussion is applicable to a target in the case that it is implemented in software, like some iSCSI appliances on the market.

The outline of the rest of this paper is as follows. Section II summarizes related work. Section III covers the issues related to the optimizations needed to achieve high performance, and discusses some of the implementation details. Section IV presents our performance results. Section V summarizes the main points and concludes the paper.

II. RELATED WORK

Some performance studies of the iSCSI protocol have been conducted in the past. Sarkar et al. [2] evaluated the performance of a software implementation and those of implementations with TOE and HBA adapters.

Stephen Aiken et al. [3] contrasted the performance of a software implementation with that of Fibre Channel. Wee Teck Ng et al. [4] investigated the performance of the iSCSI protocol over wide area networks with high latency and congestion. Unlike our iSCSI implementation, their software implementations are straightforward; that is, they are based on unmodified TCP/IP stacks, copy data between buffers in the VM system and those in the network subsystem of the operating system, and do not exploit some of the features that the network adapter supports. We cannot directly compare the performance that we obtained with the results of their studies on TOE adapters, HBA adapters, and Fibre Channel, due to differences in the experimental infrastructure. However, we believe that an indirect comparison between their studies and ours may provide some useful information.

A number of papers point out that TCP performance on high-speed links is limited by the host system and study several optimization approaches. Chase et al. [5] outline a variety of approaches. In this paper, we adopt a technique called page flipping or page remapping to avoid copying, and evaluate how much it affects the performance of an optimized software implementation of the iSCSI protocol. Unlike many studies [6]-[8] on TCP performance, this paper does not address the avoidance of copying data between the kernel and user address spaces. Instead, it focuses on the avoidance of redundant copying between buffers in several subsystems of the operating system. That is, our goal is single-copy from the application point of view. Even though using the techniques studied in the literature to avoid copying data between the kernel and user address spaces might result in better performance, making those techniques work with the iSCSI protocol requires extensive modifications to subsystems in the operating system, such as the file systems, the memory management architecture, and the network subsystem.

TCP overhead on the host system is mostly inherited from its design. This has led to many new low-overhead networking protocols, called user-level or lightweight protocols [9]-[11]. This work has also resulted in specialized file systems such as DAFS [12]. These protocols, which are free from the negative legacies of the TCP protocol, can achieve better performance. Furthermore, integrating them with the TCP protocol has been addressed in order to take advantage of the merits of both.

Commonly cited disadvantages of IP-based storage networking relative to existing specialized storage networking architectures are the TCP/IP protocol and data copying overheads. They can decrease the performance of applications by consuming CPU and memory system bandwidth. Though general SCSI host bus adapter drivers directly move data between buffers in the VM system and the target, straightforward iSCSI drivers copy data between buffers in the VM system and those in the network subsystem. We were able to find four implementations of an iSCSI initiator [13]-[16], licensed under open source licenses [17], for the Linux operating system. All of them copy data between the VM cache and the network subsystem. However, the copying can be avoided in some cases. We discuss techniques for copy avoidance and the issues of implementing them in the Linux kernel in the following sections.

Fig. 1. SCSI driver architecture.

III. OPTIMIZATION TECHNIQUES

Figure 1 shows the difference between the architectures of general SCSI drivers and iSCSI drivers, and the data flow inside an operating system 1. In general, an iSCSI initiator is implemented as a SCSI host bus adapter driver. However, unlike general SCSI drivers interacting with their own host bus adapter hardware, an iSCSI driver interacts with a network subsystem.

1 We assume that cached file system data are managed by a virtual memory system, as in the Linux kernel.

A. Network adapter support

Though commodity Gigabit Ethernet adapters cannot directly process the TCP/IP protocol and the iSCSI protocol, unlike specialized network adapters, they provide some features that can be useful for offloading the TCP/IP protocol and data copying overheads of IP-based storage networking. We use the following three features.

Checksum offloading. Calculating checksums to detect the corruption of a packet is an expensive operation. Checksum calculation on the network adapter instead of the host CPU can improve performance, and it is common today. Six out of nine kinds of chipsets for a Gigabit Ethernet adapter that the Linux kernel version 2.6.0-test1 supports provide this feature. Checksum offloading works on both the transmitting and receiving sides 2.

Jumbo frames. Large packets can improve performance by reducing the number of packets that the host processes. However, standard IEEE 802.3 Ethernet frame sizes allow an MTU of at most 1500 bytes. Nowadays most Gigabit Ethernet adapters support jumbo frames, which allow the host to use a larger MTU (typically up to 9000 bytes). Seven out of nine kinds of chipsets for a Gigabit Ethernet adapter that the Linux kernel version 2.6.0-test1 supports provide this feature. Jumbo frames are mandatory to achieve copy avoidance when reading data.

TCP segmentation offloading. A network subsystem must break a packet whose size is larger than the TCP segment size into smaller pieces at the TCP layer before it is passed to the IP layer. TCP segmentation offloading (TSO) can improve performance by executing some parts of this work on the network adapter, and is relatively new. Two out of nine kinds of chipsets for a Gigabit Ethernet adapter that the Linux kernel version 2.6.0-test1 supports provide this feature. Though copy avoidance is possible without TCP segmentation offloading, it can improve the performance when writing large data.

2 Some network adapters can compute checksums only on transmitted packets.
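
As an illustration only (the paper does not include code), the following sketch shows one plausible way a Linux 2.6-era Gigabit Ethernet driver advertises these three features to the network stack. The NETIF_F_* flags and the net_device fields are the kernel's own; the function names, the 68-byte lower bound, and the 9000-byte jumbo limit are assumptions made for the example.

#include <linux/netdevice.h>
#include <linux/errno.h>

/* Jumbo frames: accept a larger MTU than the 1500-byte IEEE 802.3 default. */
static int nic_change_mtu(struct net_device *dev, int new_mtu)
{
	if (new_mtu < 68 || new_mtu > 9000)
		return -EINVAL;
	dev->mtu = new_mtu;	/* a real driver would also resize its RX buffers */
	return 0;
}

static void nic_advertise_offloads(struct net_device *dev)
{
	/* TSO relies on scatter-gather DMA and hardware TX checksumming. */
	dev->features |= NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
	dev->change_mtu = nic_change_mtu;
}

Whether a given chipset can set each of these flags is exactly the per-feature support counted above.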

B. Write

The avoidance of copying when data travel from an initiator to a target (i.e., write) is easier than when data travel from a target to an initiator (i.e., read). Instead of copying data from the VM cache into buffers in the network subsystem, our iSCSI driver hands over references to buffers in the VM system to the network subsystem when writing.

In the Linux kernel, an sk_buff structure is used for memory management in the network subsystem. It is called an skbuff, and is equivalent to an mbuf structure in the BSD lineage. An skbuff holds not only its own buffer space, but also references to page structures describing pages, in order to support the avoidance of redundant copying between the VM cache and skbuffs. Moreover, the Linux kernel provides interfaces, called sendpage interfaces, to use these data structures. These interfaces pass references to page structures describing pages in the VM system to the network subsystem. A network adapter driver then receives skbuffs holding these references from the network subsystem, and the contents of the referenced pages are transferred to the wire. By using these interfaces, our iSCSI driver can avoid copying data between the VM cache and skbuffs without modifications to the Linux kernel.

Our iSCSI driver performs the following operations when writing.

1) The SCSI subsystem receives references to page structures describing dirty pages 3 from the VM system, and hands them over to the iSCSI driver.
2) The iSCSI driver associates these references with skbuffs for iSCSI packets by using the sendpage interfaces.
3) The references are passed on to a network adapter driver, and the contents of the referenced pages are transmitted by using direct memory access (DMA).

Unlike the avoidance of copying data between the kernel and user address spaces, the iSCSI driver does not require pinning and unpinning of transferred pages during DMA transfers, because the VM system increases the usage count of these pages. This guarantees that the VM system does not reclaim the pages until the iSCSI command related to them finishes.
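
As a rough sketch of step 2 (not the authors' code), the fragment below hands the dirty pages of a WRITE to TCP through the 2.6-era sendpage interface. The iscsi_page_vec structure and the function name are hypothetical; the iSCSI PDU header is assumed to have been queued separately, and each entry is assumed to stay within one page.

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/mm.h>
#include <linux/types.h>

/* Hypothetical descriptor for one data segment of an iSCSI PDU. */
struct iscsi_page_vec {
	struct page *page;
	unsigned int offset;
	unsigned int length;
};

static int iscsi_send_data_pages(struct socket *sock,
				 struct iscsi_page_vec *vec, int nvec)
{
	int i;

	for (i = 0; i < nvec; i++) {
		unsigned int off = vec[i].offset;
		unsigned int len = vec[i].length;

		while (len > 0) {
			/* MSG_MORE: further segments of this command follow. */
			int flags = (i < nvec - 1) ? MSG_MORE : 0;
			/* Passes a reference to the page down to TCP; the
			 * page contents are not copied into the skbuff. */
			ssize_t sent = sock->ops->sendpage(sock, vec[i].page,
							   off, len, flags);

			if (sent <= 0)
				return (int)sent;   /* error or closed connection */
			off += sent;
			len -= sent;
		}
	}
	return 0;
}

A production driver would additionally deal with partial PDUs, command completion, and connection errors; the point here is only that no data copy appears on this path.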
To get the best performance improvement from copy avoidance, a network adapter has to provide the ability to calculate data checksums during DMA transfers. Without this feature, the performance gain from copy avoidance becomes negligible [5], because in the Linux kernel the TCP stack integrates calculating the checksum on data with copying the data.

1) iSCSI digest: Without modifications to the Linux kernel, it is possible to avoid copying when writing data in most cases. However, this scheme for copy avoidance does not work in the case that the data digest feature of the iSCSI protocol is enabled. The iSCSI protocol defines a 32-bit CRC digest on an iSCSI packet to detect the corruption of iSCSI PDU (protocol data unit) headers and/or data, because the 16-bit checksum used by TCP to detect the corruption of a TCP packet was considered too weak for the requirements of storage over long-distance data transfers [18]. These digests are called the header digest and the data digest, respectively. When these digests are used, the iSCSI driver operates in a slightly different way.

1) The SCSI subsystem receives references to dirty pages from the VM system, and hands them over to the iSCSI driver.
2) The iSCSI driver associates these references with skbuffs for iSCSI packets by using the sendpage interfaces.
3) If the header digest feature is enabled, the iSCSI driver computes a digest on the iSCSI PDU header.
4) If the data digest feature is enabled, the iSCSI driver computes a digest on the iSCSI PDU data, that is, the contents of the dirty pages.
5) The references are passed on to a network adapter driver, and the contents of the referenced pages are transmitted by using DMA.

3 A dirty page is a page whose contents have been modified in the VM system and must be written to persistent storage.
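
A hedged sketch of the digest computation in step 4 follows (again not the authors' code). It assumes the iscsi_page_vec structure from the previous sketch and uses crc32c(), the CRC32C helper that later Linux kernels provide; the all-ones seed and the complemented result follow the iSCSI CRC32C convention, and the driver described in the paper may well compute the digest differently.

#include <linux/crc32c.h>
#include <linux/highmem.h>
#include <linux/types.h>

static u32 iscsi_data_digest(struct iscsi_page_vec *vec, int nvec)
{
	u32 crc = ~(u32)0;	/* iSCSI CRC32C starts from an all-ones seed */
	int i;

	for (i = 0; i < nvec; i++) {
		/* Temporarily map the page at some kernel virtual address. */
		void *addr = kmap(vec[i].page);

		crc = crc32c(crc, addr + vec[i].offset, vec[i].length);
		kunmap(vec[i].page);
	}
	return ~crc;	/* the complemented CRC is what goes on the wire */
}

If any of these pages is modified after this computation but before the DMA transfer, the digest no longer matches the transmitted data, which is exactly the race discussed next.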

Note that the Linux kernel permits modifying the contents of a page that is being written to persistent storage. Therefore, a process or a subsystem can modify the contents of a page at any time during all of the above operations. If the contents of a transferred page are modified between the fourth and fifth operations, the target sees the combination of a data digest computed on the old contents and the new contents. Thus, verification of the data digest fails on the target. In this case, the iSCSI driver must recompute the data digest on the new contents of the page before the network adapter transmits them. However, the iSCSI driver has no chance to do so. Therefore, the iSCSI driver would have to keep a transferred page untouched after computing a data digest on the page until the iSCSI command related to the page finishes. A synchronization scheme to avoid such a situation is necessary to enable our copy avoidance scheme to work with the data digest feature. The current version of our iSCSI driver simply copies data between the VM cache and skbuffs instead of using the sendpage interfaces in the case that the data digest feature is used. On the other hand, the header digest feature always works with our scheme to avoid copying, because the header of an iSCSI PDU can be modified only by the iSCSI driver.

C. Read

With regard to reading data, the avoidance of copying is more problematic. This is because an iSCSI driver has to place incoming data from the network subsystem at arbitrary virtual addresses specified by the VM system without copying the data. An HBA adapter, which processes the iSCSI and network protocols, can determine the virtual addresses chosen by the VM system for incoming data, and directly place the data in the appropriate places. Thus the driver does not have to do anything. A commodity network adapter, on the other hand, cannot process these protocols. Therefore, it cannot directly place incoming data at the virtual addresses specified by the VM system.

One technique to avoid copying data with a commodity network adapter is page remapping. It changes the address bindings between buffers in the VM system and the network subsystem. This scheme is widely used in research systems; however, it has some restrictions. First, the Maximum Transfer Unit (MTU) size must be larger than the system virtual memory page size. Second, the network adapter must place incoming data in such a way that the data that the VM system requires are aligned on page boundaries after the Ethernet, IP, TCP, and iSCSI headers are removed. Third, page remapping requires the high-cost operations of manipulating data structures related to the memory management unit (MMU) and flushing the address-translation cache, called the translation lookaside buffer (TLB). These operations are particularly slow in symmetric multiprocessor (SMP) environments.

The first issue means that the technique does not work with standard IEEE 802.3 Ethernet frame sizes, and thus jumbo frames are mandatory. The requirement in the second issue can be met with some modifications to a general-purpose operating system. We explain these modifications in subsequent paragraphs. The operations in the last issue can be avoided for data movement between buffers in the VM system and those in the network subsystem 4, though such operations are mandatory for copy avoidance between the kernel and user address spaces. The reason is that, in UNIX operating systems, any kernel virtual address into which a page in the VM system is mapped will do, as long as the page is mapped. In general, pages in the VM system are temporarily mapped into the kernel virtual address space. When a process or a subsystem reads or writes the contents of a page, the page is mapped into a kernel virtual address. When it finishes the operation, the page is unmapped. The page can be mapped into another kernel virtual address the next time.
That is, a process does not directly see the kernel virtual address into which a page is mapped. This scheme does not work if a page in the VM system is mapped into user virtual addresses; e.g., if files or devices are mapped into user virtual addresses by using the mmap system call, the high-cost MMU operations are inevitable. In this case, a process directly sees the user virtual addresses, and thus a new page must be mapped into the same user virtual addresses.

4 The term page remapping means changing address bindings by performing MMU operations. Therefore, page remapping may not be the most appropriate name for the scheme used here to avoid copying between buffers in the VM system and those in the network subsystem.

Our iSCSI driver requires modifications to two parts of the Linux kernel: the memory management in the network subsystem and the VM cache management.

1) Memory management in the network subsystem: The alloc_skb function does not allow allocating the space for an skbuff with arbitrary alignment. Thus, we slightly modified the function and the sk_buff structure. A network adapter driver allocates all skbuffs in expectation of receiving an iSCSI packet that includes a response to a READ command. Therefore, after the network subsystem and the iSCSI driver finish processing the protocol headers, the iSCSI PDU data, that is, the requested data stored on the disk of the target, show up on page boundaries. This scheme can only be applied to the first TCP packet in the case that one iSCSI PDU consists of several TCP packets, because the length of the headers of the first packet is different from that of the rest. Therefore, after using page remapping for the first TCP packet, our iSCSI driver copies the data in the rest of the packets.
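
A hedged sketch of such an allocation is given below; the paper does not show its modified code, so the helper name is hypothetical, and the sketch simply assumes that the modified alloc_skb() returns a page-aligned buffer, which is what the described change provides. The idea is to reserve headroom so that, once the Ethernet, IP, TCP, and iSCSI headers are stripped, the payload begins exactly on a page boundary.

#include <linux/skbuff.h>
#include <linux/mm.h>

static struct sk_buff *iscsi_alloc_rx_skb(unsigned int payload_pages,
					  unsigned int hdr_len)
{
	/* One extra page in front of the payload holds the protocol headers. */
	unsigned int size = (payload_pages + 1) * PAGE_SIZE;
	struct sk_buff *skb = alloc_skb(size, GFP_ATOMIC);

	if (!skb)
		return NULL;
	/*
	 * Push skb->data forward so that the hdr_len bytes of headers fill
	 * the tail of the first page; the iSCSI data that follows then
	 * starts on a page boundary, the precondition for remapping those
	 * pages into the VM cache.
	 */
	skb_reserve(skb, PAGE_SIZE - hdr_len);
	return skb;
}

With a 9000-byte MTU and 4096-byte pages, two payload pages per receive buffer cover an 8192-byte data segment, the case in which the read path described here avoids copying entirely.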

2) VM cache management: Figure 2 shows how the copy avoidance scheme works after the iSCSI driver finishes processing an iSCSI PDU header. Figure 2(a) represents the data structures just before page remapping. One page structure A describes a page A in the VM system, and one page structure B describes a page B used as the space for an skbuff. Figure 2(b) shows the structures after page remapping. The page structure A now describes the page B in the VM system. The page structure B and the page A are not used, and the sk_buff structure is freed. If necessary, the iSCSI driver performs MMU operations to change the address bindings between these pages. This scheme allows the contents of the page that page structure A describes to be updated without copying. Before page remapping, when a process or a subsystem tries to read the contents of the page that page structure A describes, the page A is mapped into the kernel virtual address space, and it sees the contents of page A. After page remapping, the page B is mapped into the kernel virtual address space, and it sees the contents of page B.

Fig. 2. VM cache and skbuff management. (a) Before page remapping. (b) After page remapping.

To implement our scheme, we made two modifications to the Linux VM system. First, in the modified VM system, the relationship between a page structure and the page that it describes is dynamic. In the original Linux VM system, a page structure is allocated for every page frame when the system boots, and the relationship between them is static. We add the index of a page frame to a page structure to enable it to describe any page. This modification increases the overhead of some operations related to memory management. For example, it becomes more complicated to find the kernel virtual address into which a page is mapped by using the page structure describing the page. It also becomes more complicated to find the page structure describing a page by using the physical address of the page. To keep the modifications to a minimum, we implement page remapping by using a special kernel virtual address space, called high memory, which is normally used for host systems with a large amount of physical memory. Though the original Linux VM system maps most pages into the kernel virtual address space all the time, some pages are temporarily mapped into the high memory address space as the need arises. In the Linux VM system on the Intel 32-bit architecture, pages are mapped into the kernel virtual address space starting from 3 GB. Without high memory, therefore, only about 1 GB of the main memory at most can be used. The iSCSI driver expands this address space and uses some amount of the physical memory exclusively for page remapping.

Secondly, in the modified VM system, a page structure holds a reference to another page structure, as shown in Figure 2(b). This extension allows these page structures to be restored to their original state when the page is reclaimed. The old page is not freed until the new page is freed or replaced with another page.

D. Another potential solution

Another potential solution to avoid copying when reading would be to directly replace one page structure with another. As shown in Figure 2(a), we could directly replace page structure A with page structure B. This solution requires extensive modifications to the Linux kernel, because a page structure contains various data structures that subsystems use. Take, for example, the case where a process is waiting for the contents of a page to be updated by using data structures in the page structure that describes the page. If the page structure is replaced, the operating system must guarantee that the replaced page structure remains untouched until all processes that use it to wait for the contents resume.

IV. PERFORMANCE

We performed microbenchmarks to get the basic performance characteristics of our iSCSI driver.

A. Experimental infrastructure

Initiator: The Dell Precision Workstation 530 uses two 2 GHz Xeon processors with 1 GB of PC800 RDRAM main memory, an Intel 860 chipset, and an Intel PRO/1000 MT Server Adapter connected to a 66-MHz 64-bit PCI slot.
The network adapter supports checksum offloading on both the transmitting and receiving sides, jumbo frames, and TSO. The initiator runs a modified version of the Linux kernel and our iSCSI initiator implementation, which is based on the source code of the IBM iSCSI initiator [16], [19] for the Linux kernel 2.4 series. The operating system has an 18 GB partition on the local disk, and the iSCSI driver uses a 73.4 GB partition on the target server.

Target: The IBM TotalStorage IP Storage 200i is an iSCSI appliance based on two 1.13 GHz Pentium III processors with 1 GB of PC133 SDRAM main memory and an Intel PRO/1000 F network adapter connected to a 66-MHz 64-bit PCI slot. Our system houses 10,000 RPM Ultra160 SCSI disks. While the details of the software have not been disclosed, it is believed that the appliance runs the Linux kernel and an iSCSI target implementation that operates in kernel mode.

The initiator and the target are connected by an Extreme Networks Summit 7i Gigabit Ethernet switch. The iSCSI initiator and target implementations are compatible with iSCSI draft version 8. Though this is not the latest version, we believe this does not significantly affect the performance under our experimental conditions, i.e., high-speed and low-latency networks.

B. Measurement tool

We implemented microbenchmarks that run in kernel mode and directly interact with the VM system in order to bypass the VM cache. All iSCSI commands request operations on the same disk block address. This leads to the best cache usage on the target. Therefore, the effect of target server factors, such as disk performance, on the results is kept to a minimum. To measure the CPU utilization and where the CPU time is spent while running the microbenchmarks, we use the OProfile suite [20], which is a system-wide profiler for the Linux kernel. The OProfile suite uses the Intel hardware performance counters [21] to report detailed execution profiles.

C. Configuration

We conducted the microbenchmarks with an MTU of 1500 bytes (standard Ethernet) or 9000 bytes (Ethernet with jumbo frames). The microbenchmarks were run with the following four configurations: zero-copy, checksum offloading and TSO, for which the iSCSI driver avoids copying data between the VM cache and skbuffs, and uses the checksum offloading and TSO that the network adapter supports; zero-copy and checksum offloading, for which the iSCSI driver avoids copying data, and uses checksum offloading; checksum offloading, for which the iSCSI driver copies data in the same way that a straightforward iSCSI driver does, and uses checksum offloading; and no optimizations, for which the iSCSI driver copies data and uses neither checksum offloading nor TSO.

The microbenchmarks repeatedly read or write data with I/O sizes ranging from 512 bytes to 64 KB. Except in the 64 KB case, one I/O request is converted to one iSCSI command. In the 64 KB case, one I/O request is converted to two iSCSI commands due to a restriction on the maximum I/O size on the target. We report the averages of five runs for all experiments.

D. Results

1) CPU utilization and bandwidth: Figure 3(a) shows the results of CPU utilization during the write microbenchmarks with an MTU of 1500 bytes. Except for the sizes ranging from 512 bytes to 1024 bytes, the two configurations with which the iSCSI driver avoids copying, the zero-copy, checksum offloading and TSO and the zero-copy and checksum offloading configurations, yield a lower CPU utilization than the others. This means that copy avoidance improves performance at the large I/O sizes, and that at the small I/O sizes the cost of the sendpage interfaces is slightly higher than the cost of copying the data. Though TSO is used for all I/O sizes, it effectively works for I/O sizes greater than 2048 bytes. This is because at block sizes of 512 bytes to 1024 bytes TCP segmentation does not happen; that is, one TCP/IP packet can carry all the data that constitute one iSCSI PDU. Therefore, when the I/O sizes are over 2048 bytes, the zero-copy, checksum offloading and TSO configuration yields a lower CPU utilization than the zero-copy and checksum offloading configuration does. But the gain from TSO is not clear-cut.
For example, CPU utilization is 29.0% in the zero-copy, checksum offloading and TSO configuration and 30.2% in the zero-copy and checksum offloading configuration at 32 KB. As explained in Section III, the gap between the checksum offloading and no optimizations configurations is small. We see the largest gain from copy avoidance at 64 KB: CPU utilization is 30.0% in the zero-copy, checksum offloading and TSO configuration, 30.8% in the zero-copy and checksum offloading configuration, 39.4% in the checksum offloading configuration, and 42.0% in the no optimizations configuration with the 1500-byte MTU. We could not determine why the zero-copy and checksum offloading configuration yields a lower CPU utilization than the zero-copy, checksum offloading and TSO configuration at the 2048-byte block size.

Figure 3(b) shows the results of throughput during the write microbenchmarks with the 1500-byte MTU. All configurations provide comparable throughput.

Figures 4(a) and 4(b) show the results of CPU utilization and throughput during the write microbenchmarks with the 9000-byte MTU, respectively. These results are similar to those with the 1500-byte MTU. But, as expected, CPU utilization with the 9000-byte MTU is lower than with the 1500-byte MTU, and the throughput with the 9000-byte MTU is better than with the 1500-byte MTU.

Figures 5(a) and 5(b) show the results of CPU utilization and throughput during the read microbenchmarks with the 1500-byte MTU. As explained in Section III, our scheme to avoid copying is impossible with the 1500-byte MTU. In addition, TSO has no effect on the read microbenchmarks. Therefore, only the checksum offloading and no optimizations configurations are presented. As in the write microbenchmarks, the checksum offloading configuration yields a slightly lower CPU utilization than the no optimizations configuration. Furthermore, both configurations perform comparably with regard to throughput.

Fig. 3. Results of write microbenchmarks with a 1500-byte MTU. (a) Write CPU utilization (%). (b) Write throughput (MB/s).

Fig. 4. Results of write microbenchmarks with a 9000-byte MTU. (a) Write CPU utilization (%). (b) Write throughput (MB/s).

Figures 6(a) and 6(b) show the results of CPU utilization and throughput during the read microbenchmarks with the 9000-byte MTU. Though the zero-copy and checksum offloading configuration is feasible below 4096 bytes (the system virtual memory page size) through a combination of page remapping and copying data, its gain is smaller than for I/O sizes of 4096 bytes and over. Therefore we concentrate on the results at 4096 bytes and over with regard to the zero-copy and checksum offloading configuration. While over 16 KB both page remapping and copying data are required, at 8192 bytes the iSCSI driver does not need to copy; that is, it only needs to manipulate some data structures for memory management. Therefore, it gets the largest gain from the optimizations at 8192 bytes. However, we cannot see a clear gain: CPU utilization is 21.1% in the zero-copy and checksum offloading configuration and 21.3% in the checksum offloading configuration at 8192 bytes. Moreover, at 4096 bytes and over 16 KB, the checksum offloading configuration yields the lower CPU utilization. We investigate the reasons for this behavior in subsequent paragraphs. In the read microbenchmarks, the iSCSI driver provides comparable throughput in all configurations with the 9000-byte MTU, as it does with the 1500-byte MTU.

2) Distribution of CPU time: Next, we investigate where the CPU time is spent while running the microbenchmarks. Tables I and II show the distribution of CPU time during the write and read microbenchmarks for the 8192-byte I/O size, respectively. The reason we choose the I/O size of 8192 bytes is that this size most clearly shows the characteristics of the zero-copy and checksum offloading configuration in the read microbenchmarks.

Fig. 5. Results of read microbenchmarks with a 1500-byte MTU. (a) Read CPU utilization (%). (b) Read throughput (MB/s).

Fig. 6. Results of read microbenchmarks with a 9000-byte MTU. (a) Read CPU utilization (%). (b) Read throughput (MB/s).

As mentioned in the previous paragraph, we cannot see a clear gain from TSO. Therefore, we omit the results of the zero-copy, checksum offloading and TSO configuration. There are six major sources of overhead: the iSCSI driver, copying data and calculating checksums, the SCSI subsystem, the network adapter driver, the TCP/IP stack, and the VM system.

The results show that in the write microbenchmarks, the combination of copy avoidance and checksum offloading decreases the CPU time spent on copying data and calculating checksums. We also clearly see the effect of a large MTU: it greatly decreases the CPU time spent on the TCP/IP stack and also decreases the time spent on the network adapter driver.

The read microbenchmarks behave like the write microbenchmarks. The combination of copy avoidance and checksum offloading decreases the CPU time spent on copying data and calculating checksums. Furthermore, a large MTU decreases the CPU time spent on the TCP/IP stack and on the network adapter driver. The major difference is that the optimization for copy avoidance increases the CPU time spent on the VM system and the TCP stack. The increased overhead of the VM system is mainly due to the complexity of the modified VM system. As explained in Section III, in the modified VM system the relationship between a page structure and the page that it describes is dynamic, which increases the overhead of memory-management operations. Furthermore, the increased overhead of memory-management operations increases the CPU time spent on the TCP/IP stack, which often uses these operations. As a result, the performance gain from copy avoidance is negated.

TABLE I. Distribution of CPU time during the write microbenchmarks for the 8192-byte I/O size (percent of CPU time for the checksum offloading and the zero-copy and checksum offloading configurations with 1500- and 9000-byte MTUs; the rows are idle, iSCSI driver, SCSI subsystem, copy and checksum, network adapter driver, TCP/IP, and VM system).

TABLE II. Distribution of CPU time during the read microbenchmarks for the 8192-byte I/O size (same configurations and rows as Table I; the zero-copy and checksum offloading configuration is not applicable with the 1500-byte MTU).

These investigations also give some idea of how much performance the iSCSI protocol could provide with specialized adapters. The important observation is that the overhead of processing the iSCSI protocol and the SCSI subsystem consumes a relatively small amount of CPU time. For example, during the write microbenchmarks in the zero-copy and checksum offloading configuration with the 1500-byte MTU, it accounts for only 8.9% of the processing time in which the CPU is not idle. The largest source of overhead is the TCP/IP protocol, which accounts for 40.8% of the total processing time.

V. CONCLUSION

This paper presented an efficient implementation of an iSCSI initiator with a commodity Gigabit Ethernet adapter, and experimentally analyzed its performance. Our implementation of an iSCSI initiator avoids copying between the VM cache and buffers in the network subsystem by using page remapping, and exploits features that commodity Gigabit Ethernet adapters support, i.e., checksum offloading, jumbo frames, and TCP segmentation offloading.

Our analysis of writing showed that the combination of copy avoidance and checksum offloading reduces CPU utilization from 39.4% to 30.8% with an I/O size of 64 KB in our microbenchmarks, as compared with a straightforward iSCSI driver that copies data. In addition, the copy avoidance technique that we used can be easily integrated into the Linux kernel by using existing interfaces. However, we could not see a clear gain from TCP segmentation offloading. With regard to reading, the copy avoidance technique requires modifications to the Linux kernel. Furthermore, the overhead due to our modifications to the VM system negates any performance gain that could be had from copy avoidance.

Our experiments show that the CPU is not the performance bottleneck in any configuration, and the iSCSI driver provides comparable throughput with each MTU. The quantitative analysis shows that the overhead related to the iSCSI protocol and the SCSI subsystem consumes a relatively small amount of CPU time (8.9% of the total processing time during the write microbenchmarks with an MTU of 1500 bytes) and that the largest overhead comes from processing the TCP/IP protocol with standard IEEE 802.3 Ethernet frame sizes.

REFERENCES

[1] J. Satran, K. Meth, C. Sapuntzakis, M. Chadalapaka, and E. Zeidner, IPS Internet Draft: iSCSI, January 2003.
[2] P. Sarkar, S. Uttamchandani, and K. Voruganti, Storage over IP: When Does Hardware Support Help? in the Conference on File and Storage Technologies (FAST '03). San Francisco, CA: USENIX, 2003.

[3] S. Aiken, D. Grunwald, A. R. Pleszkun, and J. Willeke, A Performance Analysis of the iSCSI Protocol, in 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies. San Diego, California: IEEE, April 2003.
[4] W. T. Ng, B. Hillyer, E. Shriver, E. Gabber, and B. Özden, Obtaining High Performance for Storage Outsourcing, in Conference on File and Storage Technologies. Monterey, CA: USENIX, January 2002.
[5] J. S. Chase, A. J. Gallatin, and K. G. Yocum, End-System Optimizations for High-Speed TCP, IEEE Communications, vol. 39, no. 4, 2001.
[6] Hsiao-keng Jerry Chu, Zero-Copy TCP in Solaris, in the USENIX 1996 Annual Technical Conference, San Diego, CA, January 1996.
[7] V. S. Pai, P. Druschel, and W. Zwaenepoel, IO-Lite: A Unified I/O Buffering and Caching System, ACM Transactions on Computer Systems, vol. 18, no. 1, 2000.
[8] J. C. Brustoloni and P. Steenkiste, Interoperation of Copy Avoidance in Network and File I/O, in the INFOCOM '99 Conference. New York: IEEE, March 1999.
[9] S. Pakin, M. Lauria, and A. Chien, High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet, in Supercomputing '95, San Diego, CA, December 1995.
[10] D. Dunning, G. Regnier, G. McAlpine, D. Cameron, B. Shubert, F. Berry, A. M. Merritt, E. Gronke, and C. Dodd, The Virtual Interface Architecture, IEEE Micro, vol. 18, no. 2, March/April 1998.
[11] T. von Eicken, A. Basu, V. Buch, and W. Vogels, U-Net: A User-Level Network Interface for Parallel and Distributed Computing, in the Fifteenth ACM Symposium on Operating System Principles, December 1995.
[12] M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T. Talpey, and M. Wittle, The Direct Access File System, in 2nd USENIX Conference on File and Storage Technologies, San Francisco, CA, March 2003.
[13] Intel iSCSI Reference Implementation, intel-iscsi/.
[14] InterOperability Laboratory.
[15] Cisco iSCSI Initiator Implementation, linux-iscsi/.
[16] IBM iSCSI Initiator Implementation, developerworks/projects/naslib/.
[17] Open Source Initiative (OSI).
[18] K. Z. Meth and J. Satran, Design of the iSCSI Protocol, in 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies. San Diego, California: IEEE, April 2003.
[19] K. Z. Meth, iSCSI Initiator Design and Implementation Experience, in Tenth NASA Goddard Conference on Mass Storage Systems and Technologies / Nineteenth IEEE Symposium on Mass Storage Systems. Maryland, USA: IEEE, April 2002.
[20] OProfile - A System Profiler for Linux.
[21] Intel, The IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide, 2003.


More information

SAN Conceptual and Design Basics

SAN Conceptual and Design Basics TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer

More information

NAS Backup Strategies

NAS Backup Strategies NAS Backup Strategies Jeff Bain Hewlett-Packard Network Storage Solutions Organization Automated Storage 700 71st Ave. Greeley, CO 80634 jeff_bain@hp.com Page 1 NAS Backup Strategies NAS: A Changing Definition

More information

NAS or iscsi? White Paper 2007. Selecting a storage system. www.fusionstor.com. Copyright 2007 Fusionstor. No.1

NAS or iscsi? White Paper 2007. Selecting a storage system. www.fusionstor.com. Copyright 2007 Fusionstor. No.1 NAS or iscsi? Selecting a storage system White Paper 2007 Copyright 2007 Fusionstor www.fusionstor.com No.1 2007 Fusionstor Inc.. All rights reserved. Fusionstor is a registered trademark. All brand names

More information

IP SAN: Low on Fibre

IP SAN: Low on Fibre IP SAN: Low on Fibre Smriti Bhagat CS Dept, Rutgers University Abstract The era of Fibre Channel interoperability is slowly dawning. In a world where Internet Protocol (IP) dominates local and wide area

More information

3G Converged-NICs A Platform for Server I/O to Converged Networks

3G Converged-NICs A Platform for Server I/O to Converged Networks White Paper 3G Converged-NICs A Platform for Server I/O to Converged Networks This document helps those responsible for connecting servers to networks achieve network convergence by providing an overview

More information

Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003

Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003 Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks An Oracle White Paper April 2003 Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building

More information

ADVANCED NETWORK CONFIGURATION GUIDE

ADVANCED NETWORK CONFIGURATION GUIDE White Paper ADVANCED NETWORK CONFIGURATION GUIDE CONTENTS Introduction 1 Terminology 1 VLAN configuration 2 NIC Bonding configuration 3 Jumbo frame configuration 4 Other I/O high availability options 4

More information

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1 Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System

More information

IP SAN BEST PRACTICES

IP SAN BEST PRACTICES IP SAN BEST PRACTICES PowerVault MD3000i Storage Array www.dell.com/md3000i TABLE OF CONTENTS Table of Contents INTRODUCTION... 3 OVERVIEW ISCSI... 3 IP SAN DESIGN... 4 BEST PRACTICE - IMPLEMENTATION...

More information

SCSI vs. Fibre Channel White Paper

SCSI vs. Fibre Channel White Paper SCSI vs. Fibre Channel White Paper 08/27/99 SCSI vs. Fibre Channel Over the past decades, computer s industry has seen radical change in key components. Limitations in speed, bandwidth, and distance have

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308

Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308 Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308 Laura Knapp WW Business Consultant Laurak@aesclever.com Applied Expert Systems, Inc. 2011 1 Background

More information

Chapter 6. 6.1 Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig. 6.1. I/O devices can be characterized by. I/O bus connections

Chapter 6. 6.1 Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig. 6.1. I/O devices can be characterized by. I/O bus connections Chapter 6 Storage and Other I/O Topics 6.1 Introduction I/O devices can be characterized by Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections

More information

PCI Express* Ethernet Networking

PCI Express* Ethernet Networking White Paper Intel PRO Network Adapters Network Performance Network Connectivity Express* Ethernet Networking Express*, a new third-generation input/output (I/O) standard, allows enhanced Ethernet network

More information

Dell PowerVault MD Series Storage Arrays: IP SAN Best Practices

Dell PowerVault MD Series Storage Arrays: IP SAN Best Practices Dell PowerVault MD Series Storage Arrays: IP SAN Best Practices A Dell Technical White Paper Dell Symantec THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND

More information

An Analysis of 8 Gigabit Fibre Channel & 10 Gigabit iscsi in Terms of Performance, CPU Utilization & Power Consumption

An Analysis of 8 Gigabit Fibre Channel & 10 Gigabit iscsi in Terms of Performance, CPU Utilization & Power Consumption An Analysis of 8 Gigabit Fibre Channel & 1 Gigabit iscsi in Terms of Performance, CPU Utilization & Power Consumption An Analysis of 8 Gigabit Fibre Channel & 1 Gigabit iscsi 1 Key Findings Third I/O found

More information

IP SAN Fundamentals: An Introduction to IP SANs and iscsi

IP SAN Fundamentals: An Introduction to IP SANs and iscsi IP SAN Fundamentals: An Introduction to IP SANs and iscsi Updated April 2007 Sun Microsystems, Inc. 2007 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 USA All rights reserved. This

More information

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology 3. The Lagopus SDN Software Switch Here we explain the capabilities of the new Lagopus software switch in detail, starting with the basics of SDN and OpenFlow. 3.1 SDN and OpenFlow Those engaged in network-related

More information

The team that wrote this redbook Comments welcome Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft

The team that wrote this redbook Comments welcome Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft Foreword p. xv Preface p. xvii The team that wrote this redbook p. xviii Comments welcome p. xx Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft Technologies

More information

EVOLUTION OF NETWORKED STORAGE

EVOLUTION OF NETWORKED STORAGE EVOLUTION OF NETWORKED STORAGE Sonika Jindal 1, Richa Jindal 2, Rajni 3 1 Lecturer, Deptt of CSE, Shaheed Bhagat Singh College of Engg & Technology, Ferozepur. sonika_manoj@yahoo.com 2 Lecturer, Deptt

More information

Video Surveillance Storage and Verint Nextiva NetApp Video Surveillance Storage Solution

Video Surveillance Storage and Verint Nextiva NetApp Video Surveillance Storage Solution Technical Report Video Surveillance Storage and Verint Nextiva NetApp Video Surveillance Storage Solution Joel W. King, NetApp September 2012 TR-4110 TABLE OF CONTENTS 1 Executive Summary... 3 1.1 Overview...

More information

Quantum StorNext. Product Brief: Distributed LAN Client

Quantum StorNext. Product Brief: Distributed LAN Client Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without

More information

Hewlett Packard - NBU partnership : SAN (Storage Area Network) или какво стои зад облаците

Hewlett Packard - NBU partnership : SAN (Storage Area Network) или какво стои зад облаците Hewlett Packard - NBU partnership : SAN (Storage Area Network) или какво стои зад облаците Why SAN? Business demands have created the following challenges for storage solutions: Highly available and easily

More information

Optimizing LTO Backup Performance

Optimizing LTO Backup Performance Optimizing LTO Backup Performance July 19, 2011 Written by: Ash McCarty Contributors: Cedrick Burton Bob Dawson Vang Nguyen Richard Snook Table of Contents 1.0 Introduction... 3 2.0 Host System Configuration...

More information

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed Computing Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies

More information

The Bus (PCI and PCI-Express)

The Bus (PCI and PCI-Express) 4 Jan, 2008 The Bus (PCI and PCI-Express) The CPU, memory, disks, and all the other devices in a computer have to be able to communicate and exchange data. The technology that connects them is called the

More information

White Paper Technology Review

White Paper Technology Review White Paper Technology Review iscsi- Internet Small Computer System Interface Author: TULSI GANGA COMPLEX, 19-C, VIDHAN SABHA MARG, LUCKNOW 226001 Uttar Pradesh, India March 2004 Copyright 2004 Tata Consultancy

More information

Router Architectures

Router Architectures Router Architectures An overview of router architectures. Introduction What is a Packet Switch? Basic Architectural Components Some Example Packet Switches The Evolution of IP Routers 2 1 Router Components

More information

Chapter 12: Mass-Storage Systems

Chapter 12: Mass-Storage Systems Chapter 12: Mass-Storage Systems Chapter 12: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space Management RAID Structure

More information

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC

More information

Network Virtualization for Large-Scale Data Centers

Network Virtualization for Large-Scale Data Centers Network Virtualization for Large-Scale Data Centers Tatsuhiro Ando Osamu Shimokuni Katsuhito Asano The growing use of cloud technology by large enterprises to support their business continuity planning

More information

White Paper. Enhancing Storage Performance and Investment Protection Through RAID Controller Spanning

White Paper. Enhancing Storage Performance and Investment Protection Through RAID Controller Spanning White Paper Enhancing Storage Performance and Investment Protection Through RAID Controller Spanning May 2005 Introduction It is no surprise that the rapid growth of data within enterprise networks is

More information

How To Build A Clustered Storage Area Network (Csan) From Power All Networks

How To Build A Clustered Storage Area Network (Csan) From Power All Networks Power-All Networks Clustered Storage Area Network: A scalable, fault-tolerant, high-performance storage system. Power-All Networks Ltd Abstract: Today's network-oriented computing environments require

More information

IMPLEMENTATION AND COMPARISON OF ISCSI OVER RDMA ETHAN BURNS

IMPLEMENTATION AND COMPARISON OF ISCSI OVER RDMA ETHAN BURNS IMPLEMENTATION AND COMPARISON OF ISCSI OVER RDMA BY ETHAN BURNS B.S., University of New Hampshire (2006) THESIS Submitted to the University of New Hampshire in Partial Fulfillment of the Requirements for

More information

Boosting Data Transfer with TCP Offload Engine Technology

Boosting Data Transfer with TCP Offload Engine Technology Boosting Data Transfer with TCP Offload Engine Technology on Ninth-Generation Dell PowerEdge Servers TCP/IP Offload Engine () technology makes its debut in the ninth generation of Dell PowerEdge servers,

More information

How To Test A Microsoft Vxworks Vx Works 2.2.2 (Vxworks) And Vxwork 2.4.2-2.4 (Vkworks) (Powerpc) (Vzworks)

How To Test A Microsoft Vxworks Vx Works 2.2.2 (Vxworks) And Vxwork 2.4.2-2.4 (Vkworks) (Powerpc) (Vzworks) DSS NETWORKS, INC. The Gigabit Experts GigMAC PMC/PMC-X and PCI/PCI-X Cards GigPMCX-Switch Cards GigPCI-Express Switch Cards GigCPCI-3U Card Family Release Notes OEM Developer Kit and Drivers Document

More information

IBM ^ xseries ServeRAID Technology

IBM ^ xseries ServeRAID Technology IBM ^ xseries ServeRAID Technology Reliability through RAID technology Executive Summary: t long ago, business-critical computing on industry-standard platforms was unheard of. Proprietary systems were

More information

Bridging the Gap between Software and Hardware Techniques for I/O Virtualization

Bridging the Gap between Software and Hardware Techniques for I/O Virtualization Bridging the Gap between Software and Hardware Techniques for I/O Virtualization Jose Renato Santos Yoshio Turner G.(John) Janakiraman Ian Pratt Hewlett Packard Laboratories, Palo Alto, CA University of

More information

End-to-end Data integrity Protection in Storage Systems

End-to-end Data integrity Protection in Storage Systems End-to-end Data integrity Protection in Storage Systems Issue V1.1 Date 2013-11-20 HUAWEI TECHNOLOGIES CO., LTD. 2013. All rights reserved. No part of this document may be reproduced or transmitted in

More information

Storage Protocol Comparison White Paper TECHNICAL MARKETING DOCUMENTATION

Storage Protocol Comparison White Paper TECHNICAL MARKETING DOCUMENTATION Storage Protocol Comparison White Paper TECHNICAL MARKETING DOCUMENTATION v 1.0/Updated APRIl 2012 Table of Contents Introduction.... 3 Storage Protocol Comparison Table....4 Conclusion...10 About the

More information

Increasing Web Server Throughput with Network Interface Data Caching

Increasing Web Server Throughput with Network Interface Data Caching Increasing Web Server Throughput with Network Interface Data Caching Hyong-youb Kim, Vijay S. Pai, and Scott Rixner Computer Systems Laboratory Rice University Houston, TX 77005 hykim, vijaypai, rixner

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Oracle Database Scalability in VMware ESX VMware ESX 3.5 Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises

More information