Comparing and Evaluating epoll, select, and poll Event Mechanisms
Louay Gammo, Tim Brecht, Amol Shukla, and David Pariag
University of Waterloo

Abstract

This paper uses a high-performance, event-driven HTTP server (the µserver) to compare the performance of the select, poll, and epoll event mechanisms. We subject the µserver to a variety of workloads that allow us to expose the relative strengths and weaknesses of each event mechanism. Interestingly, initial results show that the select and poll event mechanisms perform comparably to the epoll event mechanism in the absence of idle connections. Profiling data shows a significant amount of time spent executing a large number of epoll_ctl system calls. As a result, we examine a variety of techniques for reducing epoll_ctl overhead, including edge-triggered notification and the introduction of a new system call (epoll_ctlv) that aggregates several epoll_ctl calls into a single call. Our experiments indicate that although these techniques succeed in reducing epoll_ctl overhead, they improve performance only slightly.

1 Introduction

The Internet is expanding in size, number of users, and volume of content, so it is imperative to support these changes with faster and more efficient HTTP servers. A common problem in HTTP server scalability is ensuring that the server can handle a large number of simultaneous connections without degrading performance. An event-driven approach is often implemented in high-performance network servers [14] to multiplex a large number of concurrent connections over a few server processes. In event-driven servers it is important that the server focus on connections that can be serviced without blocking its main process. An event dispatch mechanism such as select is used to determine the connections on which forward progress can be made without invoking a blocking system call. Many different event dispatch mechanisms have been used and studied in the context of network applications, including select, poll, /dev/poll, RT signals, and epoll [2, 3, 15, 6, 18, 10, 12, 4].

The epoll event mechanism [18, 10, 12] is designed to scale to larger numbers of connections than select and poll. One of the problems with select and poll is that, in a single call, they must both inform the kernel of all of the events of interest and obtain new events. This can result in large overheads, particularly in environments with large numbers of connections and relatively few new events occurring.
In a fashion similar to that described by Banga et al. [3], epoll separates the mechanism for obtaining events (epoll_wait) from those used to declare and control interest in events (epoll_ctl). Further reductions in the number of generated events can be obtained by using edge-triggered epoll semantics. In this mode, events are only provided when there is a change in the state of the socket descriptor of interest. For compatibility with the semantics offered by select and poll, epoll also provides level-triggered event mechanisms.

To compare the performance of epoll with select and poll, we use the µserver [4, 7] web server. The µserver facilitates comparative analysis of different event dispatch mechanisms within the same code base through command-line parameters. Recently, a highly tuned version of the single-process event-driven µserver using select has shown promising results that rival the performance of the in-kernel TUX web server [4].

Interestingly, in this paper we find that for some of the workloads considered, select and poll perform as well as or slightly better than epoll. One such result is shown in Figure 1. This motivated further investigation with the goal of obtaining a better understanding of epoll's behaviour. In this paper, we describe our experience in trying to determine how best to use epoll, and examine techniques designed to improve its performance.

The rest of the paper is organized as follows: In Section 2 we summarize some existing work that led to the development of epoll as a scalable replacement for select. In Section 3 we describe the techniques we have tried to improve epoll's performance. In Section 4 we describe our experimental methodology, including the workloads used in the evaluation. In Section 5 we describe and analyze the results of our experiments. In Section 6 we summarize our findings and outline some ideas for future work.

2 Background and Related Work

Event-notification mechanisms have a long history in operating systems research and development, and have been a central issue in many performance studies. These studies have sought to improve mechanisms and interfaces for obtaining information about the state of socket and file descriptors from the operating system [2, 1, 3, 13, 15, 6, 18, 10, 12]. Some of these studies have developed improvements to select, poll, and sigwaitinfo by reducing the amount of data copied between the application and kernel. Other studies have reduced the number of events delivered by the kernel, for example, the signal-per-fd scheme proposed by Chandra et al. [6]. Much of the aforementioned work is tracked and discussed on The C10K Problem web site [8].

Early work by Banga and Mogul [2] found that despite performing well under laboratory conditions, popular event-driven servers performed poorly under real-world conditions. They demonstrated that the discrepancy is due to the inability of the select system call to scale to the large numbers of simultaneous connections that are found in WAN environments.

Subsequent work by Banga et al. [3] sought to improve on select's performance by (among other things) separating the declaration of interest in events from the retrieval of events on that interest set. Event mechanisms like select and poll have traditionally combined these tasks into a single system call. However, this amalgamation requires the server to re-declare its interest set every time it wishes to retrieve events, since the kernel does not remember the interest sets from previous calls. This results in unnecessary data copying between the application and the kernel.
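As an illustration of this cost, consider the canonical select-based event loop below (a minimal sketch, not code from the µserver; the interest-set variables and handler are hypothetical). The fd_set must be rebuilt and copied into the kernel on every iteration, and the result scanned in full, so the per-call work grows with the size of the interest set rather than with the number of ready descriptors.

#include <sys/select.h>

extern int interest[];                  /* descriptors we are watching */
extern int ninterest;                   /* number of entries in interest[] */
extern void handle_readable(int fd);    /* hypothetical handler */

void event_loop_select(void)
{
    for (;;) {
        fd_set readable;
        int i, maxfd = -1;

        /* The entire interest set must be re-declared on every
         * iteration; select gives the kernel no memory of it. */
        FD_ZERO(&readable);
        for (i = 0; i < ninterest; i++) {
            FD_SET(interest[i], &readable);
            if (interest[i] > maxfd)
                maxfd = interest[i];
        }

        if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
            continue;

        /* Scanning for ready descriptors is also O(interest set). */
        for (i = 0; i < ninterest; i++)
            if (FD_ISSET(interest[i], &readable))
                handle_readable(interest[i]);
    }
}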
The /dev/poll mechanism was adapted from Sun Solaris to Linux by Provos et al. [15], and improved on poll's performance by introducing a new interface that separated the declaration of interest in events from retrieval. Their /dev/poll mechanism further reduced data copying by using a shared memory region to return events to the application.
The kqueue event mechanism [9] addressed many of the deficiencies of select and poll for FreeBSD systems. In addition to separating the declaration of interest from retrieval, kqueue allows an application to retrieve events from a variety of sources including file/socket descriptors, signals, AIO completions, file system changes, and changes in process state.

The epoll event mechanism [18, 10, 12] investigated in this paper also separates the declaration of interest in events from their retrieval. The epoll_create system call instructs the kernel to create an event data structure that can be used to track events on a number of descriptors. Thereafter, the epoll_ctl call is used to modify interest sets, while the epoll_wait call is used to retrieve events.

Another drawback of select and poll is that they perform work that depends on the size of the interest set, rather than on the number of events returned. This leads to poor performance when the interest set is much larger than the active set. The epoll mechanisms avoid this pitfall and provide performance that is largely independent of the size of the interest set.
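A minimal level-triggered event loop built from these three calls might look as follows (an illustrative sketch, not the µserver's code; handle_event is a hypothetical handler). Interest is declared once and remembered by the kernel, and each epoll_wait call returns only ready descriptors.

#include <stdlib.h>
#include <sys/epoll.h>

extern void handle_event(int fd, unsigned int events);  /* hypothetical */

void event_loop_epoll(int listener)
{
    struct epoll_event ev, ready[1024];
    int epfd = epoll_create(1024);      /* size hint */
    if (epfd < 0)
        exit(1);

    /* Declare interest once; the kernel remembers it across calls. */
    ev.events = EPOLLIN;
    ev.data.fd = listener;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);

    for (;;) {
        /* Retrieval is separate from declaration and costs work
         * proportional to the number of ready events, not to the
         * size of the interest set. */
        int n = epoll_wait(epfd, ready, 1024, -1);
        for (int i = 0; i < n; i++)
            handle_event(ready[i].data.fd, ready[i].events);
    }
}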
3 Improving epoll Performance

Figure 1 in Section 5 shows the throughput obtained when using the µserver with the select, poll, and level-triggered epoll (epoll-lt) mechanisms. In this graph the x-axis shows increasing request rates and the y-axis shows the reply rate as measured by the clients that are inducing the load. The graph shows results for the one-byte workload. These results demonstrate that the µserver with level-triggered epoll does not perform as well as select under conditions that stress the event mechanisms. This led us to examine these results more closely. Using gprof, we observed that epoll_ctl was responsible for a large percentage of the run-time. As can be seen in Table 1 in Section 5, over 16% of the time is spent in epoll_ctl. The gprof output also indicates (not shown in the table) that epoll_ctl was being called a large number of times because it is called for every state change for each socket descriptor. We examine several approaches designed to reduce the number of epoll_ctl calls; these are outlined in the following paragraphs.

The first method uses epoll in an edge-triggered fashion, which requires the µserver to keep track of the current state of each socket descriptor. This is required because with edge-triggered semantics, events are only received for transitions on the socket descriptor's state. For example, once the server reads data from a socket, it needs to keep track of whether or not that socket is still readable, or whether it will get another event from epoll_wait indicating that the socket is readable. Similar state information is maintained by the server regarding whether or not the socket can be written. This method is referred to in our graphs and the rest of the paper as epoll-et.
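The sketch below illustrates the read-side bookkeeping this implies (hypothetical structure and helpers, not the µserver's implementation): the descriptor is made non-blocking, and after an EPOLLIN edge the server drains the socket until EAGAIN, caching its own view of whether the socket remains readable.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/epoll.h>

struct conn {                 /* hypothetical per-connection state */
    int fd;
    int readable;             /* our cached view of the socket state */
    char buf[4096];
};

void register_et(int epfd, struct conn *c)
{
    struct epoll_event ev;

    /* Edge-triggered descriptors must be non-blocking, otherwise a
     * read past the available data would block the whole server. */
    fcntl(c->fd, F_SETFL, fcntl(c->fd, F_GETFL, 0) | O_NONBLOCK);
    ev.events = EPOLLIN | EPOLLET;
    ev.data.ptr = c;
    epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

void on_readable_edge(struct conn *c)
{
    c->readable = 1;
    while (c->readable) {
        ssize_t n = read(c->fd, c->buf, sizeof(c->buf));
        if (n > 0) {
            /* process n bytes of request data ... */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Drained: no further EPOLLIN edge will arrive until new
             * data shows up, so remember the socket is not readable. */
            c->readable = 0;
        } else {
            break;            /* EOF or error: connection closed elsewhere */
        }
    }
}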
The second method, which we refer to as epoll2, simply calls epoll_ctl twice per socket descriptor. The first call registers with the kernel that the server is interested in read and write events on the socket. The second call occurs when the socket is closed, and tells epoll that we are no longer interested in events on that socket. All events are handled in a level-triggered fashion. Although this approach reduces the number of epoll_ctl calls, it has potential disadvantages.

One disadvantage of the epoll2 method is that, because many of the sockets will continue to be readable or writable, epoll_wait will return sooner, possibly with events that are not currently of interest to the server. For example, if the server is waiting for a read event on a socket, it is not interested in the fact that the socket is writable until later. Another disadvantage is that these calls return sooner, with fewer events being returned per call, resulting in a larger number of calls. Lastly, because many of the events will not be of interest to the server, the server must spend time determining whether it is interested in each event and discarding events that are not of interest.
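The epoll2 pattern can be sketched as follows (hypothetical helper names; a dummy event structure is passed to EPOLL_CTL_DEL because older 2.6 kernels require a non-NULL argument): both read and write interest are declared once at accept time, removed only at close, and the server itself filters the events it is handed.

#include <unistd.h>
#include <sys/epoll.h>

struct conn {                 /* hypothetical per-connection record */
    int fd;
    unsigned int wanted;      /* EPOLLIN and/or EPOLLOUT, updated as the
                               * request progresses */
};

void epoll2_open(int epfd, struct conn *c)
{
    struct epoll_event ev;

    /* First and only registration: both directions, level-triggered. */
    ev.events = EPOLLIN | EPOLLOUT;
    ev.data.ptr = c;
    epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

void epoll2_close(int epfd, struct conn *c)
{
    struct epoll_event dummy;

    /* Second and final epoll_ctl call for this descriptor. */
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, &dummy);
    close(c->fd);
}

unsigned int epoll2_filter(struct conn *c, unsigned int returned)
{
    /* Events the server did not ask for must be recognized and
     * discarded in user space; this is the cost of epoll2. */
    return returned & c->wanted;
}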
The third method uses a new system call named epoll_ctlv. This system call is designed to reduce the overhead of multiple epoll_ctl system calls by aggregating several calls to epoll_ctl into one call to epoll_ctlv. This is achieved by passing an array of epoll event structures to epoll_ctlv, which then calls epoll_ctl for each element of the array. Events are generated in level-triggered fashion. This method is referred to in the figures and the remainder of the paper as epoll-ctlv.

We use epoll_ctlv to add socket descriptors to the interest set and to modify the interest sets of existing socket descriptors. However, removal of socket descriptors from the interest set is done by explicitly calling epoll_ctl just before the descriptor is closed. We do not aggregate deletion operations because by the time epoll_ctlv is invoked, the µserver has closed the descriptor, and the epoll_ctl invoked on that descriptor would fail. The µserver does not attempt to batch the closing of descriptors because it could run out of available file descriptors. Hence, the epoll-ctlv method uses both the epoll_ctlv and the epoll_ctl system calls. Alternatively, we could rely on the close system call to remove the socket descriptor from the interest set (and we did try this). However, this increases the time spent by the µserver in close, and does not alter performance. We verified this empirically and decided to explicitly call epoll_ctl to perform the deletion of descriptors from the epoll interest set.
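The epoll_ctlv system call is not part of mainline Linux; the sketch below only emulates the behaviour described above in user space, looping over an assumed array-of-operations layout with one epoll_ctl per element. The point of the real system call is that this loop runs inside the kernel, replacing nops user-kernel boundary crossings with a single one.

#include <sys/epoll.h>

/* Assumed layout, for illustration only: pairs each epoll_event with
 * the descriptor and operation it applies to. The real epoll_ctlv
 * interface may differ. */
struct epoll_ctlv_op {
    int fd;
    int op;                    /* EPOLL_CTL_ADD or EPOLL_CTL_MOD */
    struct epoll_event event;
};

int epoll_ctlv_emulated(int epfd, struct epoll_ctlv_op *ops, int nops)
{
    int i, failed = 0;

    for (i = 0; i < nops; i++)
        if (epoll_ctl(epfd, ops[i].op, ops[i].fd, &ops[i].event) < 0)
            failed++;          /* deletions are not batched; see text */
    return failed;
}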
4 Experimental Environment

The experimental environment consists of a single server and eight clients. The server contains dual 2.4 GHz Xeon processors, 1 GB of RAM, a 10,000 RPM SCSI disk, and two one-gigabit Ethernet cards. The clients are identical to the server with the exception of their disks, which are EIDE. The server and clients are connected with a 24-port gigabit switch. To avoid network bottlenecks, the first four clients communicate with the server's first Ethernet card, while the remaining four use a different IP address linked to the second Ethernet card. The server machine runs a slightly modified version of the Linux kernel in uniprocessor mode.

4.1 Workloads

This section describes the workloads that we used to evaluate the performance of the µserver with the different event notification mechanisms. In all experiments, we generate HTTP loads using httperf [11], an open-loop workload generator that uses connection timeouts to generate loads that can exceed the capacity of the server.

Our first workload is based on the widely used SPECweb99 benchmarking suite [17]. We use httperf in conjunction with a SPECweb99 file set and synthetic HTTP traces. Our traces have been carefully generated to recreate the file classes, access patterns, and number of requests issued per (HTTP 1.1) connection that are used in the static portion of SPECweb99. The file set and server caches are sized so that the entire file set fits in the server's cache. This ensures that differences in cache hit rates do not affect performance.

Our second workload is called the one-byte workload. In this workload, the clients repeatedly request the same one-byte file from the server's cache. We believe that this workload stresses the event dispatch mechanism by minimizing the amount of work that needs to be done by the server in completing a particular request. By reducing the effect of system calls such as read and write, this workload isolates the differences due to the event dispatch mechanisms.

To study the scalability of the event dispatch mechanisms as the number of socket descriptors (connections) is increased, we use idleconn, a program that comes as part of the httperf suite. This program maintains a steady number of idle connections to the server (in addition to the active connections maintained by httperf). If any of these connections are closed, idleconn immediately re-establishes them. We first examine the behaviour of the event dispatch mechanisms without any idle connections to study scenarios where all of the connections present in a server are active. We then pre-load the server with a number of idle connections before running experiments. The idle connections are used to increase the number of simultaneous connections in order to simulate a WAN environment. In this paper we present experiments using 10,000 idle connections; our findings with other numbers of idle connections were similar, and they are not presented here.

4.2 Server Configuration

For all of our experiments, the µserver is run with the same set of configuration parameters except for the event dispatch mechanism. The µserver is configured to use sendfile to take advantage of zero-copy socket I/O while writing replies, and we use TCP_CORK in conjunction with sendfile. The same server options are used for all experiments, even though the use of TCP_CORK and sendfile may not provide benefits for the one-byte workload when compared with simply using writev.
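This reply path corresponds to a pattern like the following sketch (an illustration with an assumed helper, not µserver code): the socket is corked, the HTTP headers are written, the body is appended with zero-copy sendfile, and uncorking flushes headers and body as coalesced segments.

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send headers plus file body as one coalesced stream of segments. */
int send_reply(int sock, const char *hdrs, int filefd, size_t filelen)
{
    int on = 1, off = 0;
    off_t offset = 0;

    /* Cork: tell TCP to hold partial segments until we uncork. */
    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));

    if (write(sock, hdrs, strlen(hdrs)) < 0)
        return -1;

    /* Zero-copy transfer of the reply body from the file cache. */
    if (sendfile(sock, filefd, &offset, filelen) < 0)
        return -1;

    /* Uncork: flush the headers and body together. */
    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
    return 0;
}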
4.3 Experimental Methodology

We measure the throughput of the µserver using different event dispatch mechanisms. In our graphs, each data point is the result of a two-minute experiment. Trial and error revealed that two minutes is sufficient for the server to reach a stable state of operation. A two-minute delay is used between consecutive experiments, which allows the TIME_WAIT state on all sockets to be cleared before the subsequent run. All non-essential services are terminated prior to running any experiment.

5 Experimental Results

In this section we first compare the throughput achieved when using level-triggered epoll with that observed when using select and poll under both the one-byte and SPECweb99-like workloads with no idle connections. We then examine the effectiveness of the different methods described for reducing the number of epoll_ctl calls under these same workloads. This is followed by a comparison of the performance of the event dispatch mechanisms when the server is pre-loaded with 10,000 idle connections. Finally, we describe the results of experiments in which we tune the accept strategy used in conjunction with epoll-lt and epoll-ctlv to further improve their performance.

We initially ran the one-byte and the SPECweb99-like workloads to compare the performance of the select, poll, and level-triggered epoll mechanisms. As shown in Figure 1 and Figure 2, for both of these workloads select and poll perform as well as epoll-lt. It is important to note that because there are no idle connections in these experiments, the number of socket descriptors tracked by each mechanism is not very high. As expected, the gap between epoll-lt and select is more pronounced for the one-byte workload because it places more stress on the event dispatch mechanism.

Figure 1: µserver performance on one-byte workload using select, poll, and epoll-lt (replies/s versus requests/s).

Figure 2: µserver performance on SPECweb99-like workload using select, poll, and epoll-lt (replies/s versus requests/s).

We tried to improve the performance of the server by exploring the different techniques for using epoll described in Section 3. The effect of these techniques on the one-byte workload is shown in Figure 3. The graphs in this figure show that for this workload the techniques used to reduce the number of epoll_ctl calls do not provide significant benefits when compared with their level-triggered counterpart (epoll-lt). Additionally, the performance of select and poll is equal to or slightly better than each of the epoll techniques. Note that we omit the line for poll from Figures 3 and 4 because it is nearly identical to the select line.

Figure 3: µserver performance on one-byte workload with no idle connections (select, epoll-lt, epoll-et, epoll-ctlv, and epoll2).

We further analyze the results from Figure 3 by profiling the µserver using gprof at a request rate of 22,000 requests per second. Table 1 shows the percentage of time spent in system calls (rows) under the various event dispatch methods (columns). The output for system calls and µserver functions that do not contribute significantly to the total run-time is omitted from the table for clarity.

Table 1: gprof profile data for the µserver under the one-byte workload at 22,000 requests/sec (columns: select, epoll-lt, epoll-ctlv, epoll2, epoll-et, poll; rows: read, close, select, poll, epoll_ctl, epoll_wait, epoll_ctlv, setsockopt, accept, write, fcntl, sendfile; the numeric entries are not recoverable from this transcription).

If we compare the select and poll columns we see that they have a similar breakdown, including spending about 13% of their time indicating events of interest to the kernel and obtaining events. In contrast, the epoll-lt, epoll-ctlv, and epoll2 approaches spend about 21-23% of their time on the equivalent functions (epoll_ctl, epoll_ctlv, and epoll_wait). Despite these extra overheads, the throughputs obtained using the epoll techniques compare favourably with those obtained using select and poll. We note that when using select and poll the application requires extra manipulation, copying, and event-scanning code that is not required in the epoll case (and does not appear in the gprof data).

The results in Table 1 also show that the overhead due to epoll_ctl calls is reduced in epoll-ctlv, epoll2, and epoll-et when compared with epoll-lt. However, in each case these improvements are offset by increased costs in other portions of the code. The epoll2 technique spends twice as much time in epoll_wait as epoll-lt. With epoll2 the number of calls to epoll_wait is significantly higher, the average number of descriptors returned is lower, and only a very small proportion of the calls (less than 1%) return events that need to be acted upon by the server. On the other hand, when compared with epoll-lt, the epoll2 technique spends about 6% less time on epoll_ctl calls, so the total amount of time spent dealing with events is comparable to that of epoll-lt. Despite the significant epoll_wait overheads, epoll2 performance compares favourably with the other methods on this workload.

Using the epoll-ctlv technique, gprof indicates that epoll_ctlv and epoll_ctl combine for a total of 1,949,404 calls, compared with 3,947,769 epoll_ctl calls when using epoll-lt. While epoll-ctlv helps to reduce the number of user-kernel boundary crossings, the net result is no better than epoll-lt. The amount of time taken by epoll-ctlv in epoll_ctlv and epoll_ctl system calls is about the same (around 16%) as that spent by level-triggered epoll in invoking epoll_ctl.

When comparing the percentage of time epoll-lt and epoll-et spend in epoll_ctl, we see that it has been reduced with epoll-et from 16% to 11%. Although the epoll_ctl time has been reduced, it does not result in an appreciable improvement in throughput. We also note that about 2% of the run-time (not shown in the table) is spent in the epoll-et case checking and tracking the state of the request (i.e., whether the server should be reading or writing) and the state of the socket (i.e., whether it is readable or writable). We expect that this could be reduced, but that it would not noticeably impact performance.
Results for the SPECweb99-like workload are shown in Figure 4. Here the graph shows that all techniques produce very similar results, with a very slight performance advantage going to epoll-et after the saturation point is reached.

Figure 4: µserver performance on SPECweb99-like workload with no idle connections (select, epoll-lt, epoll-et, and epoll2).

5.1 Results With Idle Connections

We now compare the performance of the event mechanisms with 10,000 idle connections. The idle connections are intended to simulate the presence of larger numbers of simultaneous connections (as might occur in a WAN environment). Thus, the event dispatch mechanism has to keep track of a large number of descriptors even though only a very small portion of them are active.

By comparing the results in Figures 3 and 5, one can see that the performance of select and poll degrades by up to 79% when the 10,000 idle connections are added. The performance of epoll2 with idle connections suffers similarly to select and poll. In this case, epoll2 suffers from the overheads incurred by making a large number of epoll_wait calls, the vast majority of which return events that are not of current interest to the server. Throughput with level-triggered epoll is slightly reduced with the addition of the idle connections, while edge-triggered epoll is not impacted.

Figure 5: µserver performance on one-byte workload and 10,000 idle connections (select, poll, epoll-et, epoll-lt, and epoll2).

The results for the SPECweb99-like workload with 10,000 idle connections are shown in Figure 6. In this case each of the event mechanisms is impacted in a manner similar to that in which it is impacted by idle connections in the one-byte workload case.

Figure 6: µserver performance on SPECweb99-like workload and 10,000 idle connections (select, poll, epoll-et, epoll-lt, and epoll2).

5.2 Tuning Accept Strategy for epoll

The µserver's accept strategy has been tuned for use with select. The µserver includes a parameter that controls the number of connections that are accepted consecutively; we call this parameter the accept-limit. Parameter values range from one to infinity (Inf). A value of one limits the server to accepting at most one connection when notified of a pending connection request, while Inf causes the server to consecutively accept all currently pending connections.
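The accept-limit parameter corresponds to a bound on the accept loop that runs when the listening socket is reported ready, as in the following sketch (a hypothetical helper, not the µserver's code; the listener is assumed non-blocking so that Inf can drain the queue):

#include <errno.h>
#include <sys/socket.h>

extern void register_connection(int fd);   /* hypothetical */

/* Accept up to 'limit' pending connections; a limit of -1 stands in
 * for Inf and drains the accept queue completely. */
void accept_ready(int listener, int limit)
{
    int accepted = 0;

    while (limit < 0 || accepted < limit) {
        int fd = accept(listener, NULL, NULL);
        if (fd < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;         /* queue drained */
            break;             /* other error: handled elsewhere */
        }
        register_connection(fd);
        accepted++;
    }
}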
To this point we have used the accept strategy that was shown to be effective for select by Brecht et al. [4] (i.e., an accept-limit of Inf). In order to verify whether the same strategy performs well with the epoll-based methods, we explored their performance under different accept strategies.

Figure 7 examines the performance of level-triggered epoll after the accept-limit has been tuned for the one-byte workload (other values were explored, but only the best values are shown). Level-triggered epoll with an accept-limit of 10 shows a marked improvement over the previous accept-limit of Inf, and now matches the performance of select on this workload. The accept-limit of 10 also improves peak throughput for the epoll-ctlv model by 7%. This gap widens to 32% at 21,000 requests/sec. In fact, the best accept strategy for epoll-ctlv fares slightly better than the best accept strategy for select.

Figure 7: µserver performance on one-byte workload with different accept strategies and no idle connections (select accept=Inf, epoll-lt accept=Inf, epoll-lt accept=10, epoll-ctlv accept=Inf, and epoll-ctlv accept=10).

Varying the accept-limit did not improve the performance of the edge-triggered epoll technique under this workload, and it is not shown in the graph. However, we believe that the effects of the accept strategy on the various epoll techniques warrant further study, as the efficacy of the strategy may be workload-dependent.

6 Discussion

In this paper we use a high-performance event-driven HTTP server, the µserver, to compare and evaluate the performance of the select, poll, and epoll event mechanisms. Interestingly, we observe that under some of the workloads examined, the throughput obtained using select and poll is as good as or slightly better than that obtained with epoll. While these workloads may not utilize representative numbers of simultaneous connections, they do stress the event mechanisms being tested.

Our results also show that a main source of overhead when using level-triggered epoll is the large number of epoll_ctl calls. We explore techniques that significantly reduce the number of epoll_ctl calls, including the use of edge-triggered events and a system call, epoll_ctlv, which allows the µserver to aggregate large numbers of epoll_ctl calls into a single system call. While these techniques are successful in reducing the number of epoll_ctl calls, they do not appear to provide appreciable improvements in performance.

As expected, the introduction of idle connections results in dramatic performance degradation when using select and poll, while not noticeably impacting the performance when using epoll. Although it is not clear that the use of idle connections to simulate larger numbers of connections is representative of real workloads, we find that the addition of idle connections does not significantly alter the performance of the edge-triggered and level-triggered epoll mechanisms. The edge-triggered epoll mechanism performs best, with the level-triggered epoll mechanism offering performance that is very close to edge-triggered.
In the future we plan to re-evaluate some of the mechanisms explored in this paper under workloads that include more representative wide area network conditions. The problem with the technique of using idle connections is that the idle connections simply inflate the number of connections without doing any useful work. We plan to explore tools similar to Dummynet [16] and NIST Net [5] in order to more accurately simulate traffic delays, packet loss, and other wide area network traffic characteristics, and to re-examine the performance of Internet servers using different event dispatch mechanisms and a wider variety of workloads.

7 Acknowledgments

We gratefully acknowledge Hewlett-Packard, the Ontario Research and Development Challenge Fund, and the Natural Sciences and Engineering Research Council of Canada for financial support of this project.

References

[1] G. Banga, P. Druschel, and J.C. Mogul. Resource containers: A new facility for resource management in server systems. In Operating Systems Design and Implementation, pages 45-58, 1999.

[2] G. Banga and J.C. Mogul. Scalable kernel performance for Internet servers under realistic loads. In Proceedings of the 1998 USENIX Annual Technical Conference, New Orleans, LA, 1998.

[3] G. Banga, J.C. Mogul, and P. Druschel. A scalable and explicit event delivery mechanism for UNIX. In Proceedings of the 1999 USENIX Annual Technical Conference, Monterey, CA, June 1999.

[4] Tim Brecht, David Pariag, and Louay Gammo. accept()able strategies for improving web server performance. In Proceedings of the 2004 USENIX Annual Technical Conference, June 2004.

[5] M. Carson and D. Santay. NIST Net: a Linux-based network emulation tool. Computer Communication Review, to appear.

[6] A. Chandra and D. Mosberger. Scalability of Linux event-dispatch mechanisms. In Proceedings of the 2001 USENIX Annual Technical Conference, Boston, 2001.

[7] HP Labs. The userver home page. Available at research/linux/userver.

[8] Dan Kegel. The C10K problem. Available at http://.

[9] Jonathon Lemon. Kqueue: a generic and scalable event notification facility. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, 2001.

[10] Davide Libenzi. Improving (network) I/O performance. Available at linux-patches/nio-improve.html.

[11] D. Mosberger and T. Jin. httperf: A tool for measuring web server performance. In The First Workshop on Internet Server Performance, pages 59-67, Madison, WI, June 1998.

[12] Shailabh Nagar, Paul Larson, Hanna Linder, and David Stevens. epoll scalability web page. Available at epoll/index.html.
[13] M. Ostrowski. A mechanism for scalable event notification and delivery in Linux. Master's thesis, Department of Computer Science, University of Waterloo, November 2000.

[14] Vivek S. Pai, Peter Druschel, and Willy Zwaenepoel. Flash: An efficient and portable web server. In Proceedings of the USENIX 1999 Annual Technical Conference, Monterey, CA, June 1999. article/pai99flash.html.

[15] N. Provos and C. Lever. Scalable network I/O in Linux. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, June 2000.

[16] Luigi Rizzo. Dummynet: a simple approach to the evaluation of network protocols. ACM Computer Communication Review, 27(1):31-41, 1997. edu/rizzo97dummynet.html.

[17] Standard Performance Evaluation Corporation. SPECweb99 Benchmark. Available at specbench.org/osg/web99.

[18] David Weekly. /dev/epoll: a high-speed Linux kernel patch.
Proceedings of the Linux Symposium, Volume One, July 21st-24th, 2004, Ottawa, Ontario, Canada.