Experiences parallelizing a Web Server with OpenMP

Jairo Balart, Alejandro Duran, Marc Gonzàlez, Xavier Martorell, Eduard Ayguadé, Jesús Labarta
CEPBA-IBM Research Institute
Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya
Jordi Girona 1-3, Barcelona, Spain.
{jbalart,aduran,marc,xavim,eduard,jesus}@ac.upc.edu

Abstract. Multi-threaded web servers are typically parallelized by hand using the pthreads library. OpenMP has rarely been used to parallelize this kind of application, although we foresee that it can be a great tool for network server developers. In this paper we compare how easy it is to parallelize the Boa web server using OpenMP, compared to a pthreads parallelization, and the performance achieved. We present the results of parallelizations based on OpenMP 2.0, the dynamic sections model and pthreads.

1 Introduction & Motivation

OpenMP [1] has successfully been used to parallelize a great number of applications in the scientific domain. Extensive work has been done to fulfill the needs of scientific applications in shared memory environments. The parallelism that appears in numerical applications has significantly influenced the definition of the OpenMP API. Most work distribution schemes are specifically designed to support the main source of parallelism of scientific applications: parallel loops. For synchronization, programmers can use barrier synchronizations, mutual exclusion and atomic constructs. But scientific applications are not the only niche where parallelism can be used to increase the performance of applications. In fact, with new generations of architectures containing multiple cores, parallelism will be exploited in applications with characteristics and needs dramatically different from those of the scientific world. We believe the OpenMP community should start studying these characteristics.
This study will determine whether the current OpenMP API suits these new applications or whether it needs changes to support them efficiently. We have studied the feasibility of using OpenMP to parallelize a web server, to start exploring the characteristics that will be found in these new applications. Web servers are inherently parallel applications, as the different requests are unrelated and free of dependences between them. Thus, the requests can

be handled in parallel. Web servers have been traditionally parallelized with threading techniques (mainly pthreads). Our objective is to specify useful extensions if the OpenMP API does not easily support the parallelism found in web servers. We have selected the Boa [2] web server as the platform to be parallelized. Different parallel strategies have been developed. An OpenMP version allowed us to evaluate the programming effort and the resulting performance using the current OpenMP standard. Also, we developed a version that uses the proposed dynamic sections [3] constructs to check their usefulness. Finally, a manual pthread-based version was developed to allow a comprehensive comparison. All three versions were evaluated against the original version, which is single-threaded. The structure of the paper is as follows: section 2 describes the contributions of this paper to the state of the art. Section 3 overviews the structure of the Boa web server. In section 4 we describe our parallelized versions of Boa. Section 5 describes the experiments performed and the results obtained. Section 6 discusses our experiences with the different parallel versions. And finally, in section 7 we present the conclusions of this work and describe some lines for future study.

2 Related Work

Several authors have compared OpenMP versus pthreads. These comparisons have involved scientific applications or basic language constructs. Kuhn et al. compared the primitives provided by both models, concluding that OpenMP was easier to use, but they also pointed out some problems with irregular applications [4]. Lee and Downar compared both languages for a nuclear reactor transient code [5]. They obtained similar performance with both OpenMP and pthreads but OpenMP was easier to use. Breshears and Luong compared both models in the context of a Coastal Ocean Circulation Model [6]. Their conclusion was that OpenMP was easier to use while yielding the same performance as pthreads.
Dedu et al. compared both models with some algorithms from the artificial intelligence field [7]. They found that, although for regular applications OpenMP was easier to use and obtained the same performance as pthreads, for irregular applications OpenMP was difficult to use. Some other works have tried to extend OpenMP to cope with irregular applications. Asenjo et al. explored some techniques to deal with pointers and traversal of structures [8]. Shah et al. introduced the workqueueing model [9]. This proposal extends the OpenMP programming model with an alternative work distribution scheme based on the definition of queues of work from which the executing threads extract work. The extension targets algorithms traversing memory and linked data structures. The proposal has been successful but introduces dramatic changes in the OpenMP execution model. We previously proposed to use dynamic sections [3] to minimize the changes of the

workqueueing model, on which dynamic sections are based. We further extend the semantics of the dynamic sections in this work. Different works have evaluated the usefulness of threaded web servers. For example, Roper et al. compared different web servers, concluding that multi-threaded servers can outperform traditional event-driven servers [10], which are not threaded. Jeong et al. showed in their work [11] that the use of multiple CPUs with dynamic content increases the throughput obtained and reduces the response time. Using multiple CPUs with static content does not increase throughput, but response time still decreases. The benefits of using multiple CPUs in SSL enabled web servers were shown by Guitart et al. [12]. Other successful multi-threaded servers include Apache [13], SEDA [14] and Flash [15]. All these works used system thread packages, mainly pthreads. This work, instead, explores the suitability of OpenMP as a language for the development of a parallel web server.

3 The Boa web server

The Boa web server has a single-threaded, event-driven HTTP server architecture. This kind of architecture, unlike traditional web servers, does not fork for each incoming connection. Instead, Boa comes with an integrated task scheduler that handles multiple requests concurrently but not in parallel. Boa multiplexes all ongoing requests, trying to maximize the throughput and minimize the response time. The scheduler uses two request queues: the ready queue keeps those requests available for further processing, and the blocked queue keeps those requests waiting for a data dependence to be satisfied. Iteratively, the server traverses the ready queue and further processes each request. The server uses a round robin technique to prevent large requests from starving other ready requests. Boa logically divides each request into smaller chunks of work, and each time a request is processed a single chunk is consumed.
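The round robin chunking described above can be sketched in C. In this sketch, process_step and the three-way result mirror the pseudo code of Figure 1, but the queue representation and the request fields are hypothetical simplifications, not Boa's actual structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Result of consuming one chunk of a request (as in Fig. 1). */
enum step_result { CONTINUE, BLOCK, FINISHED };

struct request {
    int chunks_left;      /* how many chunks of work remain */
    int waiting_on_data;  /* set while a dependence is unsatisfied */
    struct request *next;
};

/* Consume one chunk and classify the request afterwards. */
static enum step_result process_step(struct request *r) {
    r->chunks_left--;
    if (r->chunks_left == 0) return FINISHED;
    if (r->waiting_on_data) return BLOCK;
    return CONTINUE;
}

/* One pass over the ready queue: each request consumes a single
 * chunk (round robin), then stays ready, blocks, or is freed.
 * Note: this toy version does not preserve the order of the
 * surviving requests. */
static void process_ready_queue(struct request **ready,
                                struct request **blocked) {
    struct request *r = *ready, *keep = NULL;
    *ready = NULL;
    while (r) {
        struct request *next = r->next;
        switch (process_step(r)) {
        case BLOCK:    r->next = *blocked; *blocked = r; break;
        case FINISHED: free(r); break;
        case CONTINUE: r->next = keep; keep = r; break;
        }
        r = next;
    }
    *ready = keep;
}
```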
Figure 1 shows simplified code that processes the requests from the ready queue. After a request is processed:

1. it is kept in the ready queue because it should be further processed (i.e. it has more chunks and all data are available);
2. it is moved to the blocked queue because the server detected an unsatisfied dependence (e.g. it needs to read data from a socket); or
3. it is freed because the last chunk of the request was consumed.

The server traverses the blocked queue and, for each request, checks whether the request's dependences are satisfied using the information it collects with the select system call. When a request has no further dependences it is moved back to the ready queue. Figure 2 shows the structure of the main loop of the server. This loop is infinite, and in each iteration the server:

1. processes any pending signal;

    for each request in the ready queue {
        update_time;
        result = process_step(request);
        accept new requests (if any);
        if (result == BLOCK) block(request);
        else if (result == FINISHED) free(request);
        else keep it in the queue
    }

Fig. 1. Request processing loop pseudo code

2. traverses the blocked queue to check the dependences of blocked requests;
3. establishes pending new connections using the accept system call;
4. traverses the ready queue to process more chunks of the unblocked requests;
5. calls select to obtain information about new connections and the status of incoming and outgoing data.

    while (1) {
        process signals (if any)
        move requests from blocked to ready using select result
        accept new connections (if any)
        process requests in the ready queue
        select system call
    }

Fig. 2. Main loop pseudo code

Boa tries to reduce the number of issued system calls by mmaping local files into the server memory. A cache of open files (i.e. active maps), which is checked each time a new file is requested, avoids mmaping the same file twice. The server performs all input/output with non-blocking system calls to ensure that no single request blocks the processing of others that are ready.

4 Parallelizing Boa

The main source of parallelism in the Boa web server is the possibility of overlapping the computations related to different requests. As explained in section 3, the requests are placed in the ready and blocked queues and the server iteratively traverses these queues. Parallel processing is possible by processing each element of the queues in parallel. The server can also do different tasks in parallel (e.g. accepting new requests and processing requests). All the versions described exploit these sources of parallelism. Different points in the server require serialization. First, modification of global variables (e.g. the number of active connections). Second, manipulations

of the ready and blocked queues. Third, acceptance of new connections. Fourth, access to the cache of open files. And last, write access to the server log files, to avoid mixing the output of different threads. The original version uses many static variables inside functions. These variables were converted to extra parameters of the functions they were in. Another possible option was to use per-thread variables (e.g. threadprivate in OpenMP). We have developed three parallel versions: a pthreads version, a pure OpenMP version and a version using dynamic sections, which are non-standard.

4.1 Pthreads parallel version

The pthread parallel version exploits the possibility of processing different requests in parallel, as they are unrelated. The parallelization uses a producer-consumer approach. One thread executes all the tasks in the main Boa loop, described in Figure 2, except request processing. This thread is the producer of new ready requests. The remaining threads consume the ready requests and process them as explained in section 3. Figure 3 shows the code the consumer threads execute. The pthread version uses the same round robin mechanism as the serial version: each time a thread extracts a request, a single chunk is consumed and the request may be queued again to the ready queue. This version uses different mutex locks to protect accesses to the ready queue, accesses to the blocked queue, accesses to global variables, and writing to the server log files. The cache of open files is also protected by a mutex lock per entry, to maximize concurrency, and a global mutex lock for global cache variables.

    for (;;) {
        req = NULL;
        while (not pending requests);
        pthread_mutex_lock(&ready_lock);
        if (pending requests) {
            req = dequeue(request_ready);
        }
        pthread_mutex_unlock(&ready_lock);
        if (req)
            process_request(req);
    }

Fig. 3. Code for the consumer threads in the pthread Boa version.
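The second line of Figure 3 spins until work is available. A condition variable is the usual pthreads alternative to that busy wait; the following is a minimal self-contained sketch of the producer/consumer handshake, where a fixed-size integer queue stands in for Boa's request queue (this is an illustrative alternative, not the code the paper's version uses):

```c
#include <assert.h>
#include <pthread.h>

/* A tiny bounded ready "queue" guarded by a mutex; consumers
 * sleep on a condition variable instead of spinning. The queue
 * is a toy: no wrap-around, at most MAX_REQS items ever. */
#define MAX_REQS 64
static int queue[MAX_REQS];
static int head = 0, tail = 0, done = 0;
static int consumed = 0;  /* protected by ready_lock */
static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;

/* Producer side: publish a request and wake one consumer. */
static void enqueue_request(int req) {
    pthread_mutex_lock(&ready_lock);
    queue[tail++] = req;
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&ready_lock);
}

/* Tell consumers no more requests will arrive. */
static void finish(void) {
    pthread_mutex_lock(&ready_lock);
    done = 1;
    pthread_cond_broadcast(&ready_cond);
    pthread_mutex_unlock(&ready_lock);
}

static void *consumer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&ready_lock);
    for (;;) {
        /* Sleep (releasing the lock) until work or shutdown. */
        while (head == tail && !done)
            pthread_cond_wait(&ready_cond, &ready_lock);
        if (head == tail && done)
            break;
        int req = queue[head++];
        (void)req;
        consumed++;  /* stands in for process_request(req) */
    }
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}
```

In the real server, process_request would be called after releasing the lock, as in Figure 3; here the consumption is reduced to a counter to keep the sketch short.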
4.2 OpenMP standard parallelization

For our OpenMP parallelization we targeted the request processing loop of Figure 1. We wanted to distribute all the requests in the ready queue among the available threads by using a workshare. The number of requests in the queue can vary during its traversal (e.g. if a request is freed). Because in OpenMP all

threads must see the same iterations, we split the loop into two new loops: one does not remove requests from the queue while the other does. In the first new loop, the server processes each request in the ready queue. The result of each processing is stored in a new field of the request structure. This loop was parallelized using a parallel construct and we used a single workshare to distribute the different iterations. Using the results from the first loop, the server modifies the queues in the second loop, which must be done single-threaded. Figure 4 shows the OpenMP parallelization of the new loops. When new requests are accepted they are added to the beginning of the queue. Before the server accepts any request, all threads grab the head of the queue. This guarantees they will traverse the same elements.

    #pragma omp parallel
    {
        get head of ready queue
        #pragma omp barrier
        for each request in the ready queue
        {
            #pragma omp master
            update time;
            #pragma omp single nowait
            request.result = process_step(request);
            #pragma omp master
            accept new requests (if any);
        }
    }
    for each request in the ready queue
    {
        if (request.result == BLOCK) block(request);
        else if (request.result == FINISHED) free(request);
        else keep it in the queue
    }

Fig. 4. OpenMP request processing loop pseudo code

Access to the cache of open files was protected with a critical section for the global cache variables and an OpenMP lock per cache entry. We used another critical construct to guarantee correct access to the log files. Several critical sections protect access to Boa global variables.

4.3 Dynamic sections parallelization

The parallelizations presented in the previous sections were mainly possible because the serial version already embedded complex code that dealt with request blocking and scheduling (i.e. the ready queue). Our question now is: could OpenMP make this work easier for the programmer?
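The two-loop scheme of section 4.2 can be written as compilable C, sketched below. The request structure and process_step are hypothetical stand-ins for Boa's; note that if the pragmas are ignored (no OpenMP support enabled), the code simply runs the first loop serially with the same result:

```c
#include <assert.h>
#include <stddef.h>

enum step_result { CONTINUE, BLOCK, FINISHED };

struct request {
    enum step_result result;   /* new field, filled in by loop 1 */
    int chunks_left;
    struct request *next;
};

/* Consume one chunk; a stand-in for Boa's process_step. */
static enum step_result process_step(struct request *r) {
    return --r->chunks_left ? CONTINUE : FINISHED;
}

/* Loop 1: every thread traverses the same snapshot of the queue;
 * "single nowait" hands each request to exactly one thread.
 * Loop 2 (serial): act on the stored results. */
static void process_all(struct request *head) {
    #pragma omp parallel
    {
        for (struct request *r = head; r; r = r->next) {
            #pragma omp single nowait
            r->result = process_step(r);
        }
    }
    for (struct request *r = head; r; r = r->next) {
        /* here: block(r), free(r) or keep it, per r->result */
        (void)r;
    }
}
```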
The available parallelism can be seen as a collection of tasks. We have used the proposed dynamic sections extension to express this parallelism easily. Under

this model a single thread is in charge of performing the serial work (accepting requests, extracting them from the blocked queue, ...) while the remaining threads execute the parallel tasks that are created (i.e. the dynamic sections). In this version, we have removed part of the integrated scheduler: the ready queue has been removed. Instead of queueing requests in the ready queue, the threads now create new dynamic sections. A new dynamic section is created:

1. when a new request is accepted, for the first chunk of the request;
2. when a request has completed a chunk that was not its last, for the next chunk. This dynamic section is created inside another, so we have extended the original model to allow nesting of SECTION constructs; and
3. when a request is unblocked, because its dependences are fulfilled, for the next chunk.

    #pragma omp parallel
    #pragma omp sections dynamic
    while (1)
    {
        process signals (if any)
        foreach request from the blocked queue {
            if (request dependences are met) {
                extract from the blocked queue
                #pragma omp section captureprivate(request)
                serve_request(request)
            }
        }
        if (new connection) {
            accept it
            #pragma omp section captureprivate(new connection)
            serve_request(new connection)
        }
        select system call
    }

Fig. 5. Main loop pseudo code with dynamic sections

Figure 5 shows our parallelization of the code of the main loop using the dynamic sections. As in the previous version, several critical constructions protect accesses to the open files cache, access to the global variables and access to the server log files.

5 Evaluation

5.1 Environment

For our experiments we used a 4-way Intel Xeon at 1.4 GHz with 2 GB of RAM to run the web server and a 2-way Intel Xeon at 2.4 GHz with 2 GB of RAM to

run the benchmark client. All the machines were running a 2.6 Linux kernel. The machines were connected by a switched Gigabit network.

5.2 Workload generator

We used Httperf [16] to generate the different workloads for the experiments. This tool allows the creation of a continuous flow of HTTP requests to the server machine. The tool accepts as one of its parameters the number of clients per second. For each client, it opens a session with the server through a persistent HTTP connection. Then a series of requests is issued by the client, some of them pipelined, some spaced by a think time. Another parameter of Httperf is the sessions database from which clients get the requests they ask for and the think times to wait. We have used a database extracted from the Surge [17] workload generator. The scenario produced by Surge is a static content workload characterized by short session lengths and low computational cost for each request serviced. The Surge distribution is based on a model developed from the observation of real web server logs.

5.3 Experiments

We evaluated all the different versions of the Boa web server using the Surge workload with different loads of clients. These load configurations ranged from a low load (10 clients per second) to a heavy load (800 clients per second). In the following plots, we labeled the different Boa versions as follows:

- original boa refers to the unmodified single-threaded Boa server.
- boa-pthreads refers to the parallel version that uses pthreads.
- boa-omp refers to the parallel version that uses standard OpenMP constructions.
- boa-dsections refers to the parallel version that uses dynamic sections.

All parallel versions were run with 2 and 4 threads. Figure 6 shows, for each load of clients, the throughput obtained by each Boa version. All versions, except the boa-omp version, obtained a similar throughput up to a workload of 700 clients per second.
The boa-omp version was outperformed because the server did not run in parallel for as much of the time as the other versions do. Doubling the number of threads in the parallel versions did not result in a noticeable increase in throughput because Surge is not a CPU intensive workload, as it works with static content. With 700 clients per second the limit of the Gigabit network was reached and the throughput of all versions deteriorated as they became saturated. Saturation happens earlier with more threads because contention on shared resources has a greater impact. The Boa server uses a mechanism that minimizes the effect of saturation by limiting the number of active connections. We disabled this mechanism on purpose so we could find the point at which each parallel version saturates.

Fig. 6. Throughput (replies per second) for each load of clients; legend: original boa, boa-dsections, boa-omp and boa-pthreads, each with 2 and 4 threads.

Figure 7 shows the average response time for each load of clients and each Boa version. All parallel versions achieved a lower response time than the original Boa version. With a load of 800 clients per second, response time was reduced by as much as a factor of three. Doubling the number of threads in boa-dsections and boa-pthreads was useful in reducing the response time. With four threads, boa-dsections and boa-pthreads behaved so closely that their lines overlap. With two threads they behaved similarly, except with a load of 700 clients per second, where the boa-pthreads response time was lower. The average response time for the boa-omp version was very low, near zero. This result is misleading because the server was rejecting more than 75% of the requests.

6 Comparison

In this section we compare the programming effort required by the three parallel versions: pthreads, OpenMP and dynamic sections. One of the most time consuming tasks in parallelizing all the versions was removing the static variables in local functions. Given the large number of static variables that may appear in serial C codes, compilers could provide an option that transformed any variable with static storage into a variable with thread private storage. This option would reduce the time spent parallelizing large C codes. Another effort, common to all versions, was protecting shared data with critical sections and locks. This work was quicker to do with OpenMP than with pthreads, as you only need to add the appropriate directive instead of having to declare a mutex variable and use lock and unlock calls. Nevertheless, when you need a critical section for each element of a structure (e.g. the open

files cache) to improve performance, it is as difficult in OpenMP as it is in pthreads, because you need to use omp_locks.

Fig. 7. Response time (ms) for each load of clients; legend: original boa, boa-dsections, boa-omp and boa-pthreads, each with 2 and 4 threads.

We think a possible way to solve this problem would be to allow dynamically named critical sections (e.g. cache_lock[i]). With this kind of critical section, the same code protecting a single entry of a complex structure would work for all the entries, while each one would still have its own lock. The pthread version required several modifications to the original source code. Code for the consumer threads was developed. And, although it was not very difficult, because there was only one type of task to consume (i.e. requests), it required some expertise to handle access to the queue correctly and efficiently. Some other changes were required to the code of the producer thread, including: creating the consumers, initializing the locks, and removing the request processing loop. Also, mutex locks were added to protect access to the ready and blocked queues. The overall effort was moderate. The OpenMP version did not require as many changes to the source code as the pthread version. But it needed a great deal of attention to maintain the correctness of the single workshare (i.e. that all threads executed all the iterations). This restriction also caused the reduction in performance. In the other versions, the threads could execute a task as soon as it was ready, or even run request processing and the acceptance of new connections in parallel. In the OpenMP version all threads must wait until all the requests that were available when the traversal started are processed, even if there are new requests to process. This suggests that current workshares are not well suited for handling irregular parallelism.
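The per-entry locking discussed above, which the OpenMP version implements with omp_locks and which a hypothetical critical(cache_lock[i]) directive would express directly, amounts to an array of locks indexed by cache entry. The following sketch uses pthreads mutexes so it compiles without OpenMP support; the cache fields are illustrative, not Boa's actual structures:

```c
#include <assert.h>
#include <pthread.h>

#define CACHE_ENTRIES 128

/* Open-files cache: one lock per entry (fine grain) plus one
 * global lock for cache-wide bookkeeping, as described above. */
struct cache_entry {
    void *map;              /* would hold the mmap'ed file contents */
    int refcount;
    pthread_mutex_t lock;   /* the per-entry "named critical section" */
};

static struct cache_entry cache[CACHE_ENTRIES];
static pthread_mutex_t cache_global_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache_used = 0;  /* protected by cache_global_lock */

static void cache_init(void) {
    for (int i = 0; i < CACHE_ENTRIES; i++)
        pthread_mutex_init(&cache[i].lock, NULL);
}

/* Cache-wide bookkeeping goes under the single global lock. */
static int cache_reserve_slot(void) {
    pthread_mutex_lock(&cache_global_lock);
    int slot = cache_used++;
    pthread_mutex_unlock(&cache_global_lock);
    return slot;
}

/* Take a reference on entry i; only that entry's lock is held,
 * so threads touching different entries never contend. */
static void entry_ref(int i) {
    pthread_mutex_lock(&cache[i].lock);
    cache[i].refcount++;
    pthread_mutex_unlock(&cache[i].lock);
}
```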

But both the pthread version and the OpenMP version were easier to develop because the original version had integrated complex code that enabled request concurrency. Otherwise, the programming effort would have been greater. On the other hand, the dynamic sections version was simpler, as the programmer did not need to code the management of the ready tasks. Even starting from a simpler serial version, the programming effort with dynamic sections would have been small. Dynamic sections also make it easy to mix different kinds of parallel tasks. While the pthreads code would become more and more complex if it had to deal with different kinds of parallel tasks, the complexity of the dynamic sections would remain constant.

7 Conclusions and Future work

In this paper, we have explored the use of OpenMP to parallelize a web server. We have shown how, by adding a few directives, the request processing loop of the web server can be parallelized. But this simple version did not perform efficiently. We used the proposed dynamic sections to implement a simpler parallel web server (i.e. without application level task management). Evaluation showed that this version had performance, in throughput and average response time, as good as that obtained by the server developed with pthreads. But in the pthread version the programmer needed to develop a specific task management for the application, whereas the dynamic sections version simplified the programming. In the future, we will apply OpenMP to other web scenarios where studies have pointed out that there can be improvements in throughput by using multiple processors: SSL enabled applications [12] and dynamic content applications [11].

Acknowledgements

The authors would like to thank Vicenç Beltran, David Carrera and the eDragon team [18] for their valuable help and for allowing us to use their resources. This research has been supported by the Ministry of Science and Technology of Spain under contract TIN C.

References
1. OpenMP Organization. OpenMP Fortran application interface.
2. Larry Doolittle and Jon Nelson. Boa webserver site.
3. Jairo Balart, Alejandro Duran, Marc Gonzàlez, Xavier Martorell, Eduard Ayguadé, and Jesús Labarta. Nanos Mercurium: a research compiler for OpenMP. In Proceedings of the European Workshop on OpenMP 2004, October 2004.

4. Bob Kuhn, Paul Petersen, and Eamonn O'Toole. OpenMP versus threading in C/C++. Concurrency: Practice and Experience, 12.
5. D. J. Lee and T. J. Downar. The application of POSIX threads and OpenMP to the U.S. NRC neutron kinetics code PARCS. In R. Eigenmann and M. J. Voss, editors, Proceedings of the International Workshop on OpenMP Applications and Tools 2001, volume 2104 of Lecture Notes in Computer Science, pages 69-83, July 2001.
6. C. Breshears and P. Luong. Comparison of OpenMP and Pthreads within a coastal ocean circulation model code. In Workshop on OpenMP Applications and Tools, July.
7. Eugen Dedu, Stephane Vialle, and Claude Timsit. Comparison of OpenMP and classical multi-threading parallelization for regular and irregular algorithms. In Software Engineering Applied to Networking and Parallel/Distributed Computing (SNPD), pages 53-60.
8. R. Asenjo, F. Corbera, E. Gutiérrez, M. A. Navarro, O. Plata, and E. L. Zapata. Optimization techniques for irregular and pointer-based programs. In Proceedings of the 12th EuroMicro Conference on Parallel, Distributed and Network-Based Processing (PDP'04), pages 11-13, February 2004.
9. S. Shah, G. Haab, P. Petersen, and J. Throop. Flexible control structures for parallelism in OpenMP. In 1st European Workshop on OpenMP, September 1999.
10. Takashi Ishihara, Aaron W. Keen, Justin T. Maris, Eric Wohlstadter, and Ronald A. Olsson. CoW: A cooperative multithreading web server. In The 2002 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'02), June 2002.
11. J. Jeong, S. Park, and J. Nang. Performance analysis of a multithreaded web server on multiprocessor. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2000), June 2000.
12. J. Guitart, V. Beltran, D. Carrera, J. Torres, and E. Ayguadé. Characterizing secure dynamic web applications scalability.
In 19th International Parallel and Distributed Processing Symposium (IPDPS'05), April 2005.
13. Apache web server.
14. M. Welsh, D. Culler, and E. Brewer. SEDA: An architecture for well-conditioned, scalable internet services. In Symposium on Operating Systems Principles (SOSP'01), October 2001.
15. V. S. Pai, P. Druschel, and W. Zwaenepoel. Flash: An efficient and portable web server, 1999.
16. D. Mosberger and T. Jin. httperf: a tool for measuring web server performance. In First Workshop on Internet Server Performance, pages 59-67, June 1998.
17. P. Barford and M. Crovella. Generating representative web workloads for network and server performance evaluation. In Measurement and Modeling of Computer Systems, 1998.
18. eDragon Research Group site.


More information

Understanding Tuning Complexity in Multithreaded and Hybrid Web Servers

Understanding Tuning Complexity in Multithreaded and Hybrid Web Servers Understanding Tuning Complexity in Multithreaded and Hybrid Web Servers Vicenç Beltran, Jordi Torres and Eduard Ayguadé {vbeltran, jordi.torres, eduard.ayguade}@bsc.es Barcelona Supercomputing Center (BSC)

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem

More information

Complete Instrumentation Requirements for Performance Analysis of Web Based Technologies

Complete Instrumentation Requirements for Performance Analysis of Web Based Technologies Complete Instrumentation Requirements for Performance Analysis of Web Based Technologies David Carrera Jordi Guitart Jordi Torres Eduard Ayguadé Jesús Labarta European Center for Parallelism of Barcelona

More information

Scheduling Task Parallelism" on Multi-Socket Multicore Systems"

Scheduling Task Parallelism on Multi-Socket Multicore Systems Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

Comparing the Performance of Web Server Architectures

Comparing the Performance of Web Server Architectures Comparing the Performance of Web Server Architectures David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla David R. Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario,

More information

SYSTEM ecos Embedded Configurable Operating System

SYSTEM ecos Embedded Configurable Operating System BELONGS TO THE CYGNUS SOLUTIONS founded about 1989 initiative connected with an idea of free software ( commercial support for the free software ). Recently merged with RedHat. CYGNUS was also the original

More information

Ready Time Observations

Ready Time Observations VMWARE PERFORMANCE STUDY VMware ESX Server 3 Ready Time Observations VMware ESX Server is a thin software layer designed to multiplex hardware resources efficiently among virtual machines running unmodified

More information

Delivering Quality in Software Performance and Scalability Testing

Delivering Quality in Software Performance and Scalability Testing Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,

More information

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications

EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani

More information

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 26 Real - Time POSIX. (Contd.) Ok Good morning, so let us get

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization

<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization An Experimental Model to Analyze OpenMP Applications for System Utilization Mark Woodyard Principal Software Engineer 1 The following is an overview of a research project. It is intended

More information

One Server Per City: C Using TCP for Very Large SIP Servers. Kumiko Ono Henning Schulzrinne {kumiko, hgs}@cs.columbia.edu

One Server Per City: C Using TCP for Very Large SIP Servers. Kumiko Ono Henning Schulzrinne {kumiko, hgs}@cs.columbia.edu One Server Per City: C Using TCP for Very Large SIP Servers Kumiko Ono Henning Schulzrinne {kumiko, hgs}@cs.columbia.edu Goal Answer the following question: How does using TCP affect the scalability and

More information

Performance Evaluation and Optimization of A Custom Native Linux Threads Library

Performance Evaluation and Optimization of A Custom Native Linux Threads Library Center for Embedded Computer Systems University of California, Irvine Performance Evaluation and Optimization of A Custom Native Linux Threads Library Guantao Liu and Rainer Dömer Technical Report CECS-12-11

More information

Comparison of Web Server Architectures: a Measurement Study

Comparison of Web Server Architectures: a Measurement Study Comparison of Web Server Architectures: a Measurement Study Enrico Gregori, IIT-CNR, enrico.gregori@iit.cnr.it Joint work with Marina Buzzi, Marco Conti and Davide Pagnin Workshop Qualità del Servizio

More information

TheImpactofWeightsonthe Performance of Server Load Balancing(SLB) Systems

TheImpactofWeightsonthe Performance of Server Load Balancing(SLB) Systems TheImpactofWeightsonthe Performance of Server Load Balancing(SLB) Systems Jörg Jung University of Potsdam Institute for Computer Science Operating Systems and Distributed Systems March 2013 1 Outline 1

More information

Load Balancing MPI Algorithm for High Throughput Applications

Load Balancing MPI Algorithm for High Throughput Applications Load Balancing MPI Algorithm for High Throughput Applications Igor Grudenić, Stjepan Groš, Nikola Bogunović Faculty of Electrical Engineering and, University of Zagreb Unska 3, 10000 Zagreb, Croatia {igor.grudenic,

More information

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based

More information

Multi-core Programming System Overview

Multi-core Programming System Overview Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

An Instrumentation Tool for Threaded Java Application Servers

An Instrumentation Tool for Threaded Java Application Servers XIII JORNADAS DE PARALELISMO LLEIDA, SEPTIEMBRE 2002 1 An Instrumentation Tool for Threaded Java Application Servers David Carrera, Jordi Guitart, Jordi Torres, Eduard Ayguadé and Jesús Labarta Abstract

More information

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance M. Rangarajan, A. Bohra, K. Banerjee, E.V. Carrera, R. Bianchini, L. Iftode, W. Zwaenepoel. Presented

More information

Operating Systems 4 th Class

Operating Systems 4 th Class Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science

More information

Scalability evaluation of barrier algorithms for OpenMP

Scalability evaluation of barrier algorithms for OpenMP Scalability evaluation of barrier algorithms for OpenMP Ramachandra Nanjegowda, Oscar Hernandez, Barbara Chapman and Haoqiang H. Jin High Performance Computing and Tools Group (HPCTools) Computer Science

More information

MAGENTO HOSTING Progressive Server Performance Improvements

MAGENTO HOSTING Progressive Server Performance Improvements MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents

More information

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1 Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

Testing Database Performance with HelperCore on Multi-Core Processors

Testing Database Performance with HelperCore on Multi-Core Processors Project Report on Testing Database Performance with HelperCore on Multi-Core Processors Submitted by Mayuresh P. Kunjir M.E. (CSA) Mahesh R. Bale M.E. (CSA) Under Guidance of Dr. T. Matthew Jacob Problem

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015 Operating Systems 05. Threads Paul Krzyzanowski Rutgers University Spring 2015 February 9, 2015 2014-2015 Paul Krzyzanowski 1 Thread of execution Single sequence of instructions Pointed to by the program

More information

Performance Evaluation Approach for Multi-Tier Cloud Applications

Performance Evaluation Approach for Multi-Tier Cloud Applications Journal of Software Engineering and Applications, 2013, 6, 74-83 http://dx.doi.org/10.4236/jsea.2013.62012 Published Online February 2013 (http://www.scirp.org/journal/jsea) Performance Evaluation Approach

More information

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp MPI and Hybrid Programming Models William Gropp www.cs.illinois.edu/~wgropp 2 What is a Hybrid Model? Combination of several parallel programming models in the same program May be mixed in the same source

More information

Load Balancing on an Interactive Multiplayer Game Server

Load Balancing on an Interactive Multiplayer Game Server Load Balancing on an Interactive Multiplayer Game Server Daniel Cordeiro 1,, Alfredo Goldman 1, and Dilma da Silva 2 1 Department of Computer Science, University of São Paulo danielc,gold@ime.usp.br 2

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

IN the last few decades, OpenMP has emerged as the de

IN the last few decades, OpenMP has emerged as the de 404 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. 3, MARCH 2009 The Design of OpenMP Tasks Eduard Ayguadé, Nawal Copty, Member, IEEE Computer Society, Alejandro Duran, Jay Hoeflinger,

More information

Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster Gabriele Jost and Haoqiang Jin NAS Division, NASA Ames Research Center, Moffett Field, CA 94035-1000 {gjost,hjin}@nas.nasa.gov

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details Thomas Fahrig Senior Developer Hypervisor Team Hypervisor Architecture Terminology Goals Basics Details Scheduling Interval External Interrupt Handling Reserves, Weights and Caps Context Switch Waiting

More information

Dynamic Load Balancing of MPI+OpenMP applications

Dynamic Load Balancing of MPI+OpenMP applications Dynamic Load Balancing of MPI+OpenMP applications Julita Corbalán, Alejandro Duran, Jesús Labarta CEPBA-IBM Research Institute Departament d Arquitectura de Computadors Universitat Politècnica de Catalunya

More information

Efficient Data Management Support for Virtualized Service Providers

Efficient Data Management Support for Virtualized Service Providers Efficient Data Management Support for Virtualized Service Providers Íñigo Goiri, Ferran Julià and Jordi Guitart Barcelona Supercomputing Center - Technical University of Catalonia Jordi Girona 31, 834

More information

A POSIX-Ada Interface for Application-Defined Scheduling

A POSIX-Ada Interface for Application-Defined Scheduling A POSIX-Ada Interface for Application-Defined Scheduling By: Mario Aldea Rivas Michael González Harbour (aldeam@unican.es) (mgh@unican.es) Ada-Europe 2002 Vienna, Austria, June 17-21, 2002 4 GRUPO DE COMPUTADORES

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

BSC vision on Big Data and extreme scale computing

BSC vision on Big Data and extreme scale computing BSC vision on Big Data and extreme scale computing Jesus Labarta, Eduard Ayguade,, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra,

More information

A Statistically Customisable Web Benchmarking Tool

A Statistically Customisable Web Benchmarking Tool Electronic Notes in Theoretical Computer Science 232 (29) 89 99 www.elsevier.com/locate/entcs A Statistically Customisable Web Benchmarking Tool Katja Gilly a,, Carlos Quesada-Granja a,2, Salvador Alcaraz

More information

Experimental Evaluation of Horizontal and Vertical Scalability of Cluster-Based Application Servers for Transactional Workloads

Experimental Evaluation of Horizontal and Vertical Scalability of Cluster-Based Application Servers for Transactional Workloads 8th WSEAS International Conference on APPLIED INFORMATICS AND MUNICATIONS (AIC 8) Rhodes, Greece, August 2-22, 28 Experimental Evaluation of Horizontal and Vertical Scalability of Cluster-Based Application

More information

Intel DPDK Boosts Server Appliance Performance White Paper

Intel DPDK Boosts Server Appliance Performance White Paper Intel DPDK Boosts Server Appliance Performance Intel DPDK Boosts Server Appliance Performance Introduction As network speeds increase to 40G and above, both in the enterprise and data center, the bottlenecks

More information

Real Time Programming: Concepts

Real Time Programming: Concepts Real Time Programming: Concepts Radek Pelánek Plan at first we will study basic concepts related to real time programming then we will have a look at specific programming languages and study how they realize

More information

First-class User Level Threads

First-class User Level Threads First-class User Level Threads based on paper: First-Class User Level Threads by Marsh, Scott, LeBlanc, and Markatos research paper, not merely an implementation report User-level Threads Threads managed

More information

Optimization of Cluster Web Server Scheduling from Site Access Statistics

Optimization of Cluster Web Server Scheduling from Site Access Statistics Optimization of Cluster Web Server Scheduling from Site Access Statistics Nartpong Ampornaramveth, Surasak Sanguanpong Faculty of Computer Engineering, Kasetsart University, Bangkhen Bangkok, Thailand

More information

A Pattern-Based Comparison of OpenACC & OpenMP for Accelerators

A Pattern-Based Comparison of OpenACC & OpenMP for Accelerators A Pattern-Based Comparison of OpenACC & OpenMP for Accelerators Sandra Wienke 1,2, Christian Terboven 1,2, James C. Beyer 3, Matthias S. Müller 1,2 1 IT Center, RWTH Aachen University 2 JARA-HPC, Aachen

More information

How to Parallelize Web Service Based Applications With OpenMP

How to Parallelize Web Service Based Applications With OpenMP Web Service Call Parallelization Using OpenMP Sébastien Salva 1,Clément Delamare 2,andCédric Bastoul 3 1 LIMOS, Université d Auvergne Clermont 1 Campus des Cézeaux F-63173 Aubière, France salva@iut.u-clermont1.fr

More information

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances:

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances: Scheduling Scheduling Scheduling levels Long-term scheduling. Selects which jobs shall be allowed to enter the system. Only used in batch systems. Medium-term scheduling. Performs swapin-swapout operations

More information

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007 Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer

More information

Performance Evaluation of Shared Hosting Security Methods

Performance Evaluation of Shared Hosting Security Methods Performance Evaluation of Shared Hosting Security Methods Seyed Ali Mirheidari, Sajjad Arshad, Saeidreza Khoshkdahan Computer Engineering Department, Sharif University of Technology, International Campus,

More information

Why Threads Are A Bad Idea (for most purposes)

Why Threads Are A Bad Idea (for most purposes) Why Threads Are A Bad Idea (for most purposes) John Ousterhout Sun Microsystems Laboratories john.ousterhout@eng.sun.com http://www.sunlabs.com/~ouster Introduction Threads: Grew up in OS world (processes).

More information

Web Server Software Architectures

Web Server Software Architectures Web Server Software Architectures Author: Daniel A. Menascé Presenter: Noshaba Bakht Web Site performance and scalability 1.workload characteristics. 2.security mechanisms. 3. Web cluster architectures.

More information

Multi-GPU Load Balancing for Simulation and Rendering

Multi-GPU Load Balancing for Simulation and Rendering Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks

More information

SIDN Server Measurements

SIDN Server Measurements SIDN Server Measurements Yuri Schaeffer 1, NLnet Labs NLnet Labs document 2010-003 July 19, 2010 1 Introduction For future capacity planning SIDN would like to have an insight on the required resources

More information

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a

More information

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what

More information

Full and Para Virtualization

Full and Para Virtualization Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels

More information

A Locality Approach to Task-scheduling in OpenMP for Tiled Multicore Architectures

A Locality Approach to Task-scheduling in OpenMP for Tiled Multicore Architectures A Locality Approach to Task-scheduling in OpenMP for Tiled Multicore Architectures No Author Given No Institute Given Abstract. Multicore and other parallel computer systems increasingly expose architectural

More information

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12 XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines A.Zydroń 18 April 2009 Page 1 of 12 1. Introduction...3 2. XTM Database...4 3. JVM and Tomcat considerations...5 4. XTM Engine...5

More information

A Locality Approach to Architecture-aware Task-scheduling in OpenMP

A Locality Approach to Architecture-aware Task-scheduling in OpenMP A Locality Approach to Architecture-aware Task-scheduling in OpenMP Ananya Muddukrishna ananya@kth.se Mats Brorsson matsbror@kth.se Vladimir Vlassov vladv@kth.se ABSTRACT Multicore and other parallel computer

More information

Multi-core and Linux* Kernel

Multi-core and Linux* Kernel Multi-core and Linux* Kernel Suresh Siddha Intel Open Source Technology Center Abstract Semiconductor technological advances in the recent years have led to the inclusion of multiple CPU execution cores

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

There are a number of factors that increase the risk of performance problems in complex computer and software systems, such as e-commerce systems.

There are a number of factors that increase the risk of performance problems in complex computer and software systems, such as e-commerce systems. ASSURING PERFORMANCE IN E-COMMERCE SYSTEMS Dr. John Murphy Abstract Performance Assurance is a methodology that, when applied during the design and development cycle, will greatly increase the chances

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information

Eloquence Training What s new in Eloquence B.08.00

Eloquence Training What s new in Eloquence B.08.00 Eloquence Training What s new in Eloquence B.08.00 2010 Marxmeier Software AG Rev:100727 Overview Released December 2008 Supported until November 2013 Supports 32-bit and 64-bit platforms HP-UX Itanium

More information

Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures Gabriele Jost *, Haoqiang Jin NAS Division, NASA Ames Research Center, Moffett Field, CA 94035-1000 USA {gjost,hjin}@nas.nasa.gov

More information

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Yunhong Jiang 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Performance Analysis of Web based Applications on Single and Multi Core Servers

Performance Analysis of Web based Applications on Single and Multi Core Servers Performance Analysis of Web based Applications on Single and Multi Core Servers Gitika Khare, Diptikant Pathy, Alpana Rajan, Alok Jain, Anil Rawat Raja Ramanna Centre for Advanced Technology Department

More information

Driving force. What future software needs. Potential research topics

Driving force. What future software needs. Potential research topics Improving Software Robustness and Efficiency Driving force Processor core clock speed reach practical limit ~4GHz (power issue) Percentage of sustainable # of active transistors decrease; Increase in #

More information

POSIX and Object Distributed Storage Systems

POSIX and Object Distributed Storage Systems 1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome

More information

Policy-based autonomous bidding for overload management in ecommerce websites

Policy-based autonomous bidding for overload management in ecommerce websites EXTENDED ABSTRACTS Policy-based autonomous bidding for overload management in ecommerce websites TONI MORENO 1,2, NICOLAS POGGI 3, JOSEP LLUIS BERRAL 3, RICARD GAVALDÀ 4, JORDI TORRES 2,3 1 Department

More information