Improving the performance of data servers on multicore architectures. Fabien Gaud

Size: px
Start display at page:

Download "Improving the performance of data servers on multicore architectures. Fabien Gaud"

Transcription

1 Improving the performance of data servers on multicore architectures Fabien Gaud Grenoble University Advisors: Jean-Bernard Stefani, Renaud Lachaize and Vivien Quéma Sardes (INRIA/LIG) December 2, / 50

2 Processor evolution Before 2006: One core Regular increase of clock frequency Since then: Almost no increase of clock frequency Increasing number of cores: Multicore architectures NUMA architectures Manycore architectures 2 / 50

3 Multicore is a hot topic Legacy applications do not efficiently leverage multicore hardware Research topics: Programming models/languages Operating systems abstractions/internals Runtime/libraries Applications Active research field: Corey (OSDI 08) Barrelfish (SOSP 09), Helios (SOSP 09) PK (OSDI 10) 3 / 50

4 This thesis Application domain: data servers, a.k.a. networked services Goal: Improve the performance of data servers on multicore architectures Contributions: Efficient multicore event-driven programming Scaling the Apache Web server on NUMA multicore systems 4 / 50

5 #1: Efficient multicore event-driven programming CFSE 2009 (best paper award) ICDCS / 50

6 Event-driven programming Application is structured as a set of handlers processing events An event can be: Triggered by an I/O operation Produced internally by the application Events are stored in a queue and processed by a single thread Handler 1 Control Loop Handler 2 Event Handler 3 Handler 4 6 / 50

7 Multicore event-driven programming Goal: concurrently execute multiple handlers Challenges: Concurrency management Balancing load on cores Solutions: N-Copy 1-Copy with synchronization 7 / 50

8 N-Copy Principle: running one instance of the application per core App1 App2 App3 App4 Core 1 Core 2 Core 3 Core 4 Control loop Event queue Event 8 / 50

9 N-Copy (2) Advantages: No concurrency management needed No application modification needed Drawbacks: Not applicable to all applications Multiple copies of data Requires external load balancing 9 / 50

10 1-copy with synchronization Principle: 1 instance on multiple cores Concurrency can be managed using: Locks STM Annotations Load balancing can be achieved with: Static placement Workgiving Workstealing Chosen approach is implemented in Libasync-SMP (Usenix 03) 10 / 50

11 Libasync-SMP Concurrency management Annotations (colors) set on events App1 Core 1 Core 2 Core 3 Core 4 Control loop Event queue Events with color 0 Events with color 1 Events with color 2 Events with color 3 11 / 50

12 Libasync-SMP Load balancing Load balancing is done through workstealing App1 Core 1 Core 2 Core 3 Core 4 Control loop Event queue Events with color 0 Events with color 1 Events with color 2 Events with color 3 12 / 50

13 1-Copy with synchronization Advantages: Allows sharing between cores Allows load balancing between cores Drawbacks: Need to modify the application Efficient load balancing is difficult 13 / 50

14 Workstealing performance: SFS Libasync-smp Libasync-smp - WS Throughput (MB/sec) % throughput increase 14 / 50

15 Workstealing performance: Web server Libasync-smp Libasync-smp - WS Throughput (MB/sec) % throughput decrease 15 / 50

16 What is the problem? Fine grain events: Stealing time (197 Kcycles) stolen processing time (20 Kcycles) Inefficient cache usage: +146% L2 cache misses Inefficient workstealing implementation O(n) complexity 16 / 50

17 Contributions New: Workstealing algorithm Runtime implementation Fine grain events: Algorithm: steal events with high execution time Inefficient cache usage: Algorithm: steal cache-friendly events Algorithm: take cache hierarchy into account Inefficient workstealing implementation Runtime: mitigate stealing costs 17 / 50

18 Idea #1: Take into account execution time Problem: stealing cost is not always amortized Many event handlers are relatively fine grain Workstealing may have a significant cost Solution: Time-left stealing Know at any time which colors are worthy (Handler execution time is set by the programmer) 18 / 50

19 Idea #2: Take into account caches Problem: Workstealing can reduce cache efficiency Stealing events increases cache misses Example: event handlers accessing large, long-lived, data sets Solution 1: Penalty-aware stealing Set penalties on handlers based on their cache access pattern (Penalties are set manually based on preliminary profiling) Solution 2: Locality-aware stealing Give priority to a neighbor when stealing 19 / 50

20 Runtime implementation Core X color-queue Control loop Color 0 Color 1 Color 2 Color 3 core-queue stealing-queue One color-queue per color One core-queue per core that links color-queues One stealing-queue per core 20 / 50

21 Performance evaluation: SFS Libasync-smp Libasync-smp - WS Mely - WS Throughput (MB/sec) No throughput degradation 21 / 50

22 Performance evaluation: Web server Libasync-smp Libasync-smp - WS Mely - WS Throughput (MB/sec) % throughput improvement 22 / 50

23 Web server profiling Web server configuration Stealing time Stolen time Cache misses/event Libasync-SMP - WS 197 Kcycles 20 Kcycles 21 Mely - WS 6 Kcycles 23 Kcycles 9 Stealing time (6 Kcycles) < stolen processing time (23 Kcycles) Improved cache efficiency: -57% L2 cache misses 23 / 50

24 Summary Goal: efficient runtime for multicore event-driven systems Problem: workstealing sometimes degrades performance Contributions: New workstealing algorithm New runtime implementation Results: improve throughput by up to 73% 24 / 50

25 #2: Scaling the Apache Web server on NUMA multicore systems Under submission 25 / 50

26 Problem % # of clients per die Ideal scalability Apache # of dies The Apache web server do not scale on NUMA architectures 26 / 50

27 What can we do? Address scalability issues at the OS level Corey (OSDI 08) Barrelfish (SOSP 09) PK (OSDI 10) 27 / 50

28 Apache on PK % # of clients per die Ideal scalability Apache on PK Apache # of dies Does not solve scalability issues 28 / 50

29 What do we propose? Addressing scalability issues at the OS level is not sufficient Application-level issues Some issues are difficult to handle (e.g. scheduling) Approach: address scalability issues at the application level 29 / 50

30 Methodology Consider both hardware and software bottlenecks Hardware bottlenecks: Processor interconnect Distant memory accesses Software bottlenecks: Synchronization primitives 30 / 50

31 Hardware testbed 61 Gb/s C0 C4 C8 C12 C3 C7 C11 C15 I/O L1 L1 L1 L1 24 Gb/s L2 L2 L2 L2 L3 Die 0 Die 1 Die 2 C2 C6 C10 C14 Die 3 C1 C5 C9 C13 I/O 24 Gb/s 4 processors / 16 cores 31 / 50

32 Hardware testbed 61 Gb/s C0 C4 C8 C12 C3 C7 C11 C15 I/O L1 L1 L1 L1 24 Gb/s L2 L2 L2 L2 L3 Die 0 Die 1 Die 2 C2 C6 C10 C14 Die 3 C1 C5 C9 C13 I/O 24 Gb/s 4 processors / 16 cores 31 / 50

33 Hardware bottlenecks Memory efficiency (IPC) Configuration Average IPC 1 die dies % IPC decrease 32 / 50

34 Hardware bottlenecks (2) IPC decrease: Reduced cache efficiency Configuration L3 cache miss ratio (%) 1 die 14 4 dies / 50

35 Hardware bottlenecks (2) IPC decrease: Reduced cache efficiency HyperTransport link saturation Configuration Max HT usage (%) 1 die 25 4 dies / 50

36 Hardware bottlenecks (2) IPC decrease: Reduced cache efficiency HyperTransport link saturation Increased number of distant memory accesses Configuration Distant accesses/kb 1 die 4 4 dies / 50

37 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 C10 NIC Request Receiving a TCP request 34 / 50

38 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 C10 NIC HTTP request processing 34 / 50

39 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 C10 NIC PHP processing 34 / 50

40 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 C10 NIC Sending the response (1) 34 / 50

41 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 Reply C10 NIC Sending the response (2) 34 / 50

42 Proposal #1 Solution: co-localizing TCP, Apache and PHP processing Implementation: use one instance of the Apache/PHP stack per die (N-Copy) One node manages 5 network interfaces 35 / 50

43 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 C2 NIC Request Receiving a TCP request 36 / 50

44 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 C2 C6 C10 NIC HTTP request processing 36 / 50

45 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 C2 C6 C10 NIC PHP processing 36 / 50

46 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 C2 C6 C10 NIC Sending the response (1) 36 / 50

47 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 Reply C2 C6 C10 NIC Sending the response (2) 36 / 50

48 N-Copy: performance % # of clients per die Ideal scalability Apache NCopy Apache # of dies 9.1% performance improvement compared to stock Apache 37 / 50

49 N-Copy: performance (2) Configuration Average IPC Distant accesses/kb 1 die dies (Stock Apache) dies (N-Copy) Memory efficiency improved by 20% 38 / 50

50 N-Copy: can we do better? Die Average CPU usage Die Die 1 85 Die 2 85 Die Problem: Dies are not equally efficient Load is not properly balanced on dies 39 / 50

51 N-Copy: load balancing Solution: balance load on dies proportionally to their efficiency Implementation: use an external load balancing mechanism Currently implemented at client-side Could be integrated in a more global solution 40 / 50

52 N-Copy: final performance % # of clients per die Ideal scalability Apache NCopy LB Apache NCopy Apache # of dies 21.2% performance improvement compared to stock Apache 41 / 50

53 Software bottlenecks Goal: find functions that Do not scale Represent a significant execution time Example: Function f accounts for 1 cycle/byte at 1 die 10 cycles/byte at 4 dies 20% of the total execution time 18% potential performance gain 42 / 50

54 Software bottlenecks (2) Function Potential performance gain (%) d lookup 2.49% atomic dec and lock 2.32% lookup mnt 1.41% copy user generic string 0.83% memcpy 0.76% Problem: the VFS layer does not scale Aggregated potential performance gain: 6 % Most of the calls are issued by the stat function 43 / 50

55 Proposal #2 Solution: use an application-level cache to reduce the number of calls to stat Implementation: Modified the Apache ap directory walk function Using inotify for file updates 44 / 50

56 Stat cache: performance % # of clients per die Ideal scalability 2000 Apache with all optims Apache NCopy LB Apache NCopy Apache # of dies 33% performance improvement compared to stock Apache 45 / 50

57 Summary Problem: Apache does not scale on NUMA architectures Contribution: application-level optimizations considering NUMA aspects and Linux scalability issues Results: +33% performance improvement 46 / 50

58 Conclusion 47 / 50

59 Conclusion Application domain: data servers Goal: Improve the performance of data servers on multicore architectures Contributions: Efficient multicore event-driven programming Scaling the Apache Web server on NUMA multicore systems 48 / 50

60 Future work Short term: Workstealing: automate profiling and decisions Apache: study other workloads Long term: Study the impact of distant memory accesses on other servers Study the impact of programming models on multicore performance Study the scalability of the Java virtual machine 49 / 50

61 Questions? 50 / 50

62 Backup Slides 2 / 50

63 Web server Returns static page content (1KB files requested) Closed-loop injection 5 load injectors simulating between 200 and 2000 clients Architecture is based on legacy design Per-connection coloring RegisterFd InEpoll Close Dec Accepted Clients Accept Read Request Parse Request GetFrom Cache Write Response Epoll 3 / 50

64 Web server evaluation Throughput (KRequests/s) Mely - WS Libasync-smp Mely Libasync-smp - WS Number of Clients Up to 73% improvement over the Libasync-SMP workstealing mechanism 4 / 50

65 Mely - Other web server evaluation (2) 200 Mely - WS Userver Apache Throughput (KRequests/s) Number of Clients Performance better than other real world Web servers 5 / 50

66 Apache Workload description SPECWeb2005 Support benchmark Vendor site Mostly static / PHP for dynamic pages Back-end Simulator (BeSim) Closed-loop injection with think times Defined QoS: 99% of clients served within 5s 95% of clients served within 3s Throughput constraints Modified to fit in main memory: 12GB 6 / 50

67 Software configuration Apache Worker version using both threads and processes Sendfile enabled to improve performance PHP Tuned number of PHP processes With eaccelerator Linux NUMA support IRQ processing manually balanced Responsible for dispatching thread and processes 7 / 50

R. Hashemian 1, D. Krishnamurthy 1, M. Arlitt 2, N. Carlsson 3

R. Hashemian 1, D. Krishnamurthy 1, M. Arlitt 2, N. Carlsson 3 R. Hashemian 1, D. Krishnamurthy 1, M. Arlitt 2, N. Carlsson 3 1. University of Calgary 2. HP Labs 3. Linköping University by: Raoufehsadat Hashemian The 4th ACM/SPEC International Conference on Performance

More information

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...

More information

Datacenter Operating Systems

Datacenter Operating Systems Datacenter Operating Systems CSE451 Simon Peter With thanks to Timothy Roscoe (ETH Zurich) Autumn 2015 This Lecture What s a datacenter Why datacenters Types of datacenters Hyperscale datacenters Major

More information

Improving the Scalability of a Multi-core Web Server

Improving the Scalability of a Multi-core Web Server Improving the Scalability of a Multi-core Web Server Raoufehsadat Hashemian ECE Department, University of Calgary, Canada rhashem@ucalgary.ca ABSTRACT Martin Arlitt Sustainable Ecosystems Research Group

More information

Application-Level Optimizations on NUMA Multicore Architectures: the Apache Case Study

Application-Level Optimizations on NUMA Multicore Architectures: the Apache Case Study Application-Level Optimizations on NUMA Multicore Architectures: the Apache Case Study Fabien Gaud INRIA abien.gaud@inria.r Gilles Muller INRIA gilles.muller@inria.r Renaud Lachaize Grenoble University

More information

Petascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing

Petascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Petascale Software Challenges Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons

More information

Delivering Quality in Software Performance and Scalability Testing

Delivering Quality in Software Performance and Scalability Testing Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,

More information

Full and Para Virtualization

Full and Para Virtualization Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels

More information

Application Performance Testing Basics

Application Performance Testing Basics Application Performance Testing Basics ABSTRACT Todays the web is playing a critical role in all the business domains such as entertainment, finance, healthcare etc. It is much important to ensure hassle-free

More information

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based

More information

Web Server Architectures

Web Server Architectures Web Server Architectures CS 4244: Internet Programming Dr. Eli Tilevich Based on Flash: An Efficient and Portable Web Server, Vivek S. Pai, Peter Druschel, and Willy Zwaenepoel, 1999 Annual Usenix Technical

More information

Managed Virtualized Platforms: From Multicore Nodes to Distributed Cloud Infrastructures

Managed Virtualized Platforms: From Multicore Nodes to Distributed Cloud Infrastructures Managed Virtualized Platforms: From Multicore Nodes to Distributed Cloud Infrastructures Ada Gavrilovska Karsten Schwan, Mukil Kesavan Sanjay Kumar, Ripal Nathuji, Adit Ranadive Center for Experimental

More information

MAGENTO HOSTING Progressive Server Performance Improvements

MAGENTO HOSTING Progressive Server Performance Improvements MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents

More information

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? Ashutosh Shinde Performance Architect ashutosh_shinde@hotmail.com Validating if the workload generated by the load generating tools is applied

More information

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server

More information

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies Kurt Klemperer, Principal System Performance Engineer kklemperer@blackboard.com Agenda Session Length:

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Unit 2 Distributed Systems R.Yamini Dept. Of CA, SRM University Kattankulathur

Unit 2 Distributed Systems R.Yamini Dept. Of CA, SRM University Kattankulathur Unit 2 Distributed Systems R.Yamini Dept. Of CA, SRM University Kattankulathur 1 Introduction to Distributed Systems Why do we develop distributed systems? availability of powerful yet cheap microprocessors

More information

Resource Utilization of Middleware Components in Embedded Systems

Resource Utilization of Middleware Components in Embedded Systems Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system

More information

Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010

Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010 Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010 This document is provided as-is. Information and views expressed in this document, including URL and other Internet

More information

Evaluating and Comparing the Impact of Software Faults on Web Servers

Evaluating and Comparing the Impact of Software Faults on Web Servers Evaluating and Comparing the Impact of Software Faults on Web Servers April 2010, João Durães, Henrique Madeira CISUC, Department of Informatics Engineering University of Coimbra {naaliel, jduraes, henrique}@dei.uc.pt

More information

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

Microkernels, virtualization, exokernels. Tutorial 1 CSC469 Microkernels, virtualization, exokernels Tutorial 1 CSC469 Monolithic kernel vs Microkernel Monolithic OS kernel Application VFS System call User mode What was the main idea? What were the problems? IPC,

More information

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General

More information

Agility Database Scalability Testing

Agility Database Scalability Testing Agility Database Scalability Testing V1.6 November 11, 2012 Prepared by on behalf of Table of Contents 1 Introduction... 4 1.1 Brief... 4 2 Scope... 5 3 Test Approach... 6 4 Test environment setup... 7

More information

Performance brief for IBM WebSphere Application Server 7.0 with VMware ESX 4.0 on HP ProLiant DL380 G6 server

Performance brief for IBM WebSphere Application Server 7.0 with VMware ESX 4.0 on HP ProLiant DL380 G6 server Performance brief for IBM WebSphere Application Server.0 with VMware ESX.0 on HP ProLiant DL0 G server Table of contents Executive summary... WebSphere test configuration... Server information... WebSphere

More information

Optimizing Shared Resource Contention in HPC Clusters

Optimizing Shared Resource Contention in HPC Clusters Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs

More information

Performance Tuning and Optimizing SQL Databases 2016

Performance Tuning and Optimizing SQL Databases 2016 Performance Tuning and Optimizing SQL Databases 2016 http://www.homnick.com marketing@homnick.com +1.561.988.0567 Boca Raton, Fl USA About this course This four-day instructor-led course provides students

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance. Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance

More information

Eloquence Training What s new in Eloquence B.08.00

Eloquence Training What s new in Eloquence B.08.00 Eloquence Training What s new in Eloquence B.08.00 2010 Marxmeier Software AG Rev:100727 Overview Released December 2008 Supported until November 2013 Supports 32-bit and 64-bit platforms HP-UX Itanium

More information

Tomcat Tuning. Mark Thomas April 2009

Tomcat Tuning. Mark Thomas April 2009 Tomcat Tuning Mark Thomas April 2009 Who am I? Apache Tomcat committer Resolved 1,500+ Tomcat bugs Apache Tomcat PMC member Member of the Apache Software Foundation Member of the ASF security committee

More information

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?

More information

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance M. Rangarajan, A. Bohra, K. Banerjee, E.V. Carrera, R. Bianchini, L. Iftode, W. Zwaenepoel. Presented

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

Performance Management for Cloudbased STC 2012

Performance Management for Cloudbased STC 2012 Performance Management for Cloudbased Applications STC 2012 1 Agenda Context Problem Statement Cloud Architecture Need for Performance in Cloud Performance Challenges in Cloud Generic IaaS / PaaS / SaaS

More information

ZooKeeper. Table of contents

ZooKeeper. Table of contents by Table of contents 1 ZooKeeper: A Distributed Coordination Service for Distributed Applications... 2 1.1 Design Goals...2 1.2 Data model and the hierarchical namespace...3 1.3 Nodes and ephemeral nodes...

More information

An Oracle Technical White Paper November 2011. Oracle Solaris 11 Network Virtualization and Network Resource Management

An Oracle Technical White Paper November 2011. Oracle Solaris 11 Network Virtualization and Network Resource Management An Oracle Technical White Paper November 2011 Oracle Solaris 11 Network Virtualization and Network Resource Management Executive Overview... 2 Introduction... 2 Network Virtualization... 2 Network Resource

More information

Parallel Processing and Software Performance. Lukáš Marek

Parallel Processing and Software Performance. Lukáš Marek Parallel Processing and Software Performance Lukáš Marek DISTRIBUTED SYSTEMS RESEARCH GROUP http://dsrg.mff.cuni.cz CHARLES UNIVERSITY PRAGUE Faculty of Mathematics and Physics Benchmarking in parallel

More information

Introduction 1 Performance on Hosted Server 1. Benchmarks 2. System Requirements 7 Load Balancing 7

Introduction 1 Performance on Hosted Server 1. Benchmarks 2. System Requirements 7 Load Balancing 7 Introduction 1 Performance on Hosted Server 1 Figure 1: Real World Performance 1 Benchmarks 2 System configuration used for benchmarks 2 Figure 2a: New tickets per minute on E5440 processors 3 Figure 2b:

More information

Comparing the Performance of Web Server Architectures

Comparing the Performance of Web Server Architectures Comparing the Performance of Web Server Architectures David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla David R. Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario,

More information

ECLIPSE Performance Benchmarks and Profiling. January 2009

ECLIPSE Performance Benchmarks and Profiling. January 2009 ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster

More information

GraySort on Apache Spark by Databricks

GraySort on Apache Spark by Databricks GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner

More information

How System Settings Impact PCIe SSD Performance

How System Settings Impact PCIe SSD Performance How System Settings Impact PCIe SSD Performance Suzanne Ferreira R&D Engineer Micron Technology, Inc. July, 2012 As solid state drives (SSDs) continue to gain ground in the enterprise server and storage

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

LS DYNA Performance Benchmarks and Profiling. January 2009

LS DYNA Performance Benchmarks and Profiling. January 2009 LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The

More information

High-Density Network Flow Monitoring

High-Density Network Flow Monitoring Petr Velan petr.velan@cesnet.cz High-Density Network Flow Monitoring IM2015 12 May 2015, Ottawa Motivation What is high-density flow monitoring? Monitor high traffic in as little rack units as possible

More information

Dynamic Resource allocation in Cloud

Dynamic Resource allocation in Cloud Dynamic Resource allocation in Cloud ABSTRACT: Cloud computing allows business customers to scale up and down their resource usage based on needs. Many of the touted gains in the cloud model come from

More information

PARALLELS CLOUD SERVER

PARALLELS CLOUD SERVER PARALLELS CLOUD SERVER Performance and Scalability 1 Table of Contents Executive Summary... Error! Bookmark not defined. LAMP Stack Performance Evaluation... Error! Bookmark not defined. Background...

More information

Analysis of Delivery of Web Contents for Kernel-mode and User-mode Web Servers

Analysis of Delivery of Web Contents for Kernel-mode and User-mode Web Servers Analysis of Delivery of Web Contents for Kernel-mode and User-mode Web Servers Syed Mutahar Aaqib Research Scholar Department of Computer Science & IT IT University of Jammu Lalitsen Sharma Associate Professor

More information

Virtualization for Future Internet

Virtualization for Future Internet Virtualization for Future Internet 2010.02.23 Korea University Chuck Yoo (hxy@os.korea.ac.kr) Why Virtualization Internet today Pro and con Your wonderful research results Mostly with simulation Deployment

More information

CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015

CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 1. Goals and Overview 1. In this MP you will design a Dynamic Load Balancer architecture for a Distributed System 2. You will

More information

Enterprise Applications

Enterprise Applications Enterprise Applications Chi Ho Yue Sorav Bansal Shivnath Babu Amin Firoozshahian EE392C Emerging Applications Study Spring 2003 Functionality Online Transaction Processing (OLTP) Users/apps interacting

More information

A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems

A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems TADaaM Team - Nicolas Denoyelle - Brice Goglin - Emmanuel Jeannot August 24, 2015 1. Context/Motivations

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

BENCHMARKING EFFORT OF VIRTUAL MACHINES ON MULTICORE MACHINES

BENCHMARKING EFFORT OF VIRTUAL MACHINES ON MULTICORE MACHINES BENCHMARKING EFFORT OF VIRTUAL MACHINES ON MULTICORE MACHINES Aparna Venkatraman, Vinay Pandey, Beth Plale, and Shing-Shong Shei Computer Science Dept., Indiana University {apvenkat, vkpandey, plale, shei}

More information

Deep Dive: Maximizing EC2 & EBS Performance

Deep Dive: Maximizing EC2 & EBS Performance Deep Dive: Maximizing EC2 & EBS Performance Tom Maddox, Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved What we ll cover Amazon EBS overview Volumes Snapshots

More information

Windows Server Performance Monitoring

Windows Server Performance Monitoring Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

More information

Practical Performance Understanding the Performance of Your Application

Practical Performance Understanding the Performance of Your Application Neil Masson IBM Java Service Technical Lead 25 th September 2012 Practical Performance Understanding the Performance of Your Application 1 WebSphere User Group: Practical Performance Understand the Performance

More information

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and

More information

HRG Assessment: Stratus everrun Enterprise

HRG Assessment: Stratus everrun Enterprise HRG Assessment: Stratus everrun Enterprise Today IT executive decision makers and their technology recommenders are faced with escalating demands for more effective technology based solutions while at

More information

JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers

JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers Dave Jaffe, PhD, Dell Inc. Michael Yuan, PhD, JBoss / RedHat June 14th, 2006 JBoss Inc. 2006 About us Dave Jaffe Works for Dell

More information

Deploying F5 BIG-IP Virtual Editions in a Hyper-Converged Infrastructure

Deploying F5 BIG-IP Virtual Editions in a Hyper-Converged Infrastructure Deploying F5 BIG-IP Virtual Editions in a Hyper-Converged Infrastructure Justin Venezia Senior Solution Architect Paul Pindell Senior Solution Architect Contents The Challenge 3 What is a hyper-converged

More information

Performance Analysis of Web based Applications on Single and Multi Core Servers

Performance Analysis of Web based Applications on Single and Multi Core Servers Performance Analysis of Web based Applications on Single and Multi Core Servers Gitika Khare, Diptikant Pathy, Alpana Rajan, Alok Jain, Anil Rawat Raja Ramanna Centre for Advanced Technology Department

More information

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Oracle Database Scalability in VMware ESX VMware ESX 3.5 Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises

More information

Web Application s Performance Testing

Web Application s Performance Testing Web Application s Performance Testing B. Election Reddy (07305054) Guided by N. L. Sarda April 13, 2008 1 Contents 1 Introduction 4 2 Objectives 4 3 Performance Indicators 5 4 Types of Performance Testing

More information

Next Generation Operating Systems

Next Generation Operating Systems Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the

More information

Cloud Optimize Your IT

Cloud Optimize Your IT Cloud Optimize Your IT Windows Server 2012 Michael Faden Partner Technology Advisor Microsoft Schweiz 1 Beyond Virtualization virtualization The power of many servers, the simplicity of one Every app,

More information

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology 3. The Lagopus SDN Software Switch Here we explain the capabilities of the new Lagopus software switch in detail, starting with the basics of SDN and OpenFlow. 3.1 SDN and OpenFlow Those engaged in network-related

More information

Design Issues in a Bare PC Web Server

Design Issues in a Bare PC Web Server Design Issues in a Bare PC Web Server Long He, Ramesh K. Karne, Alexander L. Wijesinha, Sandeep Girumala, and Gholam H. Khaksari Department of Computer & Information Sciences, Towson University, 78 York

More information

Software and the Concurrency Revolution

Software and the Concurrency Revolution Software and the Concurrency Revolution A: The world s fastest supercomputer, with up to 4 processors, 128MB RAM, 942 MFLOPS (peak). 2 Q: What is a 1984 Cray X-MP? (Or a fractional 2005 vintage Xbox )

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Design and Performance of a Web Server Accelerator

Design and Performance of a Web Server Accelerator Design and Performance of a Web Server Accelerator Eric Levy-Abegnoli, Arun Iyengar, Junehwa Song, and Daniel Dias IBM Research T. J. Watson Research Center P. O. Box 74 Yorktown Heights, NY 598 Abstract

More information

Tempesta FW. Alexander Krizhanovsky NatSys Lab. ak@natsys-lab.com

Tempesta FW. Alexander Krizhanovsky NatSys Lab. ak@natsys-lab.com Tempesta FW Alexander Krizhanovsky NatSys Lab. ak@natsys-lab.com What Tempesta FW Is? FireWall: layer 3 (IP) layer 7 (HTTP) filter FrameWork: high performance and flexible platform to build intelligent

More information

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Yunhong Jiang 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Crystal Reports Server 2008

Crystal Reports Server 2008 Revision Date: July 2009 Crystal Reports Server 2008 Sizing Guide Overview Crystal Reports Server system sizing involves the process of determining how many resources are required to support a given workload.

More information

The Association of System Performance Professionals

The Association of System Performance Professionals The Association of System Performance Professionals The Computer Measurement Group, commonly called CMG, is a not for profit, worldwide organization of data processing professionals committed to the measurement

More information

Recommendations for Performance Benchmarking

Recommendations for Performance Benchmarking Recommendations for Performance Benchmarking Shikhar Puri Abstract Performance benchmarking of applications is increasingly becoming essential before deployment. This paper covers recommendations and best

More information

Tyche: An efficient Ethernet-based protocol for converged networked storage

Tyche: An efficient Ethernet-based protocol for converged networked storage Tyche: An efficient Ethernet-based protocol for converged networked storage Pilar González-Férez and Angelos Bilas 30 th International Conference on Massive Storage Systems and Technology MSST 2014 June

More information

Implementing Probes for J2EE Cluster Monitoring

Implementing Probes for J2EE Cluster Monitoring Implementing s for J2EE Cluster Monitoring Emmanuel Cecchet, Hazem Elmeleegy, Oussama Layaida, Vivien Quéma LSR-IMAG Laboratory (CNRS, INPG, UJF) - INRIA INRIA Rhône-Alpes, 655 av. de l Europe, 38334 Saint-Ismier

More information

Rackspace Cloud Databases and Container-based Virtualization

Rackspace Cloud Databases and Container-based Virtualization Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many

More information

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Performance Study Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Introduction With more and more mission critical networking intensive workloads being virtualized

More information

POSIX and Object Distributed Storage Systems

POSIX and Object Distributed Storage Systems 1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome

More information

Performance Modeling for Web based J2EE and.net Applications

Performance Modeling for Web based J2EE and.net Applications Performance Modeling for Web based J2EE and.net Applications Shankar Kambhampaty, and Venkata Srinivas Modali Abstract When architecting an application, key nonfunctional requirements such as performance,

More information

MOSIX: High performance Linux farm

MOSIX: High performance Linux farm MOSIX: High performance Linux farm Paolo Mastroserio [mastroserio@na.infn.it] Francesco Maria Taurino [taurino@na.infn.it] Gennaro Tortone [tortone@na.infn.it] Napoli Index overview on Linux farm farm

More information

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29 RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f. one large disk) Parallelism improves performance Plus extra disk(s) for redundant data storage Provides fault tolerant

More information

Technical Paper. Performance and Tuning Considerations for SAS on Fusion-io ioscale Flash Storage

Technical Paper. Performance and Tuning Considerations for SAS on Fusion-io ioscale Flash Storage Technical Paper Performance and Tuning Considerations for SAS on Fusion-io ioscale Flash Storage Release Information Content Version: 1.0 May 2014. Trademarks and Patents SAS Institute Inc., SAS Campus

More information

CON9577 Performance Optimizations for Cloud Infrastructure as a Service

CON9577 Performance Optimizations for Cloud Infrastructure as a Service CON9577 Performance Optimizations for Cloud Infrastructure as a Service John Falkenthal, Software Development Sr. Director - Oracle VM SPARC and VirtualBox Jeff Savit, Senior Principal Technical Product

More information

Computing at the HL-LHC

Computing at the HL-LHC Computing at the HL-LHC Predrag Buncic on behalf of the Trigger/DAQ/Offline/Computing Preparatory Group ALICE: Pierre Vande Vyvre, Thorsten Kollegger, Predrag Buncic; ATLAS: David Rousseau, Benedetto Gorini,

More information

... Foreword... 17. ... Acknowledgments... 19. ... Introduction... 21

... Foreword... 17. ... Acknowledgments... 19. ... Introduction... 21 ... Foreword... 17... Acknowledgments... 19... Introduction... 21 1... Performance Management of an SAP Solution... 33 1.1... SAP Solution Architecture... 34 1.1.1... SAP Solutions and SAP Components...

More information

A Deduplication File System & Course Review

A Deduplication File System & Course Review A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror

More information

Migration Scenario: Migrating Batch Processes to the AWS Cloud

Migration Scenario: Migrating Batch Processes to the AWS Cloud Migration Scenario: Migrating Batch Processes to the AWS Cloud Produce Ingest Process Store Manage Distribute Asset Creation Data Ingestor Metadata Ingestor (Manual) Transcoder Encoder Asset Store Catalog

More information

Memory Channel Storage ( M C S ) Demystified. Jerome McFarland

Memory Channel Storage ( M C S ) Demystified. Jerome McFarland ory nel Storage ( M C S ) Demystified Jerome McFarland Principal Product Marketer AGENDA + INTRO AND ARCHITECTURE + PRODUCT DETAILS + APPLICATIONS THE COMPUTE-STORAGE DISCONNECT + Compute And Data Have

More information

CIT 668: System Architecture. Performance Testing

CIT 668: System Architecture. Performance Testing CIT 668: System Architecture Performance Testing Topics 1. What is performance testing? 2. Performance-testing activities 3. UNIX monitoring tools What is performance testing? Performance testing is a

More information

Agenda. Tomcat Versions Troubleshooting management Tomcat Connectors HTTP Protocal and Performance Log Tuning JVM Tuning Load balancing Tomcat

Agenda. Tomcat Versions Troubleshooting management Tomcat Connectors HTTP Protocal and Performance Log Tuning JVM Tuning Load balancing Tomcat Agenda Tomcat Versions Troubleshooting management Tomcat Connectors HTTP Protocal and Performance Log Tuning JVM Tuning Load balancing Tomcat Tomcat Performance Tuning Tomcat Versions Application/System

More information

SQL Server 2008 Performance and Scale

SQL Server 2008 Performance and Scale SQL Server 2008 Performance and Scale White Paper Published: February 2008 Updated: July 2008 Summary: Microsoft SQL Server 2008 incorporates the tools and technologies that are necessary to implement

More information

Microsoft SQL Server: MS-10980 Performance Tuning and Optimization Digital

Microsoft SQL Server: MS-10980 Performance Tuning and Optimization Digital coursemonster.com/us Microsoft SQL Server: MS-10980 Performance Tuning and Optimization Digital View training dates» Overview This course is designed to give the right amount of Internals knowledge and

More information

Manjrasoft Market Oriented Cloud Computing Platform

Manjrasoft Market Oriented Cloud Computing Platform Manjrasoft Market Oriented Cloud Computing Platform Aneka Aneka is a market oriented Cloud development and management platform with rapid application development and workload distribution capabilities.

More information

Tivoli IBM Tivoli Web Response Monitor and IBM Tivoli Web Segment Analyzer

Tivoli IBM Tivoli Web Response Monitor and IBM Tivoli Web Segment Analyzer Tivoli IBM Tivoli Web Response Monitor and IBM Tivoli Web Segment Analyzer Version 2.0.0 Notes for Fixpack 1.2.0-TIV-W3_Analyzer-IF0003 Tivoli IBM Tivoli Web Response Monitor and IBM Tivoli Web Segment

More information