Improving the performance of data servers on multicore architectures. Fabien Gaud
|
|
- Brent Floyd
- 8 years ago
- Views:
Transcription
1 Improving the performance of data servers on multicore architectures Fabien Gaud Grenoble University Advisors: Jean-Bernard Stefani, Renaud Lachaize and Vivien Quéma Sardes (INRIA/LIG) December 2, / 50
2 Processor evolution Before 2006: One core Regular increase of clock frequency Since then: Almost no increase of clock frequency Increasing number of cores: Multicore architectures NUMA architectures Manycore architectures 2 / 50
3 Multicore is a hot topic Legacy applications do not efficiently leverage multicore hardware Research topics: Programming models/languages Operating systems abstractions/internals Runtime/libraries Applications Active research field: Corey (OSDI 08) Barrelfish (SOSP 09), Helios (SOSP 09) PK (OSDI 10) 3 / 50
4 This thesis Application domain: data servers, a.k.a. networked services Goal: Improve the performance of data servers on multicore architectures Contributions: Efficient multicore event-driven programming Scaling the Apache Web server on NUMA multicore systems 4 / 50
5 #1: Efficient multicore event-driven programming CFSE 2009 (best paper award) ICDCS / 50
6 Event-driven programming Application is structured as a set of handlers processing events An event can be: Triggered by an I/O operation Produced internally by the application Events are stored in a queue and processed by a single thread Handler 1 Control Loop Handler 2 Event Handler 3 Handler 4 6 / 50
7 Multicore event-driven programming Goal: concurrently execute multiple handlers Challenges: Concurrency management Balancing load on cores Solutions: N-Copy 1-Copy with synchronization 7 / 50
8 N-Copy Principle: running one instance of the application per core App1 App2 App3 App4 Core 1 Core 2 Core 3 Core 4 Control loop Event queue Event 8 / 50
9 N-Copy (2) Advantages: No concurrency management needed No application modification needed Drawbacks: Not applicable to all applications Multiple copies of data Requires external load balancing 9 / 50
10 1-copy with synchronization Principle: 1 instance on multiple cores Concurrency can be managed using: Locks STM Annotations Load balancing can be achieved with: Static placement Workgiving Workstealing Chosen approach is implemented in Libasync-SMP (Usenix 03) 10 / 50
11 Libasync-SMP Concurrency management Annotations (colors) set on events App1 Core 1 Core 2 Core 3 Core 4 Control loop Event queue Events with color 0 Events with color 1 Events with color 2 Events with color 3 11 / 50
12 Libasync-SMP Load balancing Load balancing is done through workstealing App1 Core 1 Core 2 Core 3 Core 4 Control loop Event queue Events with color 0 Events with color 1 Events with color 2 Events with color 3 12 / 50
13 1-Copy with synchronization Advantages: Allows sharing between cores Allows load balancing between cores Drawbacks: Need to modify the application Efficient load balancing is difficult 13 / 50
14 Workstealing performance: SFS Libasync-smp Libasync-smp - WS Throughput (MB/sec) % throughput increase 14 / 50
15 Workstealing performance: Web server Libasync-smp Libasync-smp - WS Throughput (MB/sec) % throughput decrease 15 / 50
16 What is the problem? Fine grain events: Stealing time (197 Kcycles) stolen processing time (20 Kcycles) Inefficient cache usage: +146% L2 cache misses Inefficient workstealing implementation O(n) complexity 16 / 50
17 Contributions New: Workstealing algorithm Runtime implementation Fine grain events: Algorithm: steal events with high execution time Inefficient cache usage: Algorithm: steal cache-friendly events Algorithm: take cache hierarchy into account Inefficient workstealing implementation Runtime: mitigate stealing costs 17 / 50
18 Idea #1: Take into account execution time Problem: stealing cost is not always amortized Many event handlers are relatively fine grain Workstealing may have a significant cost Solution: Time-left stealing Know at any time which colors are worthy (Handler execution time is set by the programmer) 18 / 50
19 Idea #2: Take into account caches Problem: Workstealing can reduce cache efficiency Stealing events increases cache misses Example: event handlers accessing large, long-lived, data sets Solution 1: Penalty-aware stealing Set penalties on handlers based on their cache access pattern (Penalties are set manually based on preliminary profiling) Solution 2: Locality-aware stealing Give priority to a neighbor when stealing 19 / 50
20 Runtime implementation Core X color-queue Control loop Color 0 Color 1 Color 2 Color 3 core-queue stealing-queue One color-queue per color One core-queue per core that links color-queues One stealing-queue per core 20 / 50
21 Performance evaluation: SFS Libasync-smp Libasync-smp - WS Mely - WS Throughput (MB/sec) No throughput degradation 21 / 50
22 Performance evaluation: Web server Libasync-smp Libasync-smp - WS Mely - WS Throughput (MB/sec) % throughput improvement 22 / 50
23 Web server profiling Web server configuration Stealing time Stolen time Cache misses/event Libasync-SMP - WS 197 Kcycles 20 Kcycles 21 Mely - WS 6 Kcycles 23 Kcycles 9 Stealing time (6 Kcycles) < stolen processing time (23 Kcycles) Improved cache efficiency: -57% L2 cache misses 23 / 50
24 Summary Goal: efficient runtime for multicore event-driven systems Problem: workstealing sometimes degrades performance Contributions: New workstealing algorithm New runtime implementation Results: improve throughput by up to 73% 24 / 50
25 #2: Scaling the Apache Web server on NUMA multicore systems Under submission 25 / 50
26 Problem % # of clients per die Ideal scalability Apache # of dies The Apache web server do not scale on NUMA architectures 26 / 50
27 What can we do? Address scalability issues at the OS level Corey (OSDI 08) Barrelfish (SOSP 09) PK (OSDI 10) 27 / 50
28 Apache on PK % # of clients per die Ideal scalability Apache on PK Apache # of dies Does not solve scalability issues 28 / 50
29 What do we propose? Addressing scalability issues at the OS level is not sufficient Application-level issues Some issues are difficult to handle (e.g. scheduling) Approach: address scalability issues at the application level 29 / 50
30 Methodology Consider both hardware and software bottlenecks Hardware bottlenecks: Processor interconnect Distant memory accesses Software bottlenecks: Synchronization primitives 30 / 50
31 Hardware testbed 61 Gb/s C0 C4 C8 C12 C3 C7 C11 C15 I/O L1 L1 L1 L1 24 Gb/s L2 L2 L2 L2 L3 Die 0 Die 1 Die 2 C2 C6 C10 C14 Die 3 C1 C5 C9 C13 I/O 24 Gb/s 4 processors / 16 cores 31 / 50
32 Hardware testbed 61 Gb/s C0 C4 C8 C12 C3 C7 C11 C15 I/O L1 L1 L1 L1 24 Gb/s L2 L2 L2 L2 L3 Die 0 Die 1 Die 2 C2 C6 C10 C14 Die 3 C1 C5 C9 C13 I/O 24 Gb/s 4 processors / 16 cores 31 / 50
33 Hardware bottlenecks Memory efficiency (IPC) Configuration Average IPC 1 die dies % IPC decrease 32 / 50
34 Hardware bottlenecks (2) IPC decrease: Reduced cache efficiency Configuration L3 cache miss ratio (%) 1 die 14 4 dies / 50
35 Hardware bottlenecks (2) IPC decrease: Reduced cache efficiency HyperTransport link saturation Configuration Max HT usage (%) 1 die 25 4 dies / 50
36 Hardware bottlenecks (2) IPC decrease: Reduced cache efficiency HyperTransport link saturation Increased number of distant memory accesses Configuration Distant accesses/kb 1 die 4 4 dies / 50
37 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 C10 NIC Request Receiving a TCP request 34 / 50
38 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 C10 NIC HTTP request processing 34 / 50
39 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 C10 NIC PHP processing 34 / 50
40 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 C10 NIC Sending the response (1) 34 / 50
41 Request processing C0 C5 Die 0 Die 1 Die 2 Die 3 Reply C10 NIC Sending the response (2) 34 / 50
42 Proposal #1 Solution: co-localizing TCP, Apache and PHP processing Implementation: use one instance of the Apache/PHP stack per die (N-Copy) One node manages 5 network interfaces 35 / 50
43 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 C2 NIC Request Receiving a TCP request 36 / 50
44 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 C2 C6 C10 NIC HTTP request processing 36 / 50
45 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 C2 C6 C10 NIC PHP processing 36 / 50
46 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 C2 C6 C10 NIC Sending the response (1) 36 / 50
47 N-Copy: request processing Die 0 Die 1 Die 2 Die 3 Reply C2 C6 C10 NIC Sending the response (2) 36 / 50
48 N-Copy: performance % # of clients per die Ideal scalability Apache NCopy Apache # of dies 9.1% performance improvement compared to stock Apache 37 / 50
49 N-Copy: performance (2) Configuration Average IPC Distant accesses/kb 1 die dies (Stock Apache) dies (N-Copy) Memory efficiency improved by 20% 38 / 50
50 N-Copy: can we do better? Die Average CPU usage Die Die 1 85 Die 2 85 Die Problem: Dies are not equally efficient Load is not properly balanced on dies 39 / 50
51 N-Copy: load balancing Solution: balance load on dies proportionally to their efficiency Implementation: use an external load balancing mechanism Currently implemented at client-side Could be integrated in a more global solution 40 / 50
52 N-Copy: final performance % # of clients per die Ideal scalability Apache NCopy LB Apache NCopy Apache # of dies 21.2% performance improvement compared to stock Apache 41 / 50
53 Software bottlenecks Goal: find functions that Do not scale Represent a significant execution time Example: Function f accounts for 1 cycle/byte at 1 die 10 cycles/byte at 4 dies 20% of the total execution time 18% potential performance gain 42 / 50
54 Software bottlenecks (2) Function Potential performance gain (%) d lookup 2.49% atomic dec and lock 2.32% lookup mnt 1.41% copy user generic string 0.83% memcpy 0.76% Problem: the VFS layer does not scale Aggregated potential performance gain: 6 % Most of the calls are issued by the stat function 43 / 50
55 Proposal #2 Solution: use an application-level cache to reduce the number of calls to stat Implementation: Modified the Apache ap directory walk function Using inotify for file updates 44 / 50
56 Stat cache: performance % # of clients per die Ideal scalability 2000 Apache with all optims Apache NCopy LB Apache NCopy Apache # of dies 33% performance improvement compared to stock Apache 45 / 50
57 Summary Problem: Apache does not scale on NUMA architectures Contribution: application-level optimizations considering NUMA aspects and Linux scalability issues Results: +33% performance improvement 46 / 50
58 Conclusion 47 / 50
59 Conclusion Application domain: data servers Goal: Improve the performance of data servers on multicore architectures Contributions: Efficient multicore event-driven programming Scaling the Apache Web server on NUMA multicore systems 48 / 50
60 Future work Short term: Workstealing: automate profiling and decisions Apache: study other workloads Long term: Study the impact of distant memory accesses on other servers Study the impact of programming models on multicore performance Study the scalability of the Java virtual machine 49 / 50
61 Questions? 50 / 50
62 Backup Slides 2 / 50
63 Web server Returns static page content (1KB files requested) Closed-loop injection 5 load injectors simulating between 200 and 2000 clients Architecture is based on legacy design Per-connection coloring RegisterFd InEpoll Close Dec Accepted Clients Accept Read Request Parse Request GetFrom Cache Write Response Epoll 3 / 50
64 Web server evaluation Throughput (KRequests/s) Mely - WS Libasync-smp Mely Libasync-smp - WS Number of Clients Up to 73% improvement over the Libasync-SMP workstealing mechanism 4 / 50
65 Mely - Other web server evaluation (2) 200 Mely - WS Userver Apache Throughput (KRequests/s) Number of Clients Performance better than other real world Web servers 5 / 50
66 Apache Workload description SPECWeb2005 Support benchmark Vendor site Mostly static / PHP for dynamic pages Back-end Simulator (BeSim) Closed-loop injection with think times Defined QoS: 99% of clients served within 5s 95% of clients served within 3s Throughput constraints Modified to fit in main memory: 12GB 6 / 50
67 Software configuration Apache Worker version using both threads and processes Sendfile enabled to improve performance PHP Tuned number of PHP processes With eaccelerator Linux NUMA support IRQ processing manually balanced Responsible for dispatching thread and processes 7 / 50
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
More informationThe Advantages of Network Interface Card (NIC) Web Server Configuration Strategy
Improving the Scalability of a Multi-core Web Server Raoufehsadat Hashemian ECE Department, University of Calgary, Canada rhashem@ucalgary.ca ABSTRACT Martin Arlitt Sustainable Ecosystems Research Group
More informationApplication-Level Optimizations on NUMA Multicore Architectures: the Apache Case Study
Application-Level Optimizations on NUMA Multicore Architectures: the Apache Case Study Fabien Gaud INRIA abien.gaud@inria.r Gilles Muller INRIA gilles.muller@inria.r Renaud Lachaize Grenoble University
More informationDatacenter Operating Systems
Datacenter Operating Systems CSE451 Simon Peter With thanks to Timothy Roscoe (ETH Zurich) Autumn 2015 This Lecture What s a datacenter Why datacenters Types of datacenters Hyperscale datacenters Major
More informationApplication Performance Testing Basics
Application Performance Testing Basics ABSTRACT Todays the web is playing a critical role in all the business domains such as entertainment, finance, healthcare etc. It is much important to ensure hassle-free
More informationPetascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing
Petascale Software Challenges Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons
More informationExploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based
More informationDelivering Quality in Software Performance and Scalability Testing
Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,
More informationWeb Server Architectures
Web Server Architectures CS 4244: Internet Programming Dr. Eli Tilevich Based on Flash: An Efficient and Portable Web Server, Vivek S. Pai, Peter Druschel, and Willy Zwaenepoel, 1999 Annual Usenix Technical
More informationMAGENTO HOSTING Progressive Server Performance Improvements
MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents
More informationFull and Para Virtualization
Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels
More informationMEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?
MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? Ashutosh Shinde Performance Architect ashutosh_shinde@hotmail.com Validating if the workload generated by the load generating tools is applied
More informationDirect NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
More informationVirtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies
Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies Kurt Klemperer, Principal System Performance Engineer kklemperer@blackboard.com Agenda Session Length:
More informationManaged Virtualized Platforms: From Multicore Nodes to Distributed Cloud Infrastructures
Managed Virtualized Platforms: From Multicore Nodes to Distributed Cloud Infrastructures Ada Gavrilovska Karsten Schwan, Mukil Kesavan Sanjay Kumar, Ripal Nathuji, Adit Ranadive Center for Experimental
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
More informationAchieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
More informationResource Utilization of Middleware Components in Embedded Systems
Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system
More informationEvaluating and Comparing the Impact of Software Faults on Web Servers
Evaluating and Comparing the Impact of Software Faults on Web Servers April 2010, João Durães, Henrique Madeira CISUC, Department of Informatics Engineering University of Coimbra {naaliel, jduraes, henrique}@dei.uc.pt
More informationPART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
More informationPerformance Tuning and Optimizing SQL Databases 2016
Performance Tuning and Optimizing SQL Databases 2016 http://www.homnick.com marketing@homnick.com +1.561.988.0567 Boca Raton, Fl USA About this course This four-day instructor-led course provides students
More informationAgenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
More informationTomcat Tuning. Mark Thomas April 2009
Tomcat Tuning Mark Thomas April 2009 Who am I? Apache Tomcat committer Resolved 1,500+ Tomcat bugs Apache Tomcat PMC member Member of the Apache Software Foundation Member of the ASF security committee
More informationMaking Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
More informationPerformance Management for Cloudbased STC 2012
Performance Management for Cloudbased Applications STC 2012 1 Agenda Context Problem Statement Cloud Architecture Need for Performance in Cloud Performance Challenges in Cloud Generic IaaS / PaaS / SaaS
More informationParallel Processing and Software Performance. Lukáš Marek
Parallel Processing and Software Performance Lukáš Marek DISTRIBUTED SYSTEMS RESEARCH GROUP http://dsrg.mff.cuni.cz CHARLES UNIVERSITY PRAGUE Faculty of Mathematics and Physics Benchmarking in parallel
More informationEstimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010
Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010 This document is provided as-is. Information and views expressed in this document, including URL and other Internet
More informationECLIPSE Performance Benchmarks and Profiling. January 2009
ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster
More informationAgility Database Scalability Testing
Agility Database Scalability Testing V1.6 November 11, 2012 Prepared by on behalf of Table of Contents 1 Introduction... 4 1.1 Brief... 4 2 Scope... 5 3 Test Approach... 6 4 Test environment setup... 7
More informationPerformance brief for IBM WebSphere Application Server 7.0 with VMware ESX 4.0 on HP ProLiant DL380 G6 server
Performance brief for IBM WebSphere Application Server.0 with VMware ESX.0 on HP ProLiant DL0 G server Table of contents Executive summary... WebSphere test configuration... Server information... WebSphere
More informationMicrokernels, virtualization, exokernels. Tutorial 1 CSC469
Microkernels, virtualization, exokernels Tutorial 1 CSC469 Monolithic kernel vs Microkernel Monolithic OS kernel Application VFS System call User mode What was the main idea? What were the problems? IPC,
More informationOptimizing Shared Resource Contention in HPC Clusters
Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs
More informationLS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu
More informationGraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
More informationAn Oracle Technical White Paper November 2011. Oracle Solaris 11 Network Virtualization and Network Resource Management
An Oracle Technical White Paper November 2011 Oracle Solaris 11 Network Virtualization and Network Resource Management Executive Overview... 2 Introduction... 2 Network Virtualization... 2 Network Resource
More informationDistributed File Systems
Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.
More informationEloquence Training What s new in Eloquence B.08.00
Eloquence Training What s new in Eloquence B.08.00 2010 Marxmeier Software AG Rev:100727 Overview Released December 2008 Supported until November 2013 Supports 32-bit and 64-bit platforms HP-UX Itanium
More informationTCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance
TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance M. Rangarajan, A. Bohra, K. Banerjee, E.V. Carrera, R. Bianchini, L. Iftode, W. Zwaenepoel. Presented
More informationBenchmarking Cassandra on Violin
Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract
More informationLS DYNA Performance Benchmarks and Profiling. January 2009
LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The
More informationComparing the Performance of Web Server Architectures
Comparing the Performance of Web Server Architectures David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla David R. Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario,
More informationPARALLELS CLOUD SERVER
PARALLELS CLOUD SERVER Performance and Scalability 1 Table of Contents Executive Summary... Error! Bookmark not defined. LAMP Stack Performance Evaluation... Error! Bookmark not defined. Background...
More informationZooKeeper. Table of contents
by Table of contents 1 ZooKeeper: A Distributed Coordination Service for Distributed Applications... 2 1.1 Design Goals...2 1.2 Data model and the hierarchical namespace...3 1.3 Nodes and ephemeral nodes...
More informationDeduplication in VM Environments Frank Bellosa <bellosa@kit.edu> Konrad Miller <miller@kit.edu> Marc Rittinghaus <rittinghaus@kit.
Deduplication in VM Environments Frank Bellosa Konrad Miller Marc Rittinghaus KARLSRUHE INSTITUTE OF TECHNOLOGY (KIT) - SYSTEM ARCHITECTURE GROUP
More informationEnterprise Applications
Enterprise Applications Chi Ho Yue Sorav Bansal Shivnath Babu Amin Firoozshahian EE392C Emerging Applications Study Spring 2003 Functionality Online Transaction Processing (OLTP) Users/apps interacting
More informationDynamic Resource allocation in Cloud
Dynamic Resource allocation in Cloud ABSTRACT: Cloud computing allows business customers to scale up and down their resource usage based on needs. Many of the touted gains in the cloud model come from
More informationBENCHMARKING EFFORT OF VIRTUAL MACHINES ON MULTICORE MACHINES
BENCHMARKING EFFORT OF VIRTUAL MACHINES ON MULTICORE MACHINES Aparna Venkatraman, Vinay Pandey, Beth Plale, and Shing-Shong Shei Computer Science Dept., Indiana University {apvenkat, vkpandey, plale, shei}
More informationIntroduction 1 Performance on Hosted Server 1. Benchmarks 2. System Requirements 7 Load Balancing 7
Introduction 1 Performance on Hosted Server 1 Figure 1: Real World Performance 1 Benchmarks 2 System configuration used for benchmarks 2 Figure 2a: New tickets per minute on E5440 processors 3 Figure 2b:
More informationWeb Application s Performance Testing
Web Application s Performance Testing B. Election Reddy (07305054) Guided by N. L. Sarda April 13, 2008 1 Contents 1 Introduction 4 2 Objectives 4 3 Performance Indicators 5 4 Types of Performance Testing
More informationPerformance Analysis of Web based Applications on Single and Multi Core Servers
Performance Analysis of Web based Applications on Single and Multi Core Servers Gitika Khare, Diptikant Pathy, Alpana Rajan, Alok Jain, Anil Rawat Raja Ramanna Centre for Advanced Technology Department
More informationJBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers
JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers Dave Jaffe, PhD, Dell Inc. Michael Yuan, PhD, JBoss / RedHat June 14th, 2006 JBoss Inc. 2006 About us Dave Jaffe Works for Dell
More informationHow System Settings Impact PCIe SSD Performance
How System Settings Impact PCIe SSD Performance Suzanne Ferreira R&D Engineer Micron Technology, Inc. July, 2012 As solid state drives (SSDs) continue to gain ground in the enterprise server and storage
More informationAnalysis of Delivery of Web Contents for Kernel-mode and User-mode Web Servers
Analysis of Delivery of Web Contents for Kernel-mode and User-mode Web Servers Syed Mutahar Aaqib Research Scholar Department of Computer Science & IT IT University of Jammu Lalitsen Sharma Associate Professor
More informationVirtualization for Future Internet
Virtualization for Future Internet 2010.02.23 Korea University Chuck Yoo (hxy@os.korea.ac.kr) Why Virtualization Internet today Pro and con Your wonderful research results Mostly with simulation Deployment
More informationDesign and Performance of a Web Server Accelerator
Design and Performance of a Web Server Accelerator Eric Levy-Abegnoli, Arun Iyengar, Junehwa Song, and Daniel Dias IBM Research T. J. Watson Research Center P. O. Box 74 Yorktown Heights, NY 598 Abstract
More informationSoftware and the Concurrency Revolution
Software and the Concurrency Revolution A: The world s fastest supercomputer, with up to 4 processors, 128MB RAM, 942 MFLOPS (peak). 2 Q: What is a 1984 Cray X-MP? (Or a fractional 2005 vintage Xbox )
More informationCS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015
CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 1. Goals and Overview 1. In this MP you will design a Dynamic Load Balancer architecture for a Distributed System 2. You will
More informationHow To Monitor Performance On A Microsoft Powerbook (Powerbook) On A Network (Powerbus) On An Uniden (Powergen) With A Microsatellite) On The Microsonde (Powerstation) On Your Computer (Power
A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems TADaaM Team - Nicolas Denoyelle - Brice Goglin - Emmanuel Jeannot August 24, 2015 1. Context/Motivations
More informationRackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
More informationTempesta FW. Alexander Krizhanovsky NatSys Lab. ak@natsys-lab.com
Tempesta FW Alexander Krizhanovsky NatSys Lab. ak@natsys-lab.com What Tempesta FW Is? FireWall: layer 3 (IP) layer 7 (HTTP) filter FrameWork: high performance and flexible platform to build intelligent
More informationHigh-Density Network Flow Monitoring
Petr Velan petr.velan@cesnet.cz High-Density Network Flow Monitoring IM2015 12 May 2015, Ottawa Motivation What is high-density flow monitoring? Monitor high traffic in as little rack units as possible
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationTyche: An efficient Ethernet-based protocol for converged networked storage
Tyche: An efficient Ethernet-based protocol for converged networked storage Pilar González-Férez and Angelos Bilas 30 th International Conference on Massive Storage Systems and Technology MSST 2014 June
More informationRecommendations for Performance Benchmarking
Recommendations for Performance Benchmarking Shikhar Puri Abstract Performance benchmarking of applications is increasingly becoming essential before deployment. This paper covers recommendations and best
More informationHRG Assessment: Stratus everrun Enterprise
HRG Assessment: Stratus everrun Enterprise Today IT executive decision makers and their technology recommenders are faced with escalating demands for more effective technology based solutions while at
More informationDeep Dive: Maximizing EC2 & EBS Performance
Deep Dive: Maximizing EC2 & EBS Performance Tom Maddox, Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved What we ll cover Amazon EBS overview Volumes Snapshots
More informationTechnical Paper. Performance and Tuning Considerations for SAS on Fusion-io ioscale Flash Storage
Technical Paper Performance and Tuning Considerations for SAS on Fusion-io ioscale Flash Storage Release Information Content Version: 1.0 May 2014. Trademarks and Patents SAS Institute Inc., SAS Campus
More informationRAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29
RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f. one large disk) Parallelism improves performance Plus extra disk(s) for redundant data storage Provides fault tolerant
More informationCON9577 Performance Optimizations for Cloud Infrastructure as a Service
CON9577 Performance Optimizations for Cloud Infrastructure as a Service John Falkenthal, Software Development Sr. Director - Oracle VM SPARC and VirtualBox Jeff Savit, Senior Principal Technical Product
More informationPractical Performance Understanding the Performance of Your Application
Neil Masson IBM Java Service Technical Lead 25 th September 2012 Practical Performance Understanding the Performance of Your Application 1 WebSphere User Group: Practical Performance Understand the Performance
More informationComputing at the HL-LHC
Computing at the HL-LHC Predrag Buncic on behalf of the Trigger/DAQ/Offline/Computing Preparatory Group ALICE: Pierre Vande Vyvre, Thorsten Kollegger, Predrag Buncic; ATLAS: David Rousseau, Benedetto Gorini,
More informationDeploying F5 BIG-IP Virtual Editions in a Hyper-Converged Infrastructure
Deploying F5 BIG-IP Virtual Editions in a Hyper-Converged Infrastructure Justin Venezia Senior Solution Architect Paul Pindell Senior Solution Architect Contents The Challenge 3 What is a hyper-converged
More informationA Deduplication File System & Course Review
A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror
More informationOracle Database Scalability in VMware ESX VMware ESX 3.5
Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises
More informationSo#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell
So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and
More informationMigration Scenario: Migrating Batch Processes to the AWS Cloud
Migration Scenario: Migrating Batch Processes to the AWS Cloud Produce Ingest Process Store Manage Distribute Asset Creation Data Ingestor Metadata Ingestor (Manual) Transcoder Encoder Asset Store Catalog
More informationWindows Server Performance Monitoring
Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly
More informationSQL Server 2008 Performance and Scale
SQL Server 2008 Performance and Scale White Paper Published: February 2008 Updated: July 2008 Summary: Microsoft SQL Server 2008 incorporates the tools and technologies that are necessary to implement
More informationCIT 668: System Architecture. Performance Testing
CIT 668: System Architecture Performance Testing Topics 1. What is performance testing? 2. Performance-testing activities 3. UNIX monitoring tools What is performance testing? Performance testing is a
More informationThe Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology
3. The Lagopus SDN Software Switch Here we explain the capabilities of the new Lagopus software switch in detail, starting with the basics of SDN and OpenFlow. 3.1 SDN and OpenFlow Those engaged in network-related
More informationAgenda. Tomcat Versions Troubleshooting management Tomcat Connectors HTTP Protocal and Performance Log Tuning JVM Tuning Load balancing Tomcat
Agenda Tomcat Versions Troubleshooting management Tomcat Connectors HTTP Protocal and Performance Log Tuning JVM Tuning Load balancing Tomcat Tomcat Performance Tuning Tomcat Versions Application/System
More informationCloud Optimize Your IT
Cloud Optimize Your IT Windows Server 2012 Michael Faden Partner Technology Advisor Microsoft Schweiz 1 Beyond Virtualization virtualization The power of many servers, the simplicity of one Every app,
More informationMicrosoft SQL Server: MS-10980 Performance Tuning and Optimization Digital
coursemonster.com/us Microsoft SQL Server: MS-10980 Performance Tuning and Optimization Digital View training dates» Overview This course is designed to give the right amount of Internals knowledge and
More informationApache Tomcat Tuning for Production
Apache Tomcat Tuning for Production Filip Hanik & Mark Thomas SpringSource September 2008 Copyright 2008 SpringSource. Copying, publishing or distributing without express written permission is prohibited.
More informationAn Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide
Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.
More informationManaging Adaptability in Heterogeneous Architectures through Performance Monitoring and Prediction
Managing Adaptability in Heterogeneous Architectures through Performance Monitoring and Prediction Cristina Silvano cristina.silvano@polimi.it Politecnico di Milano HiPEAC CSW Athens 2014 Motivations System
More informationArchitecture Support for Big Data Analytics
Architecture Support for Big Data Analytics Ahsan Javed Awan EMJD-DC (KTH-UPC) (http://uk.linkedin.com/in/ahsanjavedawan/) Supervisors: Mats Brorsson(KTH), Eduard Ayguade(UPC), Vladimir Vlassov(KTH) 1
More informationMulti-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
More informationDesign Issues in a Bare PC Web Server
Design Issues in a Bare PC Web Server Long He, Ramesh K. Karne, Alexander L. Wijesinha, Sandeep Girumala, and Gholam H. Khaksari Department of Computer & Information Sciences, Towson University, 78 York
More informationNext Generation Operating Systems
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the
More informationCrystal Reports Server 2008
Revision Date: July 2009 Crystal Reports Server 2008 Sizing Guide Overview Crystal Reports Server system sizing involves the process of determining how many resources are required to support a given workload.
More informationControl 2004, University of Bath, UK, September 2004
Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of
More informationResource Containers: A new facility for resource management in server systems
CS 5204 Operating Systems Resource Containers: A new facility for resource management in server systems G. Banga, P. Druschel, Rice Univ. J. C. Mogul, Compaq OSDI 1999 Outline Background Previous Approaches
More informationFabrics that Fit Matching the Network to Today s Data Center Traffic Conditions
Sponsored by Fabrics that Fit Matching the Network to Today s Data Center Traffic Conditions In This Paper Traditional network infrastructures are often costly and hard to administer Today s workloads
More informationCOLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service
COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Yunhong Jiang 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationOracle Net Services - Best Practices for Database Performance and Scalability
Oracle Net Services - Best Practices for Database Performance and Scalability Keywords: Kuassi Mensah Oracle Corporation USA Net Manager, NetCA, Net Listener, IPv6, TCP, TNS, LDAP, Logon Storm. Introduction
More informationXTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12
XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines A.Zydroń 18 April 2009 Page 1 of 12 1. Introduction...3 2. XTM Database...4 3. JVM and Tomcat considerations...5 4. XTM Engine...5
More informationImplementing Probes for J2EE Cluster Monitoring
Implementing s for J2EE Cluster Monitoring Emmanuel Cecchet, Hazem Elmeleegy, Oussama Layaida, Vivien Quéma LSR-IMAG Laboratory (CNRS, INPG, UJF) - INRIA INRIA Rhône-Alpes, 655 av. de l Europe, 38334 Saint-Ismier
More informationMOSIX: High performance Linux farm
MOSIX: High performance Linux farm Paolo Mastroserio [mastroserio@na.infn.it] Francesco Maria Taurino [taurino@na.infn.it] Gennaro Tortone [tortone@na.infn.it] Napoli Index overview on Linux farm farm
More informationPerformance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009
Performance Study Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Introduction With more and more mission critical networking intensive workloads being virtualized
More information