How To Monitor Performance On A Microsoft Powerbook (Powerbook) On A Network (Powerbus) On An Uniden (Powergen) With A Microsatellite) On The Microsonde (Powerstation) On Your Computer (Power

Size: px
Start display at page:

Download "How To Monitor Performance On A Microsoft Powerbook (Powerbook) On A Network (Powerbus) On An Uniden (Powergen) With A Microsatellite) On The Microsonde (Powerstation) On Your Computer (Power"

Transcription

1 A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems TADaaM Team - Nicolas Denoyelle - Brice Goglin - Emmanuel Jeannot August 24, 2015

2 1. Context/Motivations 2. Fast presentation of the tool 3. Demonstration 4. How does it works? 5. How is it made? 6. Features & Future Works TADaaM Team - Nicolas Denoyelle - Brice Goglin - Emmanuel Jeannot August 24, 2015

3 MOTIVATIONS Memory hierarchy is growing deeper and larger. No performance without a fair usage of the system topology Batch schedulers, runtimes, applications themeselves... are getting topology aware. Processing Unit NUMA Shared Memory Shared Cache IO Network Machine NUMA Shared Memory Shared Cache Topology Aware Performance Monitoring August 24,

4 MOTIVATIONS Memory hierarchy is growing deeper and larger. IO Network Hence, data management gives opportunities for performance improvements. NUMA Shared Memory Shared Cache Machine NUMA Shared Memory Shared Cache Processing Unit Topology Aware Performance Monitoring August 24,

5 MOTIVATIONS Memory hierarchy is growing deeper and larger. IO Network Hence, data management gives opportunities for performance improvements. NUMA Shared Memory Shared Cache Machine NUMA Shared Memory Shared Cache Processing Unit Topology Aware Performance Monitoring August 24,

6 MOTIVATIONS Memory hierarchy is growing deeper and larger. IO Network Hence, data management gives opportunities for performance improvements. NUMA Shared Memory Shared Cache Machine NUMA Shared Memory Shared Cache It is a multi-level and a multi-criteria problem. Processing Unit Topology Aware Performance Monitoring August 24,

7 MOTIVATIONS Need to match use cases, and relevant performance metrics for each level. Need to match performance and topology. Requires topology modeling skills. Requires adaptable performance monitoring. Topology Aware Performance Monitoring August 24,

8 Yet Another Tool to Monitor Applications Performance Focus on data presentation to link the results with topology informations. Relies on two cornerstones of topology modeling (hwloc) and performance counter abstraction (PAPI) to map the latter on the former Minimal configuration and software requirements. Can help finding and caracterizing localized bottlenecks. Topology Aware Performance Monitoring August 24,

9 Hardware Locality (hwloc) Portable abstraction of hierarchical architectures for high-performance computing Performs topology discovery and extracts hardware component information. Provides tools for memory and process binding. Many operating systems supported... lstopo utility to display the topology: Developped at Inria Bordeaux. Topology Aware Performance Monitoring August 24,

10 Hardware Locality (hwloc) Machine (31GB total) NUMANode P#0 (31GB) Package P#0 L3 (20MB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L1d L1d L1d L1d L1d L1d L1d L1d L1i L1i L1i L1i L1i L1i L1i L1i P#0 P#1 P#2 P#3 P#4 P#5 P#6 P#7 P#0 P#2 P#4 P#6 P#8 P#10 P#12 P#14 P#16 P#18 P#20 P#22 P#24 P#26 P#28 P#30 Topology Aware Performance Monitoring August 24,

11 Performance Application Programming Interface (PAPI) Consistent interface and methodology for use of the performance counter hardware. Real time relation between software performance and processor events. Many operating systems supported too. Reliable and actively supported. Used in a wide range of performance analysis applications. An abstraction layer to plug some other performance library is under development. Topology Aware Performance Monitoring August 24,

12 Dynamic Lstopo (example) Machine (16GB total) NUMANode P#0 (16GB) Package P#0 L3 9, e+01 (20MB) L2 2, e+02 (256KB) L2 2, e+02 (256KB) L2 2, e+02 (256KB) L2 2, e+02 (256KB) L2 2, e+02 (256KB) L2 5, e+02 (256KB) L2 2, e+02 (256KB) L2 3, e+02 (256KB) L1d 6, e+02 L1d 6, e+02 L1d 6, e+02 L1d 6, e+02 L1d 7, e+02 L1d 1, e+03 L1d 6, e+02 L1d 6, e+02 L1i 1, e+03 L1i 1, e+03 L1i 1, e+03 L1i 1, e+03 L1i 1, e+03 L1i 2, e+03 L1i 1, e+03 L1i 1, e+03 P#0 P#1 P#2 P#3 P#4 P#5 P#6 P#7 2,82176e+00 P#0 2,45916e+00 P#2 2,68772e+00 P#4 1,51689e+00 P#6 1,63417e+00 P#8 3,47249e+00 P#10 1,47808e+00 P#12 1,51514e+00 P#14 1,40868e+00 P#16 1,47441e+00 P#18 1,52142e+00 P#20 2,64271e+00 P#22 2,82472e+00 P#24 1,47165e+00 P#26 2,58947e+00 P#28 2,49344e+00 P#30 Sample of hardware performance counters mapped on a single socket of an Intel Xeon E C. Topology Aware Performance Monitoring August 24,

13 A Demonstration Worth Thousand Words L1 L2 L Accesses to a linked list of variable size. Topology Aware Performance Monitoring August 24,

14 A Demonstration Worth Thousand Words L1 L2 L Accesses to a linked list of variable size. Topology Aware Performance Monitoring August 24,

15 A Demonstration Worth Thousand Words L1 L2 L Accesses to a linked list of variable size. Topology Aware Performance Monitoring August 24,

16 A Demonstration Worth Thousand Words L3_MISS{ OBJ = L3; CTR = PAPI_L3_TCM; LOGSCALE = 1; } L2_MISS{ OBJ = L2; CTR = PAPI_L2_TCM; LOGSCALE = 1; } L1_MISS{ OBJ = L1d; CTR = PAPI_L1_DCM; LOGSCALE = 1; } SINGLE_L3_MISS{ OBJ = ; CTR = PAPI_L3_TCM; LOGSCALE = 1; } Topology Aware Performance Monitoring August 24,

17 Dynamic Lstopo (Usage) Machine (16GB total) Counters input: NUMANode P#0 (16GB) Package P#0 L3 2, e+03 (4096KB) L2 (256KB) L1d L1i L2 (256KB) L1d L1i SINGLE L3 MISS{ OBJ = L3 ; CTR = PAPI L2 TCM/PAPI L2 TCA ; LOGSCALE = 1 ; MAX= ; MIN=0; } P#0 P#1 P#0 P#1 P#2 P#3 Command line: lstopo perf-input counters input Topology Aware Performance Monitoring August 24,

18 Dynamic Lstopo (Theory) 1. Spawn one pthread per hardware thread (#0,..., #3). L2 L1 + L1 + + Topology Aware Performance Monitoring August 24,

19 Dynamic Lstopo (Theory) 1. Spawn one pthread per hardware thread (#0,..., #3). L2 2. For each timestamp, each thread collects a local set of performance counters. L1 + L1 + + Topology Aware Performance Monitoring August 24,

20 Dynamic Lstopo (Theory) 1. Spawn one pthread per hardware thread (#0,..., #3). L2 2. For each timestamp, each thread collects a local set of performance counters. L1 + L1 3. Counters are accumulated in each upper level. + + Topology Aware Performance Monitoring August 24,

21 Dynamic Lstopo (Theory) 1. Spawn one pthread per hardware thread (#0,..., #3). L2 % 2. For each timestamp, each thread collects a local set of performance counters. % L1 + L1 % 3. Counters are accumulated in each upper level. % + % % + % 4. For each level, a leaf computes an arithmetic expression of the performance counters in the set. Topology Aware Performance Monitoring August 24,

22 Dynamic Lstopo Software Architecture in Brief lstopo utility Monitors output Application monitors library hwloc library PAPI library Machine static topology Machine dynamic performance counters Topology Aware Performance Monitoring August 24,

23 Dynamic Lstopo Software Architecture in Brief lstopo utility Monitors output Application monitors library hwloc library PAPI library Machine static topology Machine dynamic performance counters Topology Aware Performance Monitoring August 24,

24 API m o n i t o r s = l o a d M o n i t o r s f r o m c o n f i g (NULL, m y p e r f f i l e, m y o u t p u t f i l e, 0) ; M o n i t o r s w a t c h p i d ( monitors, g e t p i d ( ) ) ; M o n i t o r s s t a r t ( m o n i t o r s ) ; /... / M o n i t o r s u p d a t e c o u n t e r s ( m o n i t o r s ) ; d e l e t e M o n i t o r s ( m o n i t o r s ) ; Topology Aware Performance Monitoring August 24,

25 Dynamic Lstopo (Output paje trace) % EventDef val 0 % Id int % Phase int % Time_us date % Value double % EndEventDef % EventDef container 1 % Id int % Level string % Sibling int % Name string % Logscale int % EndEventDef SINGLE_ L3_ MISS SINGLE_ L3_ MISS SINGLE_ L3_ MISS SINGLE_ L3_ MISS , , , , , , , ,00 Topology Aware Performance Monitoring August 24,

26 Features Record and/or Display live machine performance counters and match them with topology. Several settings: counters accumulation, sampling rate, attach to a process... Replay any trace with a topology file (for external display).. Sample specific parts of an application with the monitor library. Support legacy lstopo options (restrict topology, change display format... ). Topology Aware Performance Monitoring August 24,

27 Future works Match code and performance informations Accept user defined aggregation operator. Provide performance abstraction layer. Be able to delimit phases during execution. Find and give explicit hints on bottlenecks. Topology Aware Performance Monitoring August 24,

28 Conclusion Data locality becomes a main criterion for high performance. We built a tool based on a topology model and a performance library to help taking up the challenge. It maps performance values to machine objects. It is a visual tool, fast and easy to use. It is lightweight and causes less than 1% C overhead. Let you build topology aware performance models. Dynamic lstopo is into the process of beeing merged with hwloc project. Topology Aware Performance Monitoring August 24,

29 Now available from lstopo Thank you Topology Aware Performance Monitoring August 24,

Distributed communication-aware load balancing with TreeMatch in Charm++

Distributed communication-aware load balancing with TreeMatch in Charm++ Distributed communication-aware load balancing with TreeMatch in Charm++ The 9th Scheduling for Large Scale Systems Workshop, Lyon, France Emmanuel Jeannot Guillaume Mercier Francois Tessier In collaboration

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Vers des mécanismes génériques de communication et une meilleure maîtrise des affinités dans les grappes de calculateurs hiérarchiques.

Vers des mécanismes génériques de communication et une meilleure maîtrise des affinités dans les grappes de calculateurs hiérarchiques. Vers des mécanismes génériques de communication et une meilleure maîtrise des affinités dans les grappes de calculateurs hiérarchiques Brice Goglin 15 avril 2014 Towards generic Communication Mechanisms

More information

MAQAO Performance Analysis and Optimization Tool

MAQAO Performance Analysis and Optimization Tool MAQAO Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL andres.charif@uvsq.fr Performance Evaluation Team, University of Versailles S-Q-Y http://www.maqao.org VI-HPS 18 th Grenoble 18/22

More information

ELEC 377. Operating Systems. Week 1 Class 3

ELEC 377. Operating Systems. Week 1 Class 3 Operating Systems Week 1 Class 3 Last Class! Computer System Structure, Controllers! Interrupts & Traps! I/O structure and device queues.! Storage Structure & Caching! Hardware Protection! Dual Mode Operation

More information

Performance Analysis and Optimization Tool

Performance Analysis and Optimization Tool Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL andres.charif@uvsq.fr Performance Analysis Team, University of Versailles http://www.maqao.org Introduction Performance Analysis Develop

More information

PSE Molekulardynamik

PSE Molekulardynamik OpenMP, bigger Applications 12.12.2014 Outline Schedule Presentations: Worksheet 4 OpenMP Multicore Architectures Membrane, Crystallization Preparation: Worksheet 5 2 Schedule 10.10.2014 Intro 1 WS 24.10.2014

More information

Performance Characteristics of Large SMP Machines

Performance Characteristics of Large SMP Machines Performance Characteristics of Large SMP Machines Dirk Schmidl, Dieter an Mey, Matthias S. Müller schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) Agenda Investigated Hardware Kernel Benchmark

More information

Performance Monitoring of Parallel Scientific Applications

Performance Monitoring of Parallel Scientific Applications Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure

More information

NVIDIA Tools For Profiling And Monitoring. David Goodwin

NVIDIA Tools For Profiling And Monitoring. David Goodwin NVIDIA Tools For Profiling And Monitoring David Goodwin Outline CUDA Profiling and Monitoring Libraries Tools Technologies Directions CScADS Summer 2012 Workshop on Performance Tools for Extreme Scale

More information

Performance Application Programming Interface

Performance Application Programming Interface /************************************************************************************ ** Notes on Performance Application Programming Interface ** ** Intended audience: Those who would like to learn more

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

End-user Tools for Application Performance Analysis Using Hardware Counters

End-user Tools for Application Performance Analysis Using Hardware Counters 1 End-user Tools for Application Performance Analysis Using Hardware Counters K. London, J. Dongarra, S. Moore, P. Mucci, K. Seymour, T. Spencer Abstract One purpose of the end-user tools described in

More information

Optimization tools. 1) Improving Overall I/O

Optimization tools. 1) Improving Overall I/O Optimization tools After your code is compiled, debugged, and capable of running to completion or planned termination, you can begin looking for ways in which to improve execution speed. In general, the

More information

Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10

Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10 Performance Counter Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10 Performance Counter Hardware Unit for event measurements Performance Monitoring Unit (PMU) Originally for CPU-Debugging

More information

Multi-core Programming System Overview

Multi-core Programming System Overview Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Multi-core and Linux* Kernel

Multi-core and Linux* Kernel Multi-core and Linux* Kernel Suresh Siddha Intel Open Source Technology Center Abstract Semiconductor technological advances in the recent years have led to the inclusion of multiple CPU execution cores

More information

Improving the performance of data servers on multicore architectures. Fabien Gaud

Improving the performance of data servers on multicore architectures. Fabien Gaud Improving the performance of data servers on multicore architectures Fabien Gaud Grenoble University Advisors: Jean-Bernard Stefani, Renaud Lachaize and Vivien Quéma Sardes (INRIA/LIG) December 2, 2010

More information

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology 3. The Lagopus SDN Software Switch Here we explain the capabilities of the new Lagopus software switch in detail, starting with the basics of SDN and OpenFlow. 3.1 SDN and OpenFlow Those engaged in network-related

More information

OpenMP Programming on ScaleMP

OpenMP Programming on ScaleMP OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign

More information

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.

More information

System Requirements Table of contents

System Requirements Table of contents Table of contents 1 Introduction... 2 2 Knoa Agent... 2 2.1 System Requirements...2 2.2 Environment Requirements...4 3 Knoa Server Architecture...4 3.1 Knoa Server Components... 4 3.2 Server Hardware Setup...5

More information

The Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa

The Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa The Impact of Memory Subsystem Resource Sharing on Datacenter Applications Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa Introduction Problem Recent studies into the effects of memory

More information

Delivering Quality in Software Performance and Scalability Testing

Delivering Quality in Software Performance and Scalability Testing Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,

More information

Resource Aware Scheduler for Storm. Software Design Document. <jerry.boyang.peng@gmail.com> Date: 09/18/2015

Resource Aware Scheduler for Storm. Software Design Document. <jerry.boyang.peng@gmail.com> Date: 09/18/2015 Resource Aware Scheduler for Storm Software Design Document Author: Boyang Jerry Peng Date: 09/18/2015 Table of Contents 1. INTRODUCTION 3 1.1. USING

More information

PAPI - PERFORMANCE API. ANDRÉ PEREIRA ampereira@di.uminho.pt

PAPI - PERFORMANCE API. ANDRÉ PEREIRA ampereira@di.uminho.pt 1 PAPI - PERFORMANCE API ANDRÉ PEREIRA ampereira@di.uminho.pt 2 Motivation Application and functions execution time is easy to measure time gprof valgrind (callgrind) It is enough to identify bottlenecks,

More information

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1 Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed Computing Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Exascale Challenges and General Purpose Processors Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Jun-93 Aug-94 Oct-95 Dec-96 Feb-98 Apr-99 Jun-00 Aug-01 Oct-02 Dec-03

More information

Ressources management and runtime environments in the exascale computing era

Ressources management and runtime environments in the exascale computing era Ressources management and runtime environments in the exascale computing era Guillaume Huard MOAIS and MESCAL INRIA Projects CNRS LIG Laboratory Grenoble University, France Guillaume Huard MOAIS and MESCAL

More information

Scalability evaluation of barrier algorithms for OpenMP

Scalability evaluation of barrier algorithms for OpenMP Scalability evaluation of barrier algorithms for OpenMP Ramachandra Nanjegowda, Oscar Hernandez, Barbara Chapman and Haoqiang H. Jin High Performance Computing and Tools Group (HPCTools) Computer Science

More information

Enabling Technologies for Distributed and Cloud Computing

Enabling Technologies for Distributed and Cloud Computing Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading

More information

Very Large Enterprise Network, Deployment, 25000+ Users

Very Large Enterprise Network, Deployment, 25000+ Users Very Large Enterprise Network, Deployment, 25000+ Users Websense software can be deployed in different configurations, depending on the size and characteristics of the network, and the organization s filtering

More information

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...

More information

Release Notes for Open Grid Scheduler/Grid Engine. Version: Grid Engine 2011.11

Release Notes for Open Grid Scheduler/Grid Engine. Version: Grid Engine 2011.11 Release Notes for Open Grid Scheduler/Grid Engine Version: Grid Engine 2011.11 New Features Berkeley DB Spooling Directory Can Be Located on NFS The Berkeley DB spooling framework has been enhanced such

More information

<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization

<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization An Experimental Model to Analyze OpenMP Applications for System Utilization Mark Woodyard Principal Software Engineer 1 The following is an overview of a research project. It is intended

More information

OpenMP and Performance

OpenMP and Performance Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University Tuning Cycle Performance Tuning aims to improve the runtime of an

More information

CSCI E 98: Managed Environments for the Execution of Programs

CSCI E 98: Managed Environments for the Execution of Programs CSCI E 98: Managed Environments for the Execution of Programs Draft Syllabus Instructor Phil McGachey, PhD Class Time: Mondays beginning Sept. 8, 5:30-7:30 pm Location: 1 Story Street, Room 304. Office

More information

ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs

ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs ABSTRACT Xu Liu Department of Computer Science College of William and Mary Williamsburg, VA 23185 xl10@cs.wm.edu It is

More information

JUROPA Linux Cluster An Overview. 19 May 2014 Ulrich Detert

JUROPA Linux Cluster An Overview. 19 May 2014 Ulrich Detert Mitglied der Helmholtz-Gemeinschaft JUROPA Linux Cluster An Overview 19 May 2014 Ulrich Detert JuRoPA JuRoPA Jülich Research on Petaflop Architectures Bull, Sun, ParTec, Intel, Mellanox, Novell, FZJ JUROPA

More information

A Steering Environment for Online Parallel Visualization of Legacy Parallel Simulations

A Steering Environment for Online Parallel Visualization of Legacy Parallel Simulations A Steering Environment for Online Parallel Visualization of Legacy Parallel Simulations Aurélien Esnard, Nicolas Richart and Olivier Coulaud ACI GRID (French Ministry of Research Initiative) ScAlApplix

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

Practical Performance Understanding the Performance of Your Application

Practical Performance Understanding the Performance of Your Application Neil Masson IBM Java Service Technical Lead 25 th September 2012 Practical Performance Understanding the Performance of Your Application 1 WebSphere User Group: Practical Performance Understand the Performance

More information

Parallel Processing and Software Performance. Lukáš Marek

Parallel Processing and Software Performance. Lukáš Marek Parallel Processing and Software Performance Lukáš Marek DISTRIBUTED SYSTEMS RESEARCH GROUP http://dsrg.mff.cuni.cz CHARLES UNIVERSITY PRAGUE Faculty of Mathematics and Physics Benchmarking in parallel

More information

The team that wrote this redbook Comments welcome Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft

The team that wrote this redbook Comments welcome Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft Foreword p. xv Preface p. xvii The team that wrote this redbook p. xviii Comments welcome p. xx Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft Technologies

More information

Microsoft Windows Server 2003 with Internet Information Services (IIS) 6.0 vs. Linux Competitive Web Server Performance Comparison

Microsoft Windows Server 2003 with Internet Information Services (IIS) 6.0 vs. Linux Competitive Web Server Performance Comparison April 23 11 Aviation Parkway, Suite 4 Morrisville, NC 2756 919-38-28 Fax 919-38-2899 32 B Lakeside Drive Foster City, CA 9444 65-513-8 Fax 65-513-899 www.veritest.com info@veritest.com Microsoft Windows

More information

D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Version 1.0

D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Version 1.0 D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Document Information Contract Number 288777 Project Website www.montblanc-project.eu Contractual Deadline

More information

Understand Performance Monitoring

Understand Performance Monitoring Understand Performance Monitoring Lesson Overview In this lesson, you will learn: Performance monitoring methods Monitor specific system activities Create a Data Collector Set View diagnosis reports Task

More information

SIDN Server Measurements

SIDN Server Measurements SIDN Server Measurements Yuri Schaeffer 1, NLnet Labs NLnet Labs document 2010-003 July 19, 2010 1 Introduction For future capacity planning SIDN would like to have an insight on the required resources

More information

Database Application Developer Tools Using Static Analysis and Dynamic Profiling

Database Application Developer Tools Using Static Analysis and Dynamic Profiling Database Application Developer Tools Using Static Analysis and Dynamic Profiling Surajit Chaudhuri, Vivek Narasayya, Manoj Syamala Microsoft Research {surajitc,viveknar,manojsy}@microsoft.com Abstract

More information

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based

More information

Optimizing the Performance of Your Longview Application

Optimizing the Performance of Your Longview Application Optimizing the Performance of Your Longview Application François Lalonde, Director Application Support May 15, 2013 Disclaimer This presentation is provided to you solely for information purposes, is not

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Oracle Database Scalability in VMware ESX VMware ESX 3.5 Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises

More information

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2.

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2. IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2 Reference IBM Tivoli Composite Application Manager for Microsoft Applications:

More information

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Technical white paper HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Scale-up your Microsoft SQL Server environment to new heights Table of contents Executive summary... 2 Introduction...

More information

Implementing Probes for J2EE Cluster Monitoring

Implementing Probes for J2EE Cluster Monitoring Implementing s for J2EE Cluster Monitoring Emmanuel Cecchet, Hazem Elmeleegy, Oussama Layaida, Vivien Quéma LSR-IMAG Laboratory (CNRS, INPG, UJF) - INRIA INRIA Rhône-Alpes, 655 av. de l Europe, 38334 Saint-Ismier

More information

Data Structure Oriented Monitoring for OpenMP Programs

Data Structure Oriented Monitoring for OpenMP Programs A Data Structure Oriented Monitoring Environment for Fortran OpenMP Programs Edmond Kereku, Tianchao Li, Michael Gerndt, and Josef Weidendorfer Institut für Informatik, Technische Universität München,

More information

MAGENTO HOSTING Progressive Server Performance Improvements

MAGENTO HOSTING Progressive Server Performance Improvements MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents

More information

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) ( TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx

More information

L20: GPU Architecture and Models

L20: GPU Architecture and Models L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.

More information

Scheduling Task Parallelism" on Multi-Socket Multicore Systems"

Scheduling Task Parallelism on Multi-Socket Multicore Systems Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction

More information

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015 Operating Systems 05. Threads Paul Krzyzanowski Rutgers University Spring 2015 February 9, 2015 2014-2015 Paul Krzyzanowski 1 Thread of execution Single sequence of instructions Pointed to by the program

More information

11.1 inspectit. 11.1. inspectit

11.1 inspectit. 11.1. inspectit 11.1. inspectit Figure 11.1. Overview on the inspectit components [Siegl and Bouillet 2011] 11.1 inspectit The inspectit monitoring tool (website: http://www.inspectit.eu/) has been developed by NovaTec.

More information

x64 Servers: Do you want 64 or 32 bit apps with that server?

x64 Servers: Do you want 64 or 32 bit apps with that server? TMurgent Technologies x64 Servers: Do you want 64 or 32 bit apps with that server? White Paper by Tim Mangan TMurgent Technologies February, 2006 Introduction New servers based on what is generally called

More information

Chapter 14 Virtual Machines

Chapter 14 Virtual Machines Operating Systems: Internals and Design Principles Chapter 14 Virtual Machines Eighth Edition By William Stallings Virtual Machines (VM) Virtualization technology enables a single PC or server to simultaneously

More information

Full and Para Virtualization

Full and Para Virtualization Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels

More information

Tableau Server 7.0 scalability

Tableau Server 7.0 scalability Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different

More information

Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services. By Ajay Goyal Consultant Scalability Experts, Inc.

Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services. By Ajay Goyal Consultant Scalability Experts, Inc. Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services By Ajay Goyal Consultant Scalability Experts, Inc. June 2009 Recommendations presented in this document should be thoroughly

More information

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, Josef.Pelikan@mff.cuni.cz Abstract 1 Interconnect quality

More information

Building an energy dashboard. Energy measurement and visualization in current HPC systems

Building an energy dashboard. Energy measurement and visualization in current HPC systems Building an energy dashboard Energy measurement and visualization in current HPC systems Thomas Geenen 1/58 thomas.geenen@surfsara.nl SURFsara The Dutch national HPC center 2H 2014 > 1PFlop GPGPU accelerators

More information

Analysis of VDI Storage Performance During Bootstorm

Analysis of VDI Storage Performance During Bootstorm Analysis of VDI Storage Performance During Bootstorm Introduction Virtual desktops are gaining popularity as a more cost effective and more easily serviceable solution. The most resource-dependent process

More information

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures 11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the

More information

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011 Oracle Database Reliability, Performance and scalability on Intel platforms Mitch Shults, Intel Corporation October 2011 1 Intel Processor E7-8800/4800/2800 Product Families Up to 10 s and 20 Threads 30MB

More information

Cloud Computing Simulation Using CloudSim

Cloud Computing Simulation Using CloudSim Cloud Computing Simulation Using CloudSim Ranjan Kumar #1, G.Sahoo *2 # Assistant Professor, Computer Science & Engineering, Ranchi University, India Professor & Head, Information Technology, Birla Institute

More information

Scalability Factors of JMeter In Performance Testing Projects

Scalability Factors of JMeter In Performance Testing Projects Scalability Factors of JMeter In Performance Testing Projects Title Scalability Factors for JMeter In Performance Testing Projects Conference STEP-IN Conference Performance Testing 2008, PUNE Author(s)

More information

Oracle Enterprise Manager 12c New Capabilities for the DBA. Charlie Garry, Director, Product Management Oracle Server Technologies

Oracle Enterprise Manager 12c New Capabilities for the DBA. Charlie Garry, Director, Product Management Oracle Server Technologies Oracle Enterprise Manager 12c New Capabilities for the DBA Charlie Garry, Director, Product Management Oracle Server Technologies of DBAs admit doing nothing to address performance issues CHANGE AVOID

More information

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable

More information

Using Library Dependencies for Clustering

Using Library Dependencies for Clustering Using Library Dependencies for Clustering Jochen Quante Software Engineering Group, FB03 Informatik, Universität Bremen quante@informatik.uni-bremen.de Abstract: Software clustering is an established approach

More information

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING 2013/2014 1 st Semester Sample Exam January 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc.

More information

Xeon+FPGA Platform for the Data Center

Xeon+FPGA Platform for the Data Center Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system

More information

Informatica Ultra Messaging SMX Shared-Memory Transport

Informatica Ultra Messaging SMX Shared-Memory Transport White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance This document contains Confidential, Proprietary and Trade

More information

Very Large Enterprise Network Deployment, 25,000+ Users

Very Large Enterprise Network Deployment, 25,000+ Users Very Large Enterprise Network Deployment, 25,000+ Users Websense software can be deployed in different configurations, depending on the size and characteristics of the network, and the organization s filtering

More information

Lab Validation Report

Lab Validation Report Lab Validation Report Workload Performance Analysis: Microsoft Windows Server 2012 with Hyper-V and SQL Server 2012 Virtualizing Tier-1 Application Workloads with Confidence By Mike Leone, ESG Lab Engineer

More information

HPSA Agent Characterization

HPSA Agent Characterization HPSA Agent Characterization Product HP Server Automation (SA) Functional Area Managed Server Agent Release 9.0 Page 1 HPSA Agent Characterization Quick Links High-Level Agent Characterization Summary...

More information

Microsoft SQL Server OLTP Best Practice

Microsoft SQL Server OLTP Best Practice Microsoft SQL Server OLTP Best Practice The document Introduction to Transactional (OLTP) Load Testing for all Databases provides a general overview on the HammerDB OLTP workload and the document Microsoft

More information

Overlapping Data Transfer With Application Execution on Clusters

Overlapping Data Transfer With Application Execution on Clusters Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer

More information

Chapter 5 Linux Load Balancing Mechanisms

Chapter 5 Linux Load Balancing Mechanisms Chapter 5 Linux Load Balancing Mechanisms Load balancing mechanisms in multiprocessor systems have two compatible objectives. One is to prevent processors from being idle while others processors still

More information

The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server

The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server Research Report The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server Executive Summary Information technology (IT) executives should be

More information

A Locality Approach to Architecture-aware Task-scheduling in OpenMP

A Locality Approach to Architecture-aware Task-scheduling in OpenMP A Locality Approach to Architecture-aware Task-scheduling in OpenMP Ananya Muddukrishna ananya@kth.se Mats Brorsson matsbror@kth.se Vladimir Vlassov vladv@kth.se ABSTRACT Multicore and other parallel computer

More information

Resource Utilization of Middleware Components in Embedded Systems

Resource Utilization of Middleware Components in Embedded Systems Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system

More information

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads Gen9 Servers give more performance per dollar for your investment. Executive Summary Information Technology (IT) organizations face increasing

More information

vsphere Resource Management Guide

vsphere Resource Management Guide ESX 4.0 ESXi 4.0 vcenter Server 4.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent

More information

SQL Server 2008 Performance and Scale

SQL Server 2008 Performance and Scale SQL Server 2008 Performance and Scale White Paper Published: February 2008 Updated: July 2008 Summary: Microsoft SQL Server 2008 incorporates the tools and technologies that are necessary to implement

More information

vsphere Resource Management

vsphere Resource Management ESXi 5.1 vcenter Server 5.1 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions

More information

Replication on Virtual Machines

Replication on Virtual Machines Replication on Virtual Machines Siggi Cherem CS 717 November 23rd, 2004 Outline 1 Introduction The Java Virtual Machine 2 Napper, Alvisi, Vin - DSN 2003 Introduction JVM as state machine Addressing non-determinism

More information