Combining Instrumentation and Sampling for Trace-based Application Performance Analysis

1 Center for Information Services and High Performance Computing (ZIH)
Combining Instrumentation and Sampling for Trace-based Application Performance Analysis
8th International Parallel Tools Workshop, Stuttgart, Germany, October 2, 2014
Thomas Ilsche, Joseph Schuchart, Robert Schöne, Daniel Hackenberg

2 Introduction
Looking at the landscape of performance analysis tools
- Identify established techniques
  - Provide a structured overview
  - Highlight strengths and weaknesses
- Identify novel combinations
  - Combine strengths
  - Mitigate weaknesses
- Look beyond the traditional fields of tools
Thomas Ilsche 4

3 Classification of performance analysis techniques

Performance Analysis Layer    Performance Analysis Technique
Data Presentation             Profiles | Timelines
Data Recording                Summarization | Logging
Data Acquisition              Sampling | Event-based Instrumentation

Based on [10] Juckeland, G.: Trace-based Performance Analysis for Hardware Accelerators. Ph.D. thesis, TU Dresden (2012)

5 Data Acquisition: Event-based Instrumentation
[Figure: execution timeline (time axis) of main, foo, and bar, together with the measurement environment]
Event-based instrumentation; also: direct instrumentation, event trigger, probe-based measurement, or simply instrumentation.
Modification of the application execution in order to record and present certain intrinsic events of the application execution, e.g., function entry and exit events.
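
The definition above can be illustrated with a minimal sketch. It assumes a Python decorator as the instrumentation mechanism and a hypothetical in-memory list `events` as the measurement environment; neither comes from the slides, and real tools hook into the compiler, linker, or binary instead.

```python
import functools
import time

# Hypothetical measurement environment: (timestamp_ns, kind, function_name).
events = []


def instrument(func):
    """Wrap func so that every call records an Enter and a Leave event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append((time.perf_counter_ns(), "Enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so Enter/Leave stay balanced.
            events.append((time.perf_counter_ns(), "Leave", func.__name__))
    return wrapper


@instrument
def bar():
    return sum(range(1000))


@instrument
def foo():
    return bar()


@instrument
def main():
    foo()
    bar()


main()
for timestamp, kind, name in events:
    print(timestamp, kind, name)
```

Every call pays the cost of two extra records, which is why the overhead of this approach scales with the function call rate.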

6 Data Acquisition: Event-based Instrumentation
- Overhead & perturbation depend on the function call rate
  - Hard to predict in complex applications
  - Can be influenced by filtering function calls
  - Preferably statically, not during runtime
- Complete information
  - Accurate function call counts
  - Message properties (semantics of function call arguments)
  - Analysis tools may rely on completeness

7 Data Acquisition: Event-based Instrumentation
Various instrumentation methods available
- Compiler instrumentation *
- Library wrapping **
- Source code transformation *
- Manual instrumentation *
- Binary instrumentation
* Requires recompilation & separate performance measurement binary
** Requires relinking for statically linked binaries

8 Data Acquisition: Sampling
[Figure: execution timeline of main, foo, and bar with samples at 200us, 400us, 600us, and 800us, together with the measurement environment]
Sampling; also: statistical sampling or (ambiguously) profiling.
Periodic interruption of a running program and inspection of its state.
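
To make the sampling idea concrete, the following sketch simulates it on a known timeline instead of interrupting a live program. The region boundaries are taken from the enter/leave timestamps of the logging example on the later "Summarization vs Logging" slides; the names `REGIONS`, `call_stack_at`, and `sample` are illustrative assumptions.

```python
# (name, enter_us, leave_us) of each function invocation in the example run.
REGIONS = [
    ("main", 0, 1000),
    ("foo", 50, 650),
    ("bar", 100, 300),
    ("bar", 700, 900),
]


def call_stack_at(regions, t_us):
    """Return the call stack (outermost first) that is active at time t_us."""
    active = [(enter, name) for name, enter, leave in regions if enter <= t_us < leave]
    # Regions are properly nested, so sorting by entry time yields stack order.
    return [name for _, name in sorted(active)]


def sample(regions, interval_us, end_us):
    """Inspect the program state once every interval_us until end_us."""
    return [
        (t_us, call_stack_at(regions, t_us))
        for t_us in range(interval_us, end_us, interval_us)
    ]


samples = sample(REGIONS, interval_us=200, end_us=1000)
for t_us, stack in samples:
    print(f"{t_us}us:", " ".join(stack))
```

The number of samples depends only on the interval and the run time, not on how often functions are called, which is what makes the overhead predictable.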

9 Data Acquisition: Sampling
- Overhead & perturbation depend on the sampling rate
  - Can be predicted
  - Can be controlled
  - Stack unwinding introduces uncertainty
- Easy to use (for end users)
  - No recompilation or relinking necessary
  - No filtering necessary

10 Data Acquisition: Sampling
- Incomplete information
  - No accurate function call counts
  - No specific message properties or other semantics of function arguments
- Measurement has statistical value
  - More reliable for longer-running experiments
  - Trade-off between accuracy and perturbation via sampling rate

16 Summarization vs Logging
Defines how the recording during runtime is performed.

Summarization (loses information)
  Event-based Instrumentation:
    count[main]++
    count[foo]++
    count[bar]++
    time[bar]+=200
    time[foo]+=600
    count[bar]++
    time[bar]+=200
    time[main]+=1000
  Sampling:
    time_ex[bar] += 200
    time_ex[foo] += 200
    time_ex[foo] += 200
    time_ex[bar] += 200

Logging (requires memory at runtime)
  Event-based Instrumentation:
    0000us: Enter main
    0050us: Enter foo
    0100us: Enter bar
    0300us: Leave bar
    0650us: Leave foo
    0700us: Enter bar
    0900us: Leave bar
    1000us: Leave main
  Sampling:
    200us: main foo bar
    400us: main foo
    600us: main foo
    800us: main bar
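
The relation between the two recording modes can be sketched by deriving the summaries from the logs of the example. The code replays the logged events and samples and applies the same counter updates; the function names are illustrative assumptions. A measurement system doing summarization would update the counters directly at runtime and never store the log.

```python
from collections import defaultdict

# Logged enter/leave events of the example: (timestamp_us, kind, function).
EVENT_LOG = [
    (0, "Enter", "main"), (50, "Enter", "foo"), (100, "Enter", "bar"),
    (300, "Leave", "bar"), (650, "Leave", "foo"), (700, "Enter", "bar"),
    (900, "Leave", "bar"), (1000, "Leave", "main"),
]

# Logged samples of the example: (timestamp_us, call stack, outermost first).
SAMPLE_LOG = [
    (200, ["main", "foo", "bar"]), (400, ["main", "foo"]),
    (600, ["main", "foo"]), (800, ["main", "bar"]),
]


def summarize_events(log):
    """Reduce an event log to call counts and inclusive times per function."""
    count = defaultdict(int)
    time_in = defaultdict(int)
    open_regions = []  # stack of (function, enter timestamp)
    for t_us, kind, name in log:
        if kind == "Enter":
            count[name] += 1
            open_regions.append((name, t_us))
        else:
            entered_name, entered_at = open_regions.pop()
            if entered_name != name:
                raise ValueError(f"unbalanced Leave for {name} at {t_us}us")
            time_in[name] += t_us - entered_at
    return dict(count), dict(time_in)


def summarize_samples(samples, interval_us):
    """Attribute one sampling interval to the innermost function of each sample."""
    time_ex = defaultdict(int)
    for _, stack in samples:
        time_ex[stack[-1]] += interval_us
    return dict(time_ex)


count, time_in = summarize_events(EVENT_LOG)
time_ex = summarize_samples(SAMPLE_LOG, interval_us=200)
print(count, time_in, time_ex)
```

The summaries can always be recomputed from the logs, but not the other way around. The order of calls and the individual durations of the two `bar` invocations are gone after summarization.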

18 Data Presentation
[Figure: example profile (gprof), "Each sample counts as 0.01 seconds."; columns: % time, cumulative seconds, self seconds, calls, self ms/call, total ms/call, name; rows: open, offtime, memccpy, write]
- Can be generated by summarization, but also from logging
[Figure: example timeline showing call-path and event annotations (Vampir)]
- Needs logging during recording

19 Classification of performance analysis techniques

Performance Analysis Layer    Performance Analysis Technique
Data Presentation             Profiles | Timelines
Data Recording                Summarization | Logging
Data Acquisition              Sampling | Event-based Instrumentation

Profiling: Summarization and Profiles
Tracing: Logging and Timelines

Based on [10] Juckeland, G.: Trace-based Performance Analysis for Hardware Accelerators. Ph.D. thesis, TU Dresden (2012)

20 Tools
[Figure: example tools arranged by profiling vs. tracing and by event-based instrumentation vs. sampling. Tools shown: Scalasca, TAU, VampirTrace, gprof, Score-P, perf, HPCToolkit, Extrae, Allinea MAP. Caption: Example Concepts]

21 Combining Performance Analysis Techniques (1)
- C++ graph code INDDGO
- OpenMP, 4 threads
- Uninstrumented: < 6 seconds
- Instrumented (profiling): 72 seconds
- 1100% overhead!
- A trace file would be ~3.8 GB with even more overhead
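
The overhead figure follows directly from the two run times. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def relative_overhead_percent(baseline_s, instrumented_s):
    """Extra run time caused by the measurement, relative to the baseline."""
    return (instrumented_s - baseline_s) / baseline_s * 100.0


# Run times from the slide; 6 s is an upper bound for the uninstrumented run.
overhead = relative_overhead_percent(baseline_s=6.0, instrumented_s=72.0)
print(f"{overhead:.0f}% overhead")
```

Because the uninstrumented run takes less than 6 seconds, 1100% is a lower bound on the actual overhead.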

22 Combining Performance Analysis Techniques (1)
Estimated aggregate size of event trace: 3851MB
Estimated requirements for largest trace buffer (max_buf): 3851MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY): 3860MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=3860MB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)

type  max_buf[b]     visits      time[s]  region
ALL   4,038,048,     ,849,                ALL
USR   4,038,047,     ,849,                USR
OMP                                       OMP
COM                                       COM

72 functions with > 1 million visits

USR   365,389,440    14,053,              Graph::lcgrand(int)
USR   322,737,636    12,412,              std::_list_iterator<int>::operator*() const
USR   208,735,202    8,028,               std::_list_iterator<int>::operator++()
USR   201,389,266    7,745,               std::_list_iterator<int>::_list_iterator
USR   200,350,128    12,521,              std::_list_iterator<int>::operator!=
USR   1,040,000      40,                  Graph::Node* std:: addressof

23 Combining Performance Analysis Techniques (1)
- MPI instrumentation
  - For messages, complete information is very important during analysis
  - MPI functions generally imply a certain minimum load
  - Lower relative overhead expected compared to very short computation functions
- Call-path sampling
  - For function execution, statistical information may be sufficient
  - Controlling the overhead of compiler instrumentation via filtering is not straightforward
- Tracing (Logging, Timelines)
- Prototype in VampirTrace
  - MPI: Traditional library interposition with PMPI
  - Sampling: Performance-counter-based interrupt (e.g. every 1 million cycles)

24 Combining Performance Analysis Techniques (1)
[Figure: example with NPB-BT, 1 sample every 1 million cycles (~385 us)]
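
The quoted sampling period can be cross-checked with a short calculation. The clock frequency below is inferred from the two numbers on the slide and is not stated in the source. The resulting sampling rate agrees with the 2.6 ksa/s quoted on the following slide.

```python
CYCLES_PER_SAMPLE = 1_000_000   # from the slide: one sample every 1 million cycles
SAMPLE_PERIOD_S = 385e-6        # from the slide: approximately 385 us per sample

# Inferred, not given in the source: clock rate implied by the two values above.
implied_clock_hz = CYCLES_PER_SAMPLE / SAMPLE_PERIOD_S
samples_per_second = 1.0 / SAMPLE_PERIOD_S

print(f"implied clock: {implied_clock_hz / 1e9:.1f} GHz")
print(f"sampling rate: {samples_per_second / 1e3:.1f} ksa/s")
```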

25 Combining Performance Analysis Techniques (1)
MPI instrumentation and call-path sampling
Example with NPB-BT
- Runtime filter: matmul_sub, matvec_sub, binvrhs, binvcrhs, lhsinit, exact_solution
- The filter helps with overhead, but provides no more information about those functions
** Sampling rate of 2.6 ksa/s.

26 Combining Performance Analysis Techniques (2)
MPI and compiler instrumentation and performance counter sampling
- Tracing (Logging, Timelines)
- Implemented as VampirTrace/Score-P metrics plugin
  - MPI: Library interposition with PMPI
  - Functions: Compiler instrumentation
  - Hardware counters: Monitoring thread wakes up at regular intervals and reads performance counters from the application thread
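
The monitoring-thread idea can be sketched as follows. This is not the VampirTrace/Score-P plugin. It is a minimal Python analogue in which a plain integer `work_done` stands in for a hardware performance counter and the class name `CounterMonitor` is an assumption.

```python
import threading
import time


class CounterMonitor:
    """Background thread that periodically reads a counter owned by the application."""

    def __init__(self, read_counter, interval_s):
        self._read_counter = read_counter
        self._interval_s = interval_s
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples = []  # (timestamp_s, counter_value)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() was called.
        while not self._stop_event.wait(self._interval_s):
            self.samples.append((time.perf_counter(), self._read_counter()))

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()
        # One final reading so that very short runs still yield a sample.
        self.samples.append((time.perf_counter(), self._read_counter()))


work_done = 0  # stand-in for a hardware performance counter


def application():
    global work_done
    for _ in range(200_000):
        work_done += 1


monitor = CounterMonitor(lambda: work_done, interval_s=0.001)
monitor.start()
application()
monitor.stop()
print(len(monitor.samples), "samples, final value", monitor.samples[-1][1])
```

The application code itself is untouched. The number of counter readings is determined by the wake-up interval rather than by the number of function calls.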

27 Combining Performance Analysis Techniques (2)
[Figure: NPB FT, class B, 16 processes, 1 ms sampling interval]

32 Combining Performance Analysis Techniques (2)
[Figure: normalized trace sizes of NPB class B, sampled (1 ksa/s). Baseline: trace without counters. Filtered functions: matmul_sub, matvec_sub, binvcrhs, exact_solution.]

33 Combining Performance Analysis Techniques
- Event-based instrumentation for:
  - MPI, SHMEM, CUDA
  - Manual instrumentation of longer program phases
  - Function instrumentation together with sophisticated filtering
- Sampling for:
  - Call-stack of programs where filtering is not feasible (often C++)
  - Hardware counters
  - External metrics (e.g. power consumption)

34 Conclusion
- Combine strengths and mitigate weaknesses by selecting the right technique for different aspects of performance analysis
  - Sampling and instrumentation complement each other
  - Many tools already cross the borders of a single technique
- Use a clear terminology for techniques
  - Separate the description for data acquisition, data recording and data presentation
  - Event-based instrumentation vs sampling
  - Tracing vs profiling

35 Outlook
- Sampling in Score-P 1.4, planned for 2014, will be experimental
- New trace records for samples
- Analyzing & visualizing merged call-stack samples and region instrumentation
  - A sample is valid for a point in time
  - Instrumentation covers time ranges
  - Overlap?
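
The slides leave open how samples and instrumented regions should be merged. One conceivable approach, sketched here purely as an assumption and not as the Score-P design, is to attach each point-in-time sample to the innermost instrumented region whose time range contains it. The example reuses the earlier timeline and assumes that only `main` and `foo` are instrumented while `bar` is visible through sampling alone.

```python
# Instrumented regions cover time ranges: (name, enter_us, leave_us).
INSTRUMENTED = [("main", 0, 1000), ("foo", 50, 650)]

# Samples are valid for a point in time: (timestamp_us, call stack).
SAMPLES = [
    (200, ["main", "foo", "bar"]),
    (400, ["main", "foo"]),
    (600, ["main", "foo"]),
    (800, ["main", "bar"]),
]


def innermost_region(regions, t_us):
    """Name of the most recently entered region containing t_us, or None."""
    enclosing = [(enter, name) for name, enter, leave in regions if enter <= t_us < leave]
    return max(enclosing)[1] if enclosing else None


def merge(regions, samples):
    """Pair every sample with its enclosing region and its sampled leaf function."""
    return [
        (t_us, innermost_region(regions, t_us), stack[-1])
        for t_us, stack in samples
    ]


merged = merge(INSTRUMENTED, SAMPLES)
for t_us, region, leaf in merged:
    print(f"{t_us}us: region={region} sampled={leaf}")
```

In this sketch the sample at 800us reports `bar` inside the instrumented region `main`, so sampling recovers information about a function that instrumentation did not record.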


More information

Performance Tools for System Monitoring

Performance Tools for System Monitoring Center for Information Services and High Performance Computing (ZIH) 01069 Dresden Performance Tools for System Monitoring 1st CHANGES Workshop, Jülich Zellescher Weg 12 Tel. +49 351-463 35450 September

More information

PAPI - PERFORMANCE API. ANDRÉ PEREIRA ampereira@di.uminho.pt

PAPI - PERFORMANCE API. ANDRÉ PEREIRA ampereira@di.uminho.pt 1 PAPI - PERFORMANCE API ANDRÉ PEREIRA ampereira@di.uminho.pt 2 Motivation Application and functions execution time is easy to measure time gprof valgrind (callgrind) It is enough to identify bottlenecks,

More information

OpenMP Tools API (OMPT) and HPCToolkit

OpenMP Tools API (OMPT) and HPCToolkit OpenMP Tools API (OMPT) and HPCToolkit John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu SC13 OpenMP Birds of a Feather Session, November 19, 2013 OpenMP Tools Subcommittee

More information

MPI / ClusterTools Update and Plans

MPI / ClusterTools Update and Plans HPC Technical Training Seminar July 7, 2008 October 26, 2007 2 nd HLRS Parallel Tools Workshop Sun HPC ClusterTools 7+: A Binary Distribution of Open MPI MPI / ClusterTools Update and Plans Len Wisniewski

More information

Analysis report examination with CUBE

Analysis report examination with CUBE Analysis report examination with CUBE Brian Wylie Jülich Supercomputing Centre CUBE Parallel program analysis report exploration tools Libraries for XML report reading & writing Algebra utilities for report

More information

Plug and Play Solution for AUTOSAR Software Components

Plug and Play Solution for AUTOSAR Software Components Plug and Play Solution for AUTOSAR Software Components The interfaces defined in the AUTOSAR standard enable an easier assembly of the ECU application out of components from different suppliers. However,

More information

Using the Intel Inspector XE

Using the Intel Inspector XE Using the Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) Race Condition Data Race: the typical OpenMP programming error, when: two or more threads access the same memory

More information

Chapter 13 Configuration Management

Chapter 13 Configuration Management Chapter 13 Configuration Management Using UML, Patterns, and Java Object-Oriented Software Engineering Outline of the Lecture Purpose of Software Configuration Management (SCM)! Motivation: Why software

More information

Part I Courses Syllabus

Part I Courses Syllabus Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment

More information

AjaxScope: Remotely Monitoring Client-side Web-App Behavior

AjaxScope: Remotely Monitoring Client-side Web-App Behavior AjaxScope: Remotely Monitoring Client-side Web-App Behavior Emre Kıcıman emrek@microsoft.com Ben Livshits livshits@microsoft.com Internet Services Research Center Microsoft Research Runtime Analysis &

More information

Optimization tools. 1) Improving Overall I/O

Optimization tools. 1) Improving Overall I/O Optimization tools After your code is compiled, debugged, and capable of running to completion or planned termination, you can begin looking for ways in which to improve execution speed. In general, the

More information

Intrusion Detection via Static Analysis

Intrusion Detection via Static Analysis Intrusion Detection via Static Analysis IEEE Symposium on Security & Privacy 01 David Wagner Drew Dean Presented by Yongjian Hu Outline Introduction Motivation Models Trivial model Callgraph model Abstract

More information

Kashif Iqbal - PhD Kashif.iqbal@ichec.ie

Kashif Iqbal - PhD Kashif.iqbal@ichec.ie HPC/HTC vs. Cloud Benchmarking An empirical evalua.on of the performance and cost implica.ons Kashif Iqbal - PhD Kashif.iqbal@ichec.ie ICHEC, NUI Galway, Ireland With acknowledgment to Michele MicheloDo

More information

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General

More information

Run-time test configurations for load testing

Run-time test configurations for load testing Run-time test configurations for load testing Gábor Ziegler, Ericsson Hungary Ltd. Contents Introduction What is TITANSim Motivation for TITANSim Functional description of the parts of TITANSim CLL, Application

More information

Optimizing Application Performance with CUDA Profiling Tools

Optimizing Application Performance with CUDA Profiling Tools Optimizing Application Performance with CUDA Profiling Tools Why Profile? Application Code GPU Compute-Intensive Functions Rest of Sequential CPU Code CPU 100 s of cores 10,000 s of threads Great memory

More information

Performance Analysis of Computer Systems

Performance Analysis of Computer Systems Center for Information Services and High Performance Computing (ZIH) Performance Analysis of Computer Systems Monitoring Techniques Holger Brunst (holger.brunst@tu-dresden.de) Matthias S. Mueller (matthias.mueller@tu-dresden.de)

More information

Load Imbalance Analysis

Load Imbalance Analysis With CrayPat Load Imbalance Analysis Imbalance time is a metric based on execution time and is dependent on the type of activity: User functions Imbalance time = Maximum time Average time Synchronization

More information

Sequential Performance Analysis with Callgrind and KCachegrind

Sequential Performance Analysis with Callgrind and KCachegrind Sequential Performance Analysis with Callgrind and KCachegrind 4 th Parallel Tools Workshop, HLRS, Stuttgart, September 7/8, 2010 Josef Weidendorfer Lehrstuhl für Rechnertechnik und Rechnerorganisation

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information

A Performance Monitoring Interface for OpenMP

A Performance Monitoring Interface for OpenMP A Performance Monitoring Interface for OpenMP Bernd Mohr, Allen D. Malony, Hans-Christian Hoppe, Frank Schlimbach, Grant Haab, Jay Hoeflinger, and Sanjiv Shah Research Centre Jülich, ZAM Jülich, Germany

More information

On the Importance of Thread Placement on Multicore Architectures

On the Importance of Thread Placement on Multicore Architectures On the Importance of Thread Placement on Multicore Architectures HPCLatAm 2011 Keynote Cordoba, Argentina August 31, 2011 Tobias Klug Motivation: Many possibilities can lead to non-deterministic runtimes...

More information

A Multi-layered Domain-specific Language for Stencil Computations

A Multi-layered Domain-specific Language for Stencil Computations A Multi-layered Domain-specific Language for Stencil Computations Christian Schmitt, Frank Hannig, Jürgen Teich Hardware/Software Co-Design, University of Erlangen-Nuremberg Workshop ExaStencils 2014,

More information

TCP Adaptation for MPI on Long-and-Fat Networks

TCP Adaptation for MPI on Long-and-Fat Networks TCP Adaptation for MPI on Long-and-Fat Networks Motohiko Matsuda, Tomohiro Kudoh Yuetsu Kodama, Ryousei Takano Grid Technology Research Center Yutaka Ishikawa The University of Tokyo Outline Background

More information

CSCI E 98: Managed Environments for the Execution of Programs

CSCI E 98: Managed Environments for the Execution of Programs CSCI E 98: Managed Environments for the Execution of Programs Draft Syllabus Instructor Phil McGachey, PhD Class Time: Mondays beginning Sept. 8, 5:30-7:30 pm Location: 1 Story Street, Room 304. Office

More information

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005 Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005 Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005... 1

More information

Search Strategies for Automatic Performance Analysis Tools

Search Strategies for Automatic Performance Analysis Tools Search Strategies for Automatic Performance Analysis Tools Michael Gerndt and Edmond Kereku Technische Universität München, Fakultät für Informatik I10, Boltzmannstr.3, 85748 Garching, Germany gerndt@in.tum.de

More information

Compiler-Assisted Binary Parsing

Compiler-Assisted Binary Parsing Compiler-Assisted Binary Parsing Tugrul Ince tugrul@cs.umd.edu PD Week 2012 26 27 March 2012 Parsing Binary Files Binary analysis is common for o Performance modeling o Computer security o Maintenance

More information

V 6.1 Core Training Training Plan

V 6.1 Core Training Training Plan V 6.1 Core Training Training Plan 2014 Version 1.0 Document Revision 1.0 2014 OpenSpan Incorporated. All rights reserved. OpenSpan and the Open Span logo are trademarks of OpenSpan, Incorporated. Other

More information

Object Instance Profiling

Object Instance Profiling Object Instance Profiling Lubomír Bulej 1,2, Lukáš Marek 1, Petr Tůma 1 Technical report No. 2009/7, November 2009 Version 1.0, November 2009 1 Distributed Systems Research Group, Department of Software

More information

Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell

Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell Ryan Yates 5-5-2014 1/21 Introduction Outline Thesis Why Haskell? Preliminary work Hybrid TM for GHC Obstacles to Performance

More information

Debugging with TotalView

Debugging with TotalView Tim Cramer 17.03.2015 IT Center der RWTH Aachen University Why to use a Debugger? If your program goes haywire, you may... ( wand (... buy a magic... read the source code again and again and...... enrich

More information

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015 Development With ARM DS-5 Mervyn Liu FAE Aug. 2015 1 Support for all Stages of Product Development Single IDE, compiler, debug, trace and performance analysis for all stages in the product development

More information

Keys to node-level performance analysis and threading in HPC applications

Keys to node-level performance analysis and threading in HPC applications Keys to node-level performance analysis and threading in HPC applications Thomas GUILLET (Intel; Exascale Computing Research) IFERC seminar, 18 March 2015 Legal Disclaimer & Optimization Notice INFORMATION

More information

Chapter 13 Configuration Management

Chapter 13 Configuration Management Object-Oriented Software Engineering Using UML, Patterns, and Java Chapter 13 Configuration Management Outline of the Lecture Purpose of Software Configuration Management (SCM)! Motivation: Why software

More information

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability

More information

Analytics for Performance Optimization of BPMN2.0 Business Processes

Analytics for Performance Optimization of BPMN2.0 Business Processes Analytics for Performance Optimization of BPMN2.0 Business Processes Robert M. Shapiro, Global 360, USA Hartmann Genrich, GMD (retired), Germany INTRODUCTION We describe a new approach to process improvement

More information

Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures Gabriele Jost *, Haoqiang Jin NAS Division, NASA Ames Research Center, Moffett Field, CA 94035-1000 USA {gjost,hjin}@nas.nasa.gov

More information

Enterprise Manager Performance Tips

Enterprise Manager Performance Tips Enterprise Manager Performance Tips + The tips below are related to common situations customers experience when their Enterprise Manager(s) are not performing consistent with performance goals. If you

More information