Combining Instrumentation and Sampling for Trace-based Application Performance Analysis
1 Center for Information Services and High Performance Computing (ZIH)
Combining Instrumentation and Sampling for Trace-based Application Performance Analysis
8th International Parallel Tools Workshop, Stuttgart, Germany, October 2, 2014
Thomas Ilsche, Joseph Schuchart, Robert Schöne, Daniel Hackenberg
2 Introduction
Looking at the landscape of performance analysis tools
- Identify established techniques
  - Provide a structured overview
  - Highlight strengths and weaknesses
- Identify novel combinations
  - Combine strengths
  - Mitigate weaknesses
- Look beyond the traditional fields of tools
3 Classification of performance analysis techniques
Performance analysis layer: performance analysis techniques
- Data presentation: profiles, timelines
- Data recording: summarization, logging
- Data acquisition: sampling, event-based instrumentation
Based on [10] Juckeland, G.: Trace-based Performance Analysis for Hardware Accelerators. Ph.D. thesis, TU Dresden (2012)
5 Data Acquisition: Event-based Instrumentation
[Figure: timeline with the call stack (main, foo, bar) and the measurement environment]
Event-based instrumentation; also: direct instrumentation, event trigger, probe-based measurement, or simply instrumentation.
Modification of the application execution in order to record and present certain intrinsic events of the application execution, e.g., function entry and exit events.
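The definition above can be illustrated with a minimal sketch in Python. It is not how any of the tools in this talk instrument code; the `instrument` decorator and the in-memory `events` list are hypothetical stand-ins for a measurement environment that records function entry and exit events.

```python
import functools
import time

# Hypothetical in-memory event log; a real measurement environment
# would write to a trace buffer instead.
events = []

def instrument(func):
    """Wrap func so that its entry and exit are recorded as events."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append((time.perf_counter_ns(), "Enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # Record the exit even if func raises.
            events.append((time.perf_counter_ns(), "Leave", func.__name__))
    return wrapper

@instrument
def bar():
    return sum(range(1000))

@instrument
def foo():
    return bar()

@instrument
def main():
    foo()
    bar()

main()
for timestamp, kind, name in events:
    print(f"{timestamp}: {kind} {name}")
```

Every call produces two events, so the cost of the measurement grows with the function call rate of the application.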
6 Data Acquisition: Event-based Instrumentation
Overhead & perturbation depend on function call rate
- Hard to predict in complex applications
- Can be influenced by filtering function calls
  - Preferably statically, not during runtime
Complete information
- Accurate function call counts
- Message properties (semantics of function call arguments)
- Analysis tools may rely on completeness
7 Data Acquisition: Event-based Instrumentation
Various instrumentation methods available
- Compiler instrumentation *
- Library wrapping **
- Source code transformation *
- Manual instrumentation *
- Binary instrumentation
* Requires recompilation & separate performance measurement binary
** Requires relinking for statically linked binaries
8 Data Acquisition: Sampling
[Figure: timeline with the call stack (main, foo, bar), sample points at 200us, 400us, 600us and 800us, and the measurement environment]
Sampling; also: statistical sampling or (ambiguously) profiling.
Periodic interruption of a running program and inspection of its state.
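As a rough sketch of this definition, the following Python snippet replays the example timeline used in this talk (main, foo and bar, with the enter and leave times from the Summarization vs Logging slides) and inspects the call stack every 200 us. The helper names `call_stack_at` and `sample` are made up for illustration; a real sampler interrupts a running program rather than replaying known intervals.

```python
# Function intervals (enter_us, leave_us, name) from the example timeline.
INTERVALS = [
    (0, 1000, "main"),
    (50, 650, "foo"),
    (100, 300, "bar"),
    (700, 900, "bar"),
]

def call_stack_at(t_us):
    """Return the functions active at time t_us, outermost first."""
    active = [(enter, name) for enter, leave, name in INTERVALS
              if enter <= t_us < leave]
    # Functions entered earlier are further out on the stack.
    return [name for _, name in sorted(active)]

def sample(period_us, end_us):
    """Inspect the call stack once every period_us microseconds."""
    return [(t, call_stack_at(t)) for t in range(period_us, end_us, period_us)]

samples = sample(200, 1000)
for t, stack in samples:
    print(f"{t}us: {' '.join(stack)}")
```

The number of inspections depends only on the sampling period and the runtime, not on how often functions are called.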
9 Data Acquisition: Sampling
Overhead & perturbation depend on sampling rate
- Can be predicted
- Can be controlled
- Stack unwinding introduces uncertainty
Easy to use (for end users)
- No recompilation or relinking necessary
- No filtering necessary
10 Data Acquisition: Sampling
Incomplete information
- No accurate function call counts
- No specific message properties or other semantics of function arguments
Measurement has statistical value
- More reliable for longer running experiments
- Trade-off between accuracy and perturbation via sampling rate
12 Summarization vs Logging
Defines how the recording during runtime is performed.

Event-based instrumentation, logging:
0000us: Enter main
0050us: Enter foo
0100us: Enter bar
0300us: Leave bar
0650us: Leave foo
0700us: Enter bar
0900us: Leave bar
1000us: Leave main

Event-based instrumentation, summarization:
count[main]++
count[foo]++
count[bar]++
time[bar]+=200
time[foo]+=600
count[bar]++
time[bar]+=200
time[main]+=1000

Sampling, logging:
200us: main foo bar
400us: main foo
600us: main foo
800us: main bar

Sampling, summarization:
time_ex[bar] += 200
time_ex[foo] += 200
time_ex[foo] += 200
time_ex[bar] += 200

Summarization loses information. Logging requires memory at runtime.
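The relation between the two recording modes can be sketched in Python: summaries can always be computed from a log, but not the other way round. The snippet below uses the logged events and samples from the slide. The function names are illustrative only, and charging one full sampling period to the innermost function of each sample is the simple estimate the slide uses for time_ex.

```python
from collections import Counter

# Logged events (time_us, kind, function) from the slide.
EVENT_LOG = [
    (0, "Enter", "main"), (50, "Enter", "foo"), (100, "Enter", "bar"),
    (300, "Leave", "bar"), (650, "Leave", "foo"), (700, "Enter", "bar"),
    (900, "Leave", "bar"), (1000, "Leave", "main"),
]
# Logged samples (time_us, call stack) from the slide.
SAMPLE_LOG = [
    (200, ["main", "foo", "bar"]), (400, ["main", "foo"]),
    (600, ["main", "foo"]), (800, ["main", "bar"]),
]

def summarize_events(log):
    """Call counts and inclusive time per function from Enter/Leave events."""
    count, time_incl, open_calls = Counter(), Counter(), []
    for t, kind, name in log:
        if kind == "Enter":
            count[name] += 1
            open_calls.append((name, t))
        else:
            entered_name, entered_at = open_calls.pop()
            assert entered_name == name, "unbalanced Enter/Leave"
            time_incl[name] += t - entered_at
    return count, time_incl

def summarize_samples(log, period_us):
    """Estimated exclusive time: each sample charges one period to the leaf."""
    time_excl = Counter()
    for _, stack in log:
        time_excl[stack[-1]] += period_us
    return time_excl

count, time_incl = summarize_events(EVENT_LOG)
time_excl = summarize_samples(SAMPLE_LOG, 200)
print(dict(count), dict(time_incl), dict(time_excl))
```

In this example bar is entered twice, so its call count is 2. No sample ever hits main as the innermost function, so the sampled summary attributes no exclusive time to it, although the event log shows main running on its own for 200 us in total (0 to 50, 650 to 700 and 900 to 1000). This is the statistical nature of sampling on a very short run.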
18 Data Presentation
Example profile (gprof): flat profile with the columns % time, cumulative seconds, self seconds, calls, self ms/call, total ms/call and name, listing open, offtime, memccpy and write. Each sample counts as 0.01 seconds.
- Can be generated by summarization, but also from logging
Example timeline showing call-path and event annotations (Vampir)
- Needs logging during recording
19 Classification of performance analysis techniques
Performance analysis layer: profiling / tracing
- Data presentation: profiles / timelines
- Data recording: summarization / logging
- Data acquisition: sampling, event-based instrumentation
Based on [10] Juckeland, G.: Trace-based Performance Analysis for Hardware Accelerators. Ph.D. thesis, TU Dresden (2012)
20 Tools
[Figure: tools arranged by recording (profiling vs. tracing) and by acquisition (event-based instrumentation vs. sampling). Tools shown: Scalasca, TAU, VampirTrace, gprof, Score-P, perf, HPCToolkit, Extrae, Allinea MAP. Additional label: Example Concepts]
21 Combining Performance Analysis Techniques (1)
C++ graph code INDDGO, OpenMP, 4 threads
- Uninstrumented: < 6 seconds
- Instrumented (profiling): 72 seconds
- 1100% overhead!
- A trace file would be ~3.8 GB, with even more overhead
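The overhead figure follows from the two runtimes. A small Python check, assuming the uninstrumented runtime is taken as 6 seconds (the slide gives this as an upper bound, so the true overhead is at least this large); `overhead_percent` is a helper name chosen here:

```python
def overhead_percent(baseline_s, measured_s):
    """Relative runtime overhead of a measured run over the baseline, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100.0

# 6 s uninstrumented (upper bound from the slide), 72 s instrumented.
print(overhead_percent(6.0, 72.0))  # 1100.0
```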
22 Combining Performance Analysis Techniques (1)
Estimated aggregate size of event trace: 3851MB
Estimated requirements for largest trace buffer (max_buf): 3851MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY): 3860MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=3860MB to avoid intermediate flushes or reduce requirements using USR regions filters.)

type max_buf[b] visits time[s] region
ALL 4,038,048, ,849, ALL
USR 4,038,047, ,849, USR
OMP OMP
COM COM

72 functions with > 1 million visits
USR 365,389,440 14,053, Graph::lcgrand(int)
USR 322,737,636 12,412, std::_list_iterator<int>::operator*() const
USR 208,735,202 8,028, std::_list_iterator<int>::operator++()
USR 201,389,266 7,745, std::_list_iterator<int>::_list_iterator
USR 200,350,128 12,521, std::_list_iterator<int>::operator!=
USR 1,040,000 40, Graph::Node* std:: addressof
23 Combining Performance Analysis Techniques (1)
MPI instrumentation
- For messages, complete information is very important during analysis
- MPI functions generally imply a certain minimum load
- Lower relative overhead expected compared to very short computation functions
Call-path sampling
- For function execution, statistical information may be sufficient
- Controlling the overhead of compiler instrumentation via filtering is not straightforward
Tracing (logging, timelines)
Prototype in VampirTrace
- MPI: traditional library interposition with PMPI
- Sampling: performance-counter-based interrupt (e.g., every 1 million cycles)
24 Combining Performance Analysis Techniques (1)
Example with NPB-BT, 1 sample every 1 million cycles (~385 us)
25 Combining Performance Analysis Techniques (1)
MPI instrumentation and call-path sampling
Example with NPB-BT
Runtime filter: matmul_sub, matvec_sub, binvrhs, binvcrhs, lhsinit, exact_solution
Filtering reduces the overhead, but then no information about those functions is recorded
** Sampling rate of 2.6 ksa/s.
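The two sampling figures quoted for the NPB-BT example (one sample every 1 million cycles, about 385 us, and 2.6 ksa/s) are consistent with each other, as this short Python calculation shows. The implied clock frequency is derived here and is not stated in the talk; it assumes the counter advances at a constant rate.

```python
CYCLES_PER_SAMPLE = 1_000_000   # from the slides
SAMPLE_PERIOD_S = 385e-6        # from the slides (~385 us)

# Derived values (not given in the talk).
implied_clock_hz = CYCLES_PER_SAMPLE / SAMPLE_PERIOD_S
samples_per_second = 1.0 / SAMPLE_PERIOD_S

print(f"implied clock: {implied_clock_hz / 1e9:.2f} GHz")
print(f"sampling rate: {samples_per_second / 1e3:.2f} ksa/s")
```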
26 Combining Performance Analysis Techniques (2)
MPI and compiler instrumentation and performance counter sampling
Tracing (logging, timelines)
Implemented as VampirTrace/Score-P metrics plugin
- MPI: library interposition with PMPI
- Functions: compiler instrumentation
- Hardware counters: monitoring thread wakes up in regular intervals and reads performance counter from application thread
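The monitoring-thread idea can be sketched in Python as follows. This is only an illustration of the pattern, not the VampirTrace/Score-P plugin: `FakeCounter` is a hypothetical stand-in for a hardware performance counter, and the 1 ms interval and the workload are arbitrary choices.

```python
import threading
import time

class FakeCounter:
    """Hypothetical stand-in for a hardware performance counter."""
    def __init__(self):
        self.value = 0

def application(counter, iterations):
    """Application thread: its work advances the counter."""
    for _ in range(iterations):
        counter.value += 1
        time.sleep(0.0001)

def monitor(counter, interval_s, stop, samples):
    """Wake up every interval_s seconds and record (timestamp, counter value)."""
    # Event.wait returns False on timeout and True once stop is set.
    while not stop.wait(interval_s):
        samples.append((time.perf_counter(), counter.value))

counter, samples, stop = FakeCounter(), [], threading.Event()
watcher = threading.Thread(target=monitor, args=(counter, 0.001, stop, samples))
watcher.start()
application(counter, 200)
stop.set()
watcher.join()
print(f"{len(samples)} samples, final counter value {counter.value}")
```

The application code itself is not modified to read the counter; the number of counter records depends on the wake-up interval rather than on the function call rate.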
27 Combining Performance Analysis Techniques (2)
NPB FT, class B, 16 processes, 1 ms sampling interval
32 Combining Performance Analysis Techniques (2)
Normalized trace sizes of NPB CLASS B, sampled (1 ksa/s). Baseline: trace without counters. Filtered functions: matmul_sub, matvec_sub, binvcrhs, exact_solution.
33 Combining Performance Analysis Techniques
Event-based instrumentation for:
- MPI, SHMEM, CUDA
- Manual instrumentation of longer program phases
- Function instrumentation together with sophisticated filtering
Sampling for:
- Call stack of programs where filtering is not feasible (often C++)
- Hardware counters
- External metrics (e.g., power consumption)
34 Conclusion
- Combine strengths and mitigate weaknesses by selecting the right technique for different aspects of performance analysis
  - Sampling and instrumentation complement each other
  - Many tools already cross the borders of a single technique
- Use a clear terminology for techniques
  - Separate the description of data acquisition, data recording, and data presentation
  - Event-based instrumentation vs. sampling
  - Tracing vs. profiling
35 Outlook
- Sampling in Score-P 1.4, planned for 2014, will be experimental
- New trace records for samples
- Analyzing & visualizing merged call-stack samples and region instrumentation
  - A sample is valid for a point in time
  - Instrumentation covers time ranges
  - Overlap?
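The open question of how point-in-time samples and time-range regions overlap can be made concrete with a small Python sketch. This is one possible approach and not the Score-P design: each sample is assigned to the instrumented region whose time range contains it. All region names, timestamps and call stacks below are hypothetical.

```python
import bisect

# Hypothetical instrumented regions (enter_us, leave_us, name), sorted by
# enter time and non-overlapping, e.g. MPI calls recorded by interposition.
REGIONS = [(100, 180, "MPI_Send"), (400, 650, "MPI_Wait")]
# Hypothetical call-stack samples (time_us, stack).
SAMPLES = [(200, ["main", "solve"]), (420, ["main", "exchange"]),
           (640, ["main", "exchange"]), (860, ["main", "solve"])]

def enclosing_region(t_us, regions):
    """Return the name of the region whose time range contains t_us, if any."""
    starts = [enter for enter, _, _ in regions]
    i = bisect.bisect_right(starts, t_us) - 1
    if i >= 0 and t_us < regions[i][1]:
        return regions[i][2]
    return None

merged = []
for t, stack in SAMPLES:
    region = enclosing_region(t, REGIONS)
    # Append the instrumented region below the sampled call stack.
    merged.append((t, stack + [region] if region else stack))

for t, stack in merged:
    print(f"{t}us: {' '.join(stack)}")
```

Samples that fall outside every instrumented range keep their sampled call stack unchanged. Regions shorter than the sampling period, such as the first one here, may not be hit by any sample and are then only visible through instrumentation.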
With CrayPat Load Imbalance Analysis Imbalance time is a metric based on execution time and is dependent on the type of activity: User functions Imbalance time = Maximum time Average time Synchronization
More informationSequential Performance Analysis with Callgrind and KCachegrind
Sequential Performance Analysis with Callgrind and KCachegrind 4 th Parallel Tools Workshop, HLRS, Stuttgart, September 7/8, 2010 Josef Weidendorfer Lehrstuhl für Rechnertechnik und Rechnerorganisation
More informationHPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
More informationA Performance Monitoring Interface for OpenMP
A Performance Monitoring Interface for OpenMP Bernd Mohr, Allen D. Malony, Hans-Christian Hoppe, Frank Schlimbach, Grant Haab, Jay Hoeflinger, and Sanjiv Shah Research Centre Jülich, ZAM Jülich, Germany
More informationOn the Importance of Thread Placement on Multicore Architectures
On the Importance of Thread Placement on Multicore Architectures HPCLatAm 2011 Keynote Cordoba, Argentina August 31, 2011 Tobias Klug Motivation: Many possibilities can lead to non-deterministic runtimes...
More informationA Multi-layered Domain-specific Language for Stencil Computations
A Multi-layered Domain-specific Language for Stencil Computations Christian Schmitt, Frank Hannig, Jürgen Teich Hardware/Software Co-Design, University of Erlangen-Nuremberg Workshop ExaStencils 2014,
More informationTCP Adaptation for MPI on Long-and-Fat Networks
TCP Adaptation for MPI on Long-and-Fat Networks Motohiko Matsuda, Tomohiro Kudoh Yuetsu Kodama, Ryousei Takano Grid Technology Research Center Yutaka Ishikawa The University of Tokyo Outline Background
More informationCSCI E 98: Managed Environments for the Execution of Programs
CSCI E 98: Managed Environments for the Execution of Programs Draft Syllabus Instructor Phil McGachey, PhD Class Time: Mondays beginning Sept. 8, 5:30-7:30 pm Location: 1 Story Street, Room 304. Office
More informationCompute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005
Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005 Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005... 1
More informationSearch Strategies for Automatic Performance Analysis Tools
Search Strategies for Automatic Performance Analysis Tools Michael Gerndt and Edmond Kereku Technische Universität München, Fakultät für Informatik I10, Boltzmannstr.3, 85748 Garching, Germany gerndt@in.tum.de
More informationCompiler-Assisted Binary Parsing
Compiler-Assisted Binary Parsing Tugrul Ince tugrul@cs.umd.edu PD Week 2012 26 27 March 2012 Parsing Binary Files Binary analysis is common for o Performance modeling o Computer security o Maintenance
More informationV 6.1 Core Training Training Plan
V 6.1 Core Training Training Plan 2014 Version 1.0 Document Revision 1.0 2014 OpenSpan Incorporated. All rights reserved. OpenSpan and the Open Span logo are trademarks of OpenSpan, Incorporated. Other
More informationObject Instance Profiling
Object Instance Profiling Lubomír Bulej 1,2, Lukáš Marek 1, Petr Tůma 1 Technical report No. 2009/7, November 2009 Version 1.0, November 2009 1 Distributed Systems Research Group, Department of Software
More informationThesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell
Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell Ryan Yates 5-5-2014 1/21 Introduction Outline Thesis Why Haskell? Preliminary work Hybrid TM for GHC Obstacles to Performance
More informationDebugging with TotalView
Tim Cramer 17.03.2015 IT Center der RWTH Aachen University Why to use a Debugger? If your program goes haywire, you may... ( wand (... buy a magic... read the source code again and again and...... enrich
More informationDevelopment With ARM DS-5. Mervyn Liu FAE Aug. 2015
Development With ARM DS-5 Mervyn Liu FAE Aug. 2015 1 Support for all Stages of Product Development Single IDE, compiler, debug, trace and performance analysis for all stages in the product development
More informationKeys to node-level performance analysis and threading in HPC applications
Keys to node-level performance analysis and threading in HPC applications Thomas GUILLET (Intel; Exascale Computing Research) IFERC seminar, 18 March 2015 Legal Disclaimer & Optimization Notice INFORMATION
More informationChapter 13 Configuration Management
Object-Oriented Software Engineering Using UML, Patterns, and Java Chapter 13 Configuration Management Outline of the Lecture Purpose of Software Configuration Management (SCM)! Motivation: Why software
More informationEqualizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH
Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability
More informationAnalytics for Performance Optimization of BPMN2.0 Business Processes
Analytics for Performance Optimization of BPMN2.0 Business Processes Robert M. Shapiro, Global 360, USA Hartmann Genrich, GMD (retired), Germany INTRODUCTION We describe a new approach to process improvement
More informationPerformance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures Gabriele Jost *, Haoqiang Jin NAS Division, NASA Ames Research Center, Moffett Field, CA 94035-1000 USA {gjost,hjin}@nas.nasa.gov
More informationEnterprise Manager Performance Tips
Enterprise Manager Performance Tips + The tips below are related to common situations customers experience when their Enterprise Manager(s) are not performing consistent with performance goals. If you
More information