Debugging with TotalView

Tim Cramer 17.03.2015 IT Center der RWTH Aachen University

Why use a debugger?
If your program goes haywire, you may...
  ... buy a magic wand,
  ... read the source code again and again and ...,
  ... enrich your application with printfs,
OR use an adequate tool: a debugger.
Debuggers will enhance your productivity.

What is TotalView?
A comprehensive debugging solution for demanding parallel and multi-core applications.
Wide compiler & platform support:
  C, C++, Fortran 77 & 90, UPC
  Linux, OS X, Unix
  Windows frontend (client)
Handles concurrency:
  multi-threaded debugging
  parallel debugging (MPI, PVM, others)
  remote and client/server debugging
Integrated memory debugging
Reverse debugging available (ReplayEngine)
Supports a variety of usage models:
  powerful and easy GUI with visualization
  CLI for scripting
  long-distance remote debugging
  unattended batch debugging / GUI-free debugging with TVScript

About this Session
TotalView is a GUI-based debugger.
  I don't like slides with one screenshot after the other.
  I want to make this as interactive as possible.
If you want: start your laptops and log in.
  Log in to the RWTH Compute Cluster using X-Win32.
  All examples should be available in the hpclabxx accounts (directory: ${HOME}/totalviewlabs/livedemo).
  Slides for the interactive sessions will be online with screenshots.

Before you start
Lean back and relax.
Check your environment: increase ulimits (ulimit -a)
  -s  Stack size (crucial for Fortran and OpenMP!)
  -t  CPU time
  -v  Address space
  -c  Core file size (crucial for debugging on core files)
  and others
Remove all objects and intermediate files.
Rebuild with debugging info ON (-g) and optimization OFF (-O0).
Problem still here? Use a debugger!
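The limits listed on this slide can also be queried from inside a program. The following is a minimal C sketch, assuming a POSIX system; the helper names print_limit, core_dumps_enabled and report_debug_limits are hypothetical and not part of the course material. Note that getrlimit reports bytes (or seconds for CPU time), while the shell's ulimit may display scaled units.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft value of one resource limit.
 * RLIM_INFINITY corresponds to "unlimited" in the shell. */
static void print_limit(const char *name, int resource) {
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%-14s unlimited\n", name);
    else
        printf("%-14s %llu\n", name, (unsigned long long)rl.rlim_cur);
}

/* Returns 1 if the soft core file size limit allows core dumps, 0 otherwise. */
int core_dumps_enabled(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return 0;
    return rl.rlim_cur != 0;
}

/* Report the limits named on the slide (-s, -t, -v, -c). */
void report_debug_limits(void) {
    print_limit("stack size", RLIMIT_STACK);
    print_limit("cpu time", RLIMIT_CPU);
    print_limit("address space", RLIMIT_AS);
    print_limit("core file size", RLIMIT_CORE);
}
```

Calling report_debug_limits() at program start is one way to record the environment a failing run actually had; if core_dumps_enabled() returns 0, a crash will not leave a core file for post-mortem debugging.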

TotalView in the RWTH Environment
Initialize the environment and start up:
    $ module load totalview
    $ totalview
or load a binary directly (here called a.out):
    $ totalview a.out -a <options of a.out>
Main modes:
  Start a new process
  Attach to a running process
  Load a core file (post-mortem)

Process Window
  Toolbar
  Process and Thread Status
  Stack Trace Pane
  Stack Frame Pane
  Source Pane
  Tabbed Pane

Root Window and Console
Status info:
  B = Breakpoint
  E = Error
  W = Watchpoint
  R = Running
  M = Mixed
  T = Stopped

Break- and Watchpoints
Breakpoints:
  Interrupt execution when a specific code line is reached
  Conditional breakpoints are possible
  Set by clicking in the source pane
  Can be disabled temporarily
Watchpoints:
  Interrupt execution when a specific memory location changes
  Conditional watchpoints are possible (e.g. only stop if the sign of the value changes or a specified threshold is reached)

Demo 1: TotalView Startup and Basics (live demo)

L1: Log in with X-Win32 & start TotalView

L1: Make & Run
    hpclab99@cluster:~/totalviewlabs/livedemo/l1_basics[32]$ ulimit -c unlimited
    hpclab99@cluster:~/totalviewlabs/livedemo/l1_basics[32]$ make
    icc -O0 -g fix_me.c -o fix_me.exe
    hpclab99@cluster:~/totalviewlabs/livedemo/l1_basics[33]$ ./fix_me.exe
    Init a[0] with 0.000000
    Init a[1] with 1.000000
    [...]
    Init a[29179] with 29179.000000
    Init a[29180] with 29180.000000
    Init a[29181] with 29181.000000
    zsh: segmentation fault (core dumped)  ./fix_me.exe
    hpclab99@cluster:~/totalviewlabs/livedemo/l1_basics[34]$

The Source Code
    hpclab99@cluster:~/totalviewlabs/livedemo/l1_basics[48]$ cat fix_me.c
    #include <stdio.h>
    #include <stdlib.h>

    void init(double* a, const int count){
        int i;
        for (i=0; i<count; i++){
            a[i] = (double)i;
            printf("Init a[%d] with %f\n", i, a[i]);
        }
    }

    int main(int argc, char **argv){
        const int count = 100000;
        double* a;
        a = (double*)malloc(count);
        init(a, count);
        return 0;
    }
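The slides leave the fix to the live demo, so the following is only a sketch of the likely intended correction. malloc takes a size in bytes, so malloc(count) reserves 100000 bytes, which is room for 12500 doubles rather than 100000. Writes past the end of the block are undefined behaviour and need not fail immediately, which would explain why the run above only crashes around index 29181. The helper name make_array is hypothetical, and the printing is omitted for brevity.

```c
#include <stdlib.h>

/* Fill a[0..count-1] with the index values, as in fix_me.c. */
void init(double *a, const int count) {
    for (int i = 0; i < count; i++) {
        a[i] = (double)i;
    }
}

/* Allocate and initialise the array; returns NULL if the allocation fails.
 * The caller is responsible for calling free() on the result. */
double *make_array(const int count) {
    /* Request count elements of sizeof(double) bytes each, not count bytes. */
    double *a = malloc((size_t)count * sizeof *a);
    if (a == NULL) {
        return NULL;
    }
    init(a, count);
    return a;
}
```

Writing the size as count * sizeof *a ties the element size to the pointer's type, so the expression stays correct if the array type changes later.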

Start TotalView

L1: Post-Mortem Debugging (1/2) (screenshot annotations: core file, executable)

L1: Post-Mortem Debugging (2/2)

L1: Running within TotalView

Setting a breakpoint

Inspecting an array in C/C++: typecast necessary

Data Visualizations: helpful for big data arrays

Demo 2: TotalView Watchpoints (live demo)

L2: Watchpoints (1/2): create a watchpoint for a[29]

L2: Watchpoints (2/2): execution is interrupted as soon as a[29] changes

Parallel Debugging
Parallel debugging can be very hard.
  Try to debug a serial version of the program first!
  Typical multithreading errors may not be found (e.g., race conditions)
  Some errors only occur
    with optimized code (uninitialized variables?)
    with many processes
    outside of debug sessions (different timing)
Nevertheless, TV is better than no TV:
  Stack frames for every process / thread in one GUI
  Switching between processes / threads (even for accelerators like GPGPUs)
  Variable inspection (and visualization) across all processes / threads
  Deadlock detection
  Visualization of the MPI message queue
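To illustrate why race conditions are hard to catch in a debugger, here is a minimal C sketch with OpenMP; it is not taken from the lab material, and the function names sum_racy and sum_reduction are hypothetical. It assumes the code is built with the compiler's OpenMP flag (for example -fopenmp with GCC); without it the pragmas are ignored and both functions run serially.

```c
/* Racy version: when built with OpenMP, all threads update the shared
 * variable sum without synchronisation. The result depends on thread
 * timing, so it may look correct while single-stepping in a debugger
 * and be wrong in a normal run. */
long sum_racy(int n) {
    long sum = 0;
    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        sum += i;   /* unsynchronised read-modify-write on shared data */
    }
    return sum;
}

/* Corrected version: each thread accumulates a private partial sum,
 * and OpenMP combines the partial sums when the loop ends. */
long sum_reduction(int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}
```

Because halting and stepping changes the timing between threads, the racy version can behave correctly under the debugger, which is the "different timing" effect named on the slide.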

OpenMP and TotalView (1/2)
Overview of thread number:
  The number of threads is shown in the root window
  Some OpenMP implementations use an additional system thread
Thread state and navigation:
  Stepping into a parallel region is not possible -> set a breakpoint
  Easy switching between the threads with T- / T+

OpenMP and TotalView (2/2)
Breakpoint properties and barriers
Default behavior: same breakpoint for all threads
Change via Properties:
  Group:   Stop all threads of all processes
  Process: Stop all threads of one process (default)
  Thread:  Stop only the current thread
Barriers:
  Similar to breakpoints
  Help to synchronize

MPI and TotalView
Two startup methods for MPI jobs:
New launch:
    $ totalview mpi-a.out
  Set parameters in the GUI
  Easy and intuitive
  No detaching or re-attaching possible
  Not available on all platforms
Classic launch:
    $ mpiexec -tv -np 4 mpi-a.out
  Arguments depend on the MPI vendor
  Attach / Attach to subset / Detach / Reattach possible

Demo 3: Parallel Debugging with MPI (live demo)

L3: MPI Debugging

L3: MPI Debugging: be patient

L3: MPI Debugging

L3: MPI Debugging: press the HALT button

L3: MPI Debugging: rank 0 is waiting in a barrier

L3: MPI Debugging: rank 1 still waits for data

L3: MPI Debugging: Message Queue Graph shows ranks 1, 2 and 3 waiting for data on tag 99

Debugging of Large MPI Jobs
Limitations:
  Each MPI process consumes a TotalView license token
  RWTH has only 50 licenses
  Try to debug a small example
Debugging a subset of the whole job:
  Attaching to a subset is possible
  File -> Preferences -> Parallel: in the "When a job goes parallel" menu, select "Ask what to do" (instead of "Attach to all")
  Choose a subset of processors in Group -> Attach Subset

Message Queue Graph
Tools -> Message Queue Graph
Helps with:
  Hangs & deadlocks
  Pending messages
  Communication patterns
Colors:
  Green: Pending Sends
  Blue: Pending Receives
  Red: Unexpected Messages
Find deadlocks: Options -> Cycle Detection

Memory Debugging: MemoryScape
Debugging for:
  Memory leaks (leak detection)
  Double free
  Invalid array bounds
Memory reports:
  Consumption
  Comparisons
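As a small illustration of the error classes named on this slide, the following C sketch contrasts a leaking function with a corrected one. It is not part of the lab material; the function names count_spaces_leaky and count_spaces are hypothetical, and the comments mark where a double free or an out-of-bounds access would typically creep in.

```c
#include <stdlib.h>
#include <string.h>

/* Leaky version: the heap copy is never released, so every call
 * loses strlen(s) + 1 bytes. A leak report would list this malloc. */
size_t count_spaces_leaky(const char *s) {
    char *copy = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL) return 0;
    strcpy(copy, s);
    size_t n = 0;
    for (size_t i = 0; copy[i] != '\0'; i++) {
        if (copy[i] == ' ') n++;
    }
    return n;                             /* missing free(copy): memory leak */
}

/* Corrected version: every successful malloc is paired with exactly one free. */
size_t count_spaces(const char *s) {
    size_t len = strlen(s);
    char *copy = malloc(len + 1);         /* malloc(len) would be too small by one byte */
    if (copy == NULL) return 0;
    memcpy(copy, s, len + 1);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {    /* i <= len + 1 would read out of bounds */
        if (copy[i] == ' ') n++;
    }
    free(copy);
    copy = NULL;   /* a second free(copy) would now be a no-op, not a double free */
    return n;
}
```

Both functions return the same result, which is why leaks usually go unnoticed until memory consumption is inspected with a memory debugger.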

Summary
Using a debugger enhances productivity.
TotalView helps you with:
  Serial programs
  Threaded (OpenMP) programs
  MPI programs
  Hybrid programs
  Intel Xeon Phi
  GPGPUs
Online information:
  http://www.roguewave.com/support/product-documentation/totalview.aspx#totalview
  http://www.itc.rwth-aachen.de/hpc/primer

The End
Thank you for your attention!