Memory Debugging with TotalView on AIX and Linux/Power




ScicomP, Austin, Aug 2004: Memory Debugging with TotalView on AIX and Linux/Power, Chris Gottbrath

Memory Debugging in AIX and Linux-Power Clusters
- Intro: Define the problem and terms
  - What are memory bugs?
  - Why are they hard to solve?
- Tools: TotalView and the Heap Interposition Agent
  - What does it mean to be a parallel debugger?
  - What can the HIA do?
  - What is the TV roadmap for memory debugging?
- Strategies: HIA usage and tips
  - General strategies
  - Filling up memory
  - Rank process crashing
- Example: Plugging a leak
- Conclusion

Intro: Memory
Four kinds of memory:
- Text: memory used to store your program's machine code instructions
- Data: memory used for storing uninitialized and initialized data
- Heap: memory used for data allocated at runtime. This is the kind of memory that requires the most intensive management and is the focus of the rest of this talk.
- Stack: memory used by the currently executing routine and all the routines in its backtrace

Intro: Heap Memory
Heap is managed by the program:
- C: malloc() and free()
- C++: new and delete
- Fortran90: allocatable arrays
malloc() usage is something like:
    int *vp;
    vp = malloc(sizeof(int) * number);
    if (vp == 0) { /* malloc must have failed */ }
    /* use vp */
    free(vp);
    vp = 0;

Intro: What is a Memory Bug?
A memory bug is a mistake in the management of heap memory.
- Mistake: the program fails to follow the procedure defined in the heap allocation API
  - Failure to check for error conditions
  - Relying on nonstandard behavior
  - Leaking: failing to free memory
  - Dangling references: failing to clear pointers
- Fallout: the program may then operate on an address in the heap based on an incorrect assumption about the allocation state of that address
  - Write/read to a pointer pointing to a deallocated block
  - Read/write to a pointer pointing to a block that has been deallocated and then reallocated (for a new purpose)
  - Leaked memory consumes a limited resource
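The two "mistake" categories that are easiest to guard against in code are unchecked allocation and uncleared pointers. The sketch below shows one way to package those habits in C. The helper names alloc_ints and release_ints are hypothetical and do not come from the talk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n ints, or return NULL.
 * The overflow check keeps n * sizeof(int) from wrapping around. */
int *alloc_ints(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(int))
        return NULL;
    return malloc(n * sizeof(int));
}

/* Hypothetical helper: free the block and clear the caller's pointer,
 * so that no dangling reference is left behind. Calling it a second
 * time is harmless because free(NULL) does nothing. */
void release_ints(int **pp)
{
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;
}
```

A caller would check the result of alloc_ints() against NULL before use and pass the address of its pointer to release_ints(). Copies of the pointer held elsewhere are not cleared by this, which is why the dangling references described above still occur in practice.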

Intro: Why are they hard?
- Memory problems can lurk
  - For a given scale, platform, or problem they may be non-fatal
  - Libraries could be the source of the problem
- The mistake and fallout can be widely separated
  - The mistake is rarely fatal in and of itself
  - The fallout can occur at any subsequent memory access through a pointer
- Potentially 'racy'
  - Memory allocation pattern is non-local
- Even the fallout is not always fatal. It can result in data corruption, which may or may not result in a subsequent crash
- May be caused by, or be the cause of, a 'classical' bug

Intro: Memory Problem in Clusters
- Moving an application to a cluster increases the problem complexity
  - Distributed algorithms are more complex
  - Application data set size may push available memory even when everything is functioning correctly
  - Porting to a cluster may involve moving to a new architecture/OS
- The cluster environment is different
  - Many potentially useful memory tools aren't designed for use in a cluster
    - May simply fail
    - May require extreme 'workarounds'
  - Report-based tools need cluster-aware filtering mechanisms

Intro: What is the solution?
- Interactive debugging style
  - Integrate memory debugging with general debugging practices
- Tackle parallel memory problems in clusters with the right tools, used together
  - Parallel debugger
  - Memory debugger
  - Experience to use the tools effectively
- The remainder of this talk covers
  - TotalView parallel and memory features
  - Strategies for successful debugging
  - An example debugging session

Tools: What is TotalView?
- Source code debugger
  - C, C++, Fortran, Fortran90
  - Complex language features
  - Wide compiler and platform support
- Multithreaded debugging
  - Including OpenMP
- Distributed debugging
  - Cluster architecture
- Memory debugging capabilities
  - Heap Interposition Agent
- Powerful and easy GUI
  - Visualization
- Extensible via scripting

Tools: TotalView as Parallel Debugger
- Cluster architecture
- Process acquisition
- Usability
- Status
- Process control
- Data exploration
- MPI message queue debugging
- Scalability

Tools: Architecture for Cluster Debugging
Cluster architecture:
- Single client (TotalView)
  - GUI and debug engine
  - Heavy overhead
- Debugger servers (tvdsvr)
  - Low overhead
  - 1 per node
  - Traces multiple rank processes
  - Runs as user
- TotalView communicates directly with tvdsvrs
  - Not using MPI
  - Protocol optimization
- Provides: robust, scalable, minimal interaction
[Figure: compute nodes. TotalView starts a set of lightweight debugger servers.]

Tools: Process Acquisition
TotalView process acquisition:
- Seamlessly attach to all the processes making up an MPI job
  - Jobs started via TotalView
  - Already running or hung job
- Based on a public interface
  - Almost every MPI implementation provides support
- Single server launch based on rsh
  - No special support needed
  - Drop in ssh as a secure replacement
- Bulk server launch
  - Allows for faster launch if underlying support exists in the cluster environment (e.g. IBM POE)
- Optionally attach to a subset

Tools: Core Parallel Functionality
- The crucial thing in clusters is parallelism
  - Parallelism touches the whole debugger interface
  - More states than just started and stopped
- TotalView provides
  - Automatic & manual process groups for process control
  - Root & Process window
    - Status information
    - Navigation
  - Rich set of action points
  - Parallel expression evaluation mechanism
  - View SIMD data across all processes from one window
  - Asynchronous CLI

Tools: MPI Message Queue Information
- Deadlocks
  - MPI programs can suffer deadlocks
  - State information is held in the MPI library
  - TotalView can expose that information
  - Public interface that many MPI vendors support
- Quickly debug deadlocks
  - Message queue graph: patterns easy to spot
  - Detail windows

Tools: Scalability
- Scalability means many things
  - Startup and runtime performance / responsiveness
  - Memory usage
  - Status and data representation
  - Control issues
  - Program size/complexity also grows
- Practical scalability
  - 10s of processes trivially
  - 100s of processes regularly
  - 1,000s of processes can be debugged currently with TotalView
  - More work on scalability as part of BG/L work
- Features and strategies to work at scale
  - Subset attach

Tools: TotalView as Memory Debugger
- Parallel memory usage statistics
- Heap Tracker
  - Heap interposition technique
  - Capabilities
    - Protocol violations flagged at runtime
    - Leak detection
    - Dangling pointer annotation
    - Memory painting
    - Event notification
    - Memory hoarding
- Parallel and MPI aware interface

Tools: Memory Statistics
- Memory usage statistics
  - Gives overview of memory usage patterns
  - By process or library
  - Sortable
  - Filterable

Tools: Memory Tracker
The TotalView Memory Tracker:
- Gets inserted into your program to provide instrumentation needed by TotalView
- It maintains a separate table of allocations that can be read by TotalView
- Can take action at all points of allocation, re- and de-allocation
- Interposed over malloc() calls
  - Linked 'between' your program and malloc()
  - Catches malloc() calls and return values in both your program and libraries
- For parallel programs, simple relinking
- Can be used without relinking in many serial cases
- Checks values and builds table of allocations
- If you have a custom malloc() you can continue to use it
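The idea of a layer that sits between the program and malloc() and keeps its own table of allocations can be sketched in a few lines of C. This is not the HIA's implementation: real interposition replaces malloc() itself at link or load time, whereas this sketch uses explicitly named wrappers. The names tracked_malloc, tracked_free and the fixed table size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 1024            /* arbitrary table size for this sketch */

struct tracked_block { void *addr; size_t size; };

static struct tracked_block table[MAX_TRACKED];
static size_t live_blocks;          /* number of entries in use */
static size_t live_bytes;           /* sum of their sizes */

/* Allocate through the real malloc() and record the block. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    if (live_blocks == MAX_TRACKED) {   /* refuse to hand out untracked blocks */
        free(p);
        return NULL;
    }
    table[live_blocks].addr = p;
    table[live_blocks].size = size;
    live_blocks++;
    live_bytes += size;
    return p;
}

/* Free a block only if it is the start of a recorded allocation.
 * Returns 0 on success, -1 if the address is not in the table. */
int tracked_free(void *p)
{
    if (p == NULL)
        return 0;                       /* free(NULL) is always legal */
    for (size_t i = 0; i < live_blocks; i++) {
        if (table[i].addr == p) {
            assert(live_bytes >= table[i].size);
            live_bytes -= table[i].size;
            table[i] = table[live_blocks - 1];  /* fill the hole with the last entry */
            live_blocks--;
            free(p);
            return 0;
        }
    }
    return -1;                          /* report instead of corrupting the heap */
}

size_t tracked_live_blocks(void) { return live_blocks; }
size_t tracked_live_bytes(void)  { return live_bytes; }
```

Because every allocation and deallocation passes through one place, this layer is also where error checks, painting, and notification can be attached. A debugger can read the table without the program's cooperation.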

Tools: Heap Errors Flagged by Tracker
Example heap allocation errors automatically detected:
- Free not allocated: call to free() with an address that does not lie in any allocated block from the heap
- Realloc not allocated: call to realloc() with an address that does not lie in any allocated block
- Address not at start of block: free() or realloc() receives a heap address that does not lie at the start of any previously allocated block
- Double allocation: an already allocated address is returned by a new request. Indicates a problem in the heap manager.
- Allocation request returns NULL: a null value is returned by an allocation operation
Example heap allocation errors not automatically detected:
- Failure to call free()
  - No call site for error
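The distinction between "not allocated" and "not at start of block" comes down to a lookup in the table of live blocks. The following C sketch shows that classification for an address handed to free(). The type and function names are hypothetical, and the address arithmetic assumes a flat address space in which pointers converted to uintptr_t compare in address order, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One live allocation: start address and length in bytes. */
struct heap_block { uintptr_t start; size_t size; };

enum free_check {
    FREE_OK,              /* address is the start of a live block */
    FREE_NOT_AT_START,    /* address lies inside a live block */
    FREE_NOT_ALLOCATED    /* address lies in no live block */
};

/* Classify an address that is about to be passed to free(). */
enum free_check check_free(const struct heap_block *blocks, size_t nblocks,
                           const void *addr)
{
    uintptr_t a = (uintptr_t)addr;

    assert(blocks != NULL || nblocks == 0);
    for (size_t i = 0; i < nblocks; i++) {
        if (a == blocks[i].start)
            return FREE_OK;
        if (a > blocks[i].start && a - blocks[i].start < blocks[i].size)
            return FREE_NOT_AT_START;
    }
    return FREE_NOT_ALLOCATED;
}
```

A missing call to free() cannot be caught this way because there is no call at which to run the check, which is why leaks need the separate analysis described on the leak detection slide.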

Tools: Heap Information
- Shows all memory allocations in each process
  - By source code location
  - By stack backtrace
- Select processes
- Drill down by source structure
- For each block
  - Stack and source code at point of allocation
- If leak detection has been done, leaks are highlighted

Tools: Leak detection
- Leak: unreachable memory
- Garbage-collection algorithm
  - Examine all the pointers and registers in a program
  - Any memory allocation not reachable by any pointer is a leak
- This is an expensive operation, initiated at user request
- List of leaks is displayed just as the heap entries are
- False positives are possible
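The reachability scan described above can be illustrated with a small conservative marker written in C. It is a sketch under stated assumptions, not TotalView's algorithm: roots are passed in as a plain array of address values rather than being read from registers and the stack, every block's contents are assumed to be initialized, and any word that looks like an address into a block is treated as a pointer. The names heap_block and find_leaks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct heap_block {
    void  *addr;        /* start of the allocation */
    size_t size;        /* length in bytes */
    int    reachable;   /* set by the scan */
};

/* Index of the block containing address p (start or interior), or -1. */
static int find_block(const struct heap_block *b, size_t n, uintptr_t p)
{
    for (size_t i = 0; i < n; i++) {
        uintptr_t start = (uintptr_t)b[i].addr;
        if (p >= start && p - start < b[i].size)
            return (int)i;
    }
    return -1;
}

/* Mark the block that p points into, then follow every word inside it. */
static void mark(struct heap_block *b, size_t n, uintptr_t p)
{
    int i = find_block(b, n, p);
    if (i < 0 || b[i].reachable)
        return;
    b[i].reachable = 1;
    for (size_t off = 0; off + sizeof(uintptr_t) <= b[i].size;
         off += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, (const char *)b[i].addr + off, sizeof word);
        mark(b, n, word);
    }
}

/* Returns the number of blocks not reachable from any root. */
size_t find_leaks(struct heap_block *b, size_t n,
                  const uintptr_t *roots, size_t nroots)
{
    size_t leaks = 0;

    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        b[i].reachable = 0;
    for (size_t r = 0; r < nroots; r++)
        mark(b, n, roots[r]);
    for (size_t i = 0; i < n; i++)
        if (!b[i].reachable)
            leaks++;
    return leaks;
}
```

The scan touches every word of every reachable block, which is why it is expensive and run only on request. Its answers are approximate in both directions: a pointer kept in a form the scan does not recognize makes a live block look leaked, and a stale value that happens to look like an address hides a real leak.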

Tools: Dangling Pointer Detection
- Dangling pointer: pointer to unallocated memory
  - May contain dangerously 'reasonable' looking data
- TotalView annotates dangling pointers in the variable window when the HIA is activated
- Similarly, pointers are annotated Allocated and Allocated Interior

Tools: Memory Painting
The Heap Tracker can paint heap memory.
- Allocated memory is normally returned with 'noise'
  - In some cases this noise looks like program data and can be hard to spot
- Deallocated memory remains in the heap with old data intact
  - It will be marked dangling in TV but the program might still mistakenly operate on the data
- Painting changes the data on allocation or deallocation
  - Easy to spot visually
  - Painted values point to invalid addresses
  - Painted values can be chosen to raise arithmetic errors
- Change a subtle error into an obvious one
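Painting can be imitated by hand with a pair of wrappers around malloc() and free(), as in the C sketch below. The pattern bytes 0xA1 and 0xDF are arbitrary choices for illustration, not TotalView's values, and the function names are hypothetical. The caller has to pass the block size to painted_free() because standard C has no portable way to look it up. A real tracker would take it from its allocation table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAINT_ALLOC 0xA1   /* assumed pattern for fresh blocks */
#define PAINT_FREE  0xDF   /* assumed pattern for released blocks */

/* Fill size bytes at p with pattern. The volatile pointer keeps an
 * optimizing compiler from discarding the stores made just before free(). */
void paint(void *p, unsigned char pattern, size_t size)
{
    volatile unsigned char *bytes = p;

    assert(p != NULL || size == 0);
    for (size_t i = 0; i < size; i++)
        bytes[i] = pattern;
}

/* Returns 1 if every byte of the region still holds pattern. */
int is_painted(const void *p, unsigned char pattern, size_t size)
{
    const unsigned char *bytes = p;

    for (size_t i = 0; i < size; i++)
        if (bytes[i] != pattern)
            return 0;
    return 1;
}

/* Allocate a block and overwrite the allocator's 'noise'. */
void *painted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        paint(p, PAINT_ALLOC, size);
    return p;
}

/* Overwrite the old data, then release the block. */
void painted_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    paint(p, PAINT_FREE, size);
    free(p);
}
```

With this in place, a value such as 0xA1A1A1A1 showing up in a variable points to use of uninitialized heap memory, and 0xDFDFDFDF to use after free, provided the allocator has not yet reused or overwritten the block.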

Tools: Event Notification and Hoarding
- Notification of allocation events
  - Request notification of heap allocation events related to a specific allocation
  - Allows a focused view of the life cycle of a specific allocation
  - Conceptually similar to a watchpoint/breakpoint
- Hoarding memory
  - Prevents a certain bit of memory from being reallocated when it otherwise would be
  - Preserves information about the allocation
  - Only function that changes the allocation pattern
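Hoarding can be pictured as a quarantine in front of free(): released blocks are held back for a while so the allocator cannot hand the same address out again immediately. The C sketch below uses a small ring buffer for this. The names and the slot count are assumptions, and the HIA's actual policy is not described in the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HOARD_SLOTS 4               /* arbitrary capacity for this sketch */

static void  *hoard[HOARD_SLOTS];   /* ring buffer of held-back blocks */
static size_t hoard_head;           /* index of the oldest entry */
static size_t hoard_count;          /* number of entries in use */

/* Really free the oldest hoarded block. */
static void hoard_evict(void)
{
    assert(hoard_count > 0);
    free(hoard[hoard_head]);
    hoard_head = (hoard_head + 1) % HOARD_SLOTS;
    hoard_count--;
}

/* Use in place of free(): the block is held back instead of being
 * returned to the allocator. When the hoard is full, the oldest
 * block is released to make room. */
void hoarding_free(void *p)
{
    if (p == NULL)
        return;
    if (hoard_count == HOARD_SLOTS)
        hoard_evict();
    hoard[(hoard_head + hoard_count) % HOARD_SLOTS] = p;
    hoard_count++;
}

size_t hoard_size(void) { return hoard_count; }

/* Release everything, for example at the end of a debugging run. */
void hoard_release_all(void)
{
    while (hoard_count > 0)
        hoard_evict();
}
```

While a block sits in the hoard, a stale pointer to it still refers to memory that nothing else owns, so a late write through that pointer does not corrupt an unrelated allocation. The cost is a changed allocation pattern and higher memory use.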

Tools: Using the Heap Tracker with AIX
- On AIX the HIA needs to be built against the system's C library
  - AIX doesn't support pre-loading
  - The script aix_install_tvheap_mr in the TotalView installation makes this easy
  - This needs to be run for each node in a cluster (use poe)
  - This needs to be rerun if the system library changes
- Then your application needs to be linked with the HIA library
  - For a 64 bit executable on AIX 5.X it is
        mpcc_r -g $target.o -o $target -L $path_mr -L $path \
            $path/aix_malloctype64_5.o
- Then enable heap debugging in the TV GUI
- Turn on notification only for the poe task
  - Use the CLI and enter dheap -notify
- There are other procedures; see the TV documentation
- On Linux, TotalView can use LD_PRELOAD to interpose the HIA, and relinking the executable is optional

Tools: TotalView Roadmap
- TV 6.5.0 available now
  - Available on AIX, x86, etc.
  - Parallel and memory debugging features
- Power-Linux release coming soon
  - Planned for a release later this year (4Q2004)
  - Support basic debugging features
  - Not memory debugging (initially)
  - No visualizer
- Memory debugging enhancements in 2005
  - Added Power-Linux support
  - Enhancements to memory debugging for all platforms
    - Graphical view of heap allocations
    - Separately stored configuration files
    - Filters for memory debugging info
    - Pointer allocation information enhancements
    - Heap API

Tools: Graphical View of Heap (future)
- This will provide a visual representation of the heap
  - Overall heap usage and fragmentation visible at a glance
  - Leaked allocations would be marked
  - Image could be zoomed
  - Individual allocations could be selected as in the tree-based report
  - Allocations matching some criteria could be highlighted
- The image to the right is a mock-up
  - The visual layout could change significantly

Tools: Memory Debugging Filters (future)
- Allows the user to remove heap blocks matching certain criteria from the heap status and leak report
  - Remove entries associated with a specific shared library
  - Remove entries based on block count or block size
  - Other criteria like line number, pc, subroutine name
- Multiple filters can be defined and toggled on and off
- This allows the user to deal with large reports in an organized manner
  - Eliminate 'false positives' or leaks that have been understood

Tools: Pointer Allocation Info (future)
- The additional information displayed for pointers in the heap in the data window will be extended
  - Stack at point of allocation for an Allocated pointer
  - Stack at point of deallocation for a Dangling pointer
  - Status of notification for allocation and deallocation for the block being referenced
- Similar information will be exposed in the dwhat command in the CLI

Tools: API and Config Files (future)
- HIA application program interface
  - Allow target programs to use the information exposed by the HIA
  - The program can query the HIA and perform checking based on heap status
    - Is this pointer allocated?
    - What is the current overall heap size?
  - The program can alter HIA settings
- HIA config files
  - More fine-grained control of features
  - Will allow for the persistence of settings across sessions

Strategies: General Thoughts
- Memory tracking is integrated with the general debugging process
- Try to change a subtle error into a fatal one
- Take advantage of the live process under the debugger
- Look at context of error and/or fallout
- The hypothesis testing cycle is vital
- The debugger will catch seg faults
- Better for the fallout to be close to the error
- Use TotalView to steer the problem and closely watch outcomes
- Use painting and dangling pointer detection to confirm or rule out a memory bug
- CLI scripts can be used to monitor a long-running application

Strategies: Filling up memory
- Scenario: processes in a parallel job are growing to fill available physical memory
- Strategy
  - Rebuild with the tracker and rerun under TotalView
  - Watch heap usage with the memory statistics window
  - Leak analysis with TotalView
- Tips
  - The leak report can only show the point of allocation; you have to work out why the blocks aren't getting deallocated
  - The heap table can be dumped (in the CLI) and compared before and after operations
  - Watch allocations with heap notification
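Comparing two heap dumps amounts to asking which allocation sites hold more memory after an operation than before it. The C sketch below shows that comparison on an assumed, simplified record format of one site label and one byte total per entry. It does not parse TotalView's actual dump output, and the site labels used with it are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed summary record: bytes currently allocated from one call site. */
struct site_usage { const char *site; size_t bytes; };

/* Bytes recorded for site in table t, or 0 if the site is absent. */
static size_t bytes_at(const struct site_usage *t, size_t n, const char *site)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].site, site) == 0)
            return t[i].bytes;
    return 0;
}

/* Print every site whose usage grew between the two snapshots and
 * return how many such sites there are. */
size_t report_growth(const struct site_usage *before, size_t nbefore,
                     const struct site_usage *after, size_t nafter)
{
    size_t grown = 0;

    assert(after != NULL || nafter == 0);
    for (size_t i = 0; i < nafter; i++) {
        size_t old = bytes_at(before, nbefore, after[i].site);
        if (after[i].bytes > old) {
            printf("%s: +%zu bytes\n", after[i].site, after[i].bytes - old);
            grown++;
        }
    }
    return grown;
}
```

A site that grows on every iteration of an operation that should be memory-neutral is a leak candidate even if the blocks are still reachable and therefore missing from the leak report.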

Strategies: A rank process is crashing
- Scenario: a rank process is crashing with a segv. Something is scribbling in the heap.
- Strategy
  - Run the parallel job under TotalView with the HIA
    - This will get you a stack trace
  - Examine the variable causing the segv
    - Is it dangling into a deallocated block?
  - Rerun to try to catch the scribbler in the act
    - Watchpoints on data locations being scribbled
    - Painting on allocation and deallocation
    - Painting and hoarding to change the allocation pattern

Example: Patching a Leak
- I have a bug in my MPICH program
  - All of the rank processes grow to huge size
- I'm going to show the major steps in an example debugging session
- First: rebuild the application (linux-x86 in this case) with
      -L $tvlibdir -ltvheap -Wl,-rpath,$tvlibdir
- Next: run the application with
      mpirun $mpi_args -tv $programname
  With poe this would be
      totalview poe -a $poe_args $programname

Example: Launch and confirm leak
- TotalView has automatically attached to the rank processes
  - 8 procs shown at right
- Run to region of interest
- Start comparing memory statistics

Example: Heap Tracker Leak Detection
- The leak report (for process number 5)
  - Leaks classified according to the stack when the leaked memory was allocated
- Many small leaks
  - Several groups
  - Same size
  - Same leaf function
  - Different called locations

Example: Examine Allocation
- The allocations are occurring from various calls to branch() that occur in insert_to_tree()
- The leak seems to be occurring in all of these allocations
- Where are these allocations deallocated?

Example: Examine point of deallocation
- All deallocations occur here
  - Recursive
- Set a breakpoint to watch what is happening
- Focus on one process

Example: Observe the deallocation
- Watch variable 'active'
  - Is it getting deallocated?
  - Are its children getting deallocated?
- Watch several steps
- Ha!
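The slides do not include the program's source or say what the breakpoint revealed, so the following C sketch is only an assumed illustration of the pattern under inspection: a recursive routine that must release a node's children as well as the node itself. The struct layout and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical binary tree node; the real program's layout is not shown. */
struct node { struct node *left, *right; };

/* Allocate a node with the given children. Returns NULL on failure. */
struct node *new_node(struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->left = left;
        n->right = right;
    }
    return n;
}

/* Free a whole subtree and return the number of nodes released.
 * The children are freed before the parent, because once the parent
 * is gone the only pointers to them are gone too. A routine that
 * skips a child here leaks that child's entire subtree. */
size_t free_tree(struct node *n)
{
    size_t freed;

    if (n == NULL)
        return 0;
    freed = free_tree(n->left) + free_tree(n->right);
    free(n);
    return freed + 1;
}
```

Stepping through such a routine in the debugger and checking after each call whether the node and both children were released is the kind of observation the slide describes. Comparing the returned count with the number of nodes allocated gives the same check in code.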

Example: Confirmation
- TotalView can test small changes without recompilation

Conclusion
- Reviewed the characteristics of memory problems
- Proposed interactive debugging approach
  - Integrate memory debugging with general debugging practice
- Discussed the capabilities of TotalView
  - Parallel debugger
  - Memory debugger
- Suggested strategies for tackling memory bugs with TotalView
- Looked closely at tracking down a leaky MPI program
- For more information see www.etnus.com
- If you are interested in being a beta tester for TotalView on Linux-Power and/or for the upcoming memory debugging enhancements, contact us at support@etnus.com