Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014

Similar documents
Debugging with TotalView

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

HPC Wales Skills Academy Course Catalogue 2015

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Productivity and HPC. Erik Hagersten, CTO, Rogue Wave Software AB Developing parallel, data-intensive applications is hard. We make it easier.

INTEL PARALLEL STUDIO XE EVALUATION GUIDE

GPU Tools Sandra Wienke

Improve Fortran Code Quality with Static Analysis

Memory Debugging with TotalView on AIX and Linux/Power

locuz.com HPC App Portal V2.0 DATASHEET

5x in 5 hours Porting SEISMIC_CPML using the PGI Accelerator Model

serious tools for serious apps

Parallel Programming Survey

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

The Top Six Advantages of CUDA-Ready Clusters. Ian Lumb Bright Evangelist

MPI / ClusterTools Update and Plans

Allinea Forge User Guide. Version 6.0.1

Part I Courses Syllabus

Overview of HPC Resources at Vanderbilt

Bright Cluster Manager

Using the Windows Cluster

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

Parallel Debugging with DDT

Xeon+FPGA Platform for the Data Center

Advanced MPI. Hybrid programming, profiling and debugging of MPI applications. Hristo Iliev RZ. Rechen- und Kommunikationszentrum (RZ)

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005

CUDA Debugging. GPGPU Workshop, August Sandra Wienke Center for Computing and Communication, RWTH Aachen University

Program Grid and HPC5+ workshop

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell

The Asterope compute cluster

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Xeon Phi Application Development on Windows OS

RWTH GPU Cluster. Sandra Wienke November Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

Big Data Visualization on the MIC

Technical Overview of Windows HPC Server 2008

The GRID according to Microsoft

HPC Software Requirements to Support an HPC Cluster Supercomputer

supercomputing. simplified.

Evaluation of CUDA Fortran for the CFD code Strukti

Optimization tools. 1) Improving Overall I/O

LSKA 2010 Survey Report Job Scheduler

MAQAO Performance Analysis and Optimization Tool

Linux tools for debugging and profiling MPI codes

Stream Processing on GPUs Using Distributed Multimedia Middleware

End-user Tools for Application Performance Analysis Using Hardware Counters

:Introducing Star-P. The Open Platform for Parallel Application Development. Yoel Jacobsen E&M Computing LTD

Resource Utilization of Middleware Components in Embedded Systems

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

On-Demand Supercomputing Multiplies the Possibilities

Case Study on Productivity and Performance of GPGPUs

Running applications on the Cray XC30 4/12/2015

Improve Fortran Code Quality with Static Security Analysis (SSA)

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

Windows IB. Introduction to Windows 2003 Compute Cluster Edition. Eric Lantz Microsoft

A quick tutorial on Intel's Xeon Phi Coprocessor

Share and aggregate GPUs in your cluster. F. Silla Technical University of Valencia Spain

PRIMERGY server-based High Performance Computing solutions

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Introduction to grid technologies, parallel and cloud computing. Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber

Interoperability between Sun Grid Engine and the Windows Compute Cluster

Next Generation GPU Architecture Code-named Fermi

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

How To Visualize Performance Data In A Computer Program

Introduction to Running Computations on the High Performance Clusters at the Center for Computational Research

Manjrasoft Market Oriented Cloud Computing Platform

Intel Many Integrated Core Architecture: An Overview and Programming Models

Using the Intel Inspector XE

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

BLM 413E - Parallel Programming Lecture 3

High Performance Applications over the Cloud: Gains and Losses

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

GPUs for Scientific Computing

A Case Study - Scaling Legacy Code on Next Generation Platforms

Introduction to Linux and Cluster Basics for the CCR General Computing Cluster

Introduction to HPC Workshop. Center for e-research

Getting Started with CodeXL

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

What s New in Centrify DirectAudit 2.0

What s Cool in the SAP JVM (CON3243)

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

Working with HPC and HTC Apps. Abhinav Thota Research Technologies Indiana University

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Notes and terms of conditions. Vendor shall note the following terms and conditions/ information before they submit their quote.

Introduction to Hybrid Programming

Allinea DDT and MAP User Guide. Version 4.2

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014

Intel Application Software Development Tool Suite 2.2 for Intel Atom processor. In-Depth

Eliminate Memory Errors and Improve Program Stability

Introduction to Cloud Computing

High Performance Computing (HPC)

COSCO 2015 Heterogeneous Computing Programming

Hands-on CUDA exercises

Bringing Big Data Modelling into the Hands of Domain Experts

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner

Lab Management, Device Provisioning and Test Automation Software

Transcription:

Debugging in Heterogeneous Environments with TotalView ECMWF HPC Workshop 30 th October 2014

Agenda Introduction Challenges TotalView overview Advanced features Current work and future plans 2014 Rogue Wave Software, Inc. All Rights Reserved 2

Introduction

Rogue Wave History Founded: 1989 Portfolio company of Audax Group Acquisitions: Visual Numerics: 2009 TotalView Technologies: 2010 ILOG Visualization for C++ : 2012 OpenLogic : 2013 Klocwork : 2013 ILOG Visualization for Java: 2014 Customers 3,000+ in 57 countries Financial services, telecoms, oil and gas, government and aerospace, research and academic Global Locations HQ: Boulder, CO NA: Houston, TX; Corvallis, OR; Natick MA EMEA: France, Germany, UK APAC: Japan COPYRIGHT 2010 ROGUE WAVE SOFTWARE 4

Challenges of developing for heterogeneous environments

Challenges Number of CPU cores increasing but clock speed is static or decreasing How to program accelerators Lower level languages (OpenCL, CUDA) Directives based (OpenACC, OpenMP) New algorithms or programming models may be needed Data sizes increasing exponentially Memory is increasingly important Power consumption and constraints 2014 Rogue Wave Software, Inc. All Rights Reserved 6

How does Rogue Wave help? TotalView debugger Troubleshooting and analysis tool Visibility into applications Control over applications Scalability Usability Support for HPC platforms and languages 2014 Rogue Wave Software, Inc. All Rights Reserved

TotalView Overview

What is TotalView? Application Analysis and Debugging Tool: Code Confidently Debug and Analyse C/C++ and Fortran on Linux, Unix or Mac OS X Laptops to supercomputers Makes developing, maintaining, and supporting critical apps easier and less risky Major Features Easy to learn graphical user interface with data visualization Parallel Debugging MPI, Pthreads, OpenMP CUDA, OpenACC, and Intel Xeon Phi coprocessor Low tool overhead resource usage Includes a Remote Display Client which frees you to work from anywhere Memory Debugging with MemoryScape Deterministic Replay Capability Included on Linux/x86-64 Non-interactive Batch Debugging with TVScript and the CLI TTF & C++View to transform user defined objects 2014 Rogue Wave Software, Inc. All Rights Reserved

MemoryScape Runtime Memory Analysis : Eliminate Memory Errors Detects memory leaks before they are a problem Explore heap memory usage with powerful analytical tools Use for validation as part of a quality software development process Major Features Included in TotalView, or Standalone Detects Malloc API misuse Memory leaks Buffer overflows Supports C, C++, Fortran Linux, Unix, and Mac OS X Intel Xeon Phi MPI, pthreads, OMP, and remote apps Low runtime overhead Easy to use Works with vendor libraries No recompilation or instrumentation 2014 Rogue Wave Software, Inc. All Rights Reserved

Deterministic Replay Debugging Reverse Debugging: Radically simplify your debugging Captures and Deterministically Replays Execution Not just checkpoint and restart Eliminate the Restart Cycle and Hard-to-Reproduce Bugs Step Back and Forward by Function, Line, or Instruction Specifications A feature included in TotalView on Linux x86 and x86-64 No recompilation or instrumentation Explore data and state in the past just like in a live process, including C++View transformations Replay on Demand: enable it when you want it Supports MPI on Ethernet, Infiniband, Cray XE Gemini Supports Pthreads, and OpenMP New: Save / Load Replay Information (CLI only) 2014 Rogue Wave Software, Inc. All Rights Reserved

Cambridge Study Survey conducted by the Judge Business School at Cambridge University concluded that Reverse Debuggers allow users, on average, to spend 13% less of their programming time debugging. Programming was 50% of total work week on average Debugging was 50% of programming time without reverse debugging Debugging was 37% of programming time with reverse debugging That frees up 130 hours (>3 work weeks, 6.5% total time) per developer per year for design and new feature development The survey looked at total value (salaries & overhead) of debugging as a task and they determined that this savings could, across the whole world economy, be work $41 billion in increased productivity. The productivity improvement should be worth $2,500 per developer per year (salary only) or $5,000 per year with overhead. http://www.roguewave.com/company/news-events/press-releases/2013/university-of-cambridgereverse-debugging-study.aspx 2014 Rogue Wave Software, Inc. All Rights Reserved 1 2

TotalView for the NVIDIA GPU Accelerator NVIDIA CUDA 6.5 With support for Unified Memory Features and capabilities include Support for dynamic parallelism Support for MPI based clusters and multi-card configurations Flexible Display and Navigation on the CUDA device Physical (device, SM, Warp, Lane) Logical (Grid, Block) tuples CUDA device window reveals what is running where Support for types and separate memory address spaces Leverages CUDA memcheck 2014 Rogue Wave Software, Inc. All Rights Reserved

Displaying NVIDIA GPU Device Information 14

TotalView for OpenACC TotalView for OpenACC OpenACC DIRECTIVES FOR ACCELERATORS Step host & device View variables Set breakpoints Compatibility with Cray CCE 8 OpenACC

TotalView for the Intel Xeon Phi coprocessor Supports All Major Intel Xeon Phi Coprocessor Configurations Native Mode With or without MPI Offload Directives Incremental adoption, similar to GPU Symmetric Mode Host and Coprocessor Multi-device, Multi-node Clusters User Interface MPI Debugging Features Process Control, View Across, Shared Breakpoints Heterogeneous Debugging Debug Both Xeon and Intel Xeon Phi Processes Memory Debugging Both native and symmetric mode 2014 Rogue Wave Software, Inc. All Rights Reserved

CPU-Centric Intel Xeon Processor Multi-core Hosted Spectrum of Intel Xeon Phi Execution Models Offload Intel Xeon Phi-Centric Symmetric Intel Xeon Phi Many-Core Hosted General purpose serial and parallel computing Codes with balanced needs Highly-parallel codes Codes with highly- parallel phases Multi-core Many-core Main( ) Foo( ) MPI_*() Main( ) Foo( ) MPI_*() Foo( ) Main( ) Foo( ) MPI_*() Main( ) Foo( ) MPI_*() Main() Foo( ) MPI_*() PCIe Productive Programming Models Across the Spectrum 17

Debugging Intel Xeon Phi Applications with Offloaded Code Xeon side Xeon Phi side One debugging session for MIC-accelerated code 18

Debugging Intel Xeon Phi MPI Applications Start multi-host multi card MPI job Attach to subset of processes on MIC coprocessor Set breakpoints Debug as usual MPI 19

Diving on CAF array y Coarray Fortran Diving on CAF array y across processes Supported on Cray platforms with CCE

Advanced Features

Remote Display Client Offers users the ability to easily set up and operate a TotalView debug session that is running on another system Consists of two components Client runs on local machine Server runs on any system supported by TotalView and invisibly manages the secure connection between host and client Remote Display Client is available for: Linux x86, x86-64 Windows XP, Vista, 7 Mac OS X 22

Multi-dimensional array viewer See your arrays on a Grid display 2-D, 3-D, N-D Arbitrary slices Specify data representation Windowed data access Fast 23

Visualizing Arrays Visualize array data using Tools > Visualize from the Variable Window Large arrays can be sliced down to a reasonable size first Visualize is a standalone program Data can be piped out to other visualization tools Visualize allows to spin, zoom, etc. Data is not updated with Variable Window; You must revisualize $visualize() is a directive in the expression system, and can be used in evaluation point expressions. 24

Debugging MPMD applications totalview -args aprun -n 9 worker : -n 1 master 25 25

Message Queue Graph Message Queue Debugging Filtering Tags MPI Communicators Cycle detection Find deadlocks 26 26

TVScript Gives you non-interactive access to TotalView s capabilities Useful for Debugging in batch environments Watching for intermittent faults Parametric studies Automated testing and validation TVScript is a script (not a scripting language) It runs your program to completion and performs debugger actions on it as you request Results are written to an output file No GUI No interactive command line prompt Used at sites such as DMI and STFC Daresbury for automated comparative debugging

28 C++View C++View is a simple way for you to define type transformations Simplify complex data Aggregate and summarize Check validity Transforms Type-based Compose-able Automatically visible Code C++ Easy to write Resides in target Only called by TotalView

Viewing Fortran User-Defined Types TYPE WHOPPER LOGICAL, DIMENSION(ISIZE) :: FLAGS DOUBLE PRECISION, DIMENSION(ISIZE) :: DPSA DOUBLE PRECISION, DIMENSION(:), POINTER :: DPPA END TYPE WHOPPER TYPE(WHOPPER), DIMENSION(:), ALLOCATABLE :: STUFFTYP1

Current Work and Future Plans

What is new in TotalView 8.14.1 CUDA 6.5 support Coarray Fortran support for the Cray CCE compiler Extended support for type transformations with the Intel compiler (unordered STL collection classes) Improved delayed symbol processing (better performance for larger executables)

Multi-phase R&D Projects Underway Massive Scalability Collaboration with LLNL and Tri-lab partners Targeting Cray, Blue Gene and Linux Clusters MRNet software overlay network for multicast and reduction New GUI Sleek, Modern and Fast Configurable Improved Usability Provides aggregation capabilities for big data and scale Leveraging math and stat expertise from IMSL Working with customers through early access programs Customer input is key to the success of both programs 2014 Rogue Wave Software, Inc. All Rights Reserved 32

Thanks! Visit the website http://www.roguewave.com/products/totalview.aspx Videos (3 new videos on Xeon Phi) Documentation Sign up for an evaluation Visit us at SC14 (booth 2338) 2014 Rogue Wave Software, Inc. All Rights Reserved 33