Using the Intel Inspector XE

Similar documents
Debugging with TotalView

Eliminate Memory Errors and Improve Program Stability

OpenMP Programming on ScaleMP

Performance Characteristics of Large SMP Machines

Assessing the Performance of OpenMP Programs on the Intel Xeon Phi

Improve Fortran Code Quality with Static Analysis

GPU Tools Sandra Wienke

Suitability of Performance Tools for OpenMP Task-parallel Programs

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005

#pragma omp critical x = x + 1; !$OMP CRITICAL X = X + 1!$OMP END CRITICAL. (Very inefficiant) example using critical instead of reduction:

Improve Fortran Code Quality with Static Security Analysis (SSA)

Spring 2011 Prof. Hyesoon Kim

INTEL PARALLEL STUDIO XE EVALUATION GUIDE

OpenACC Basics Directive-based GPGPU Programming

Introduction to Hybrid Programming

A Pattern-Based Comparison of OpenACC & OpenMP for Accelerators

Parallel Computing. Shared memory parallel programming with OpenMP

RWTH GPU Cluster. Sandra Wienke November Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

Chapter 2: OS Overview

Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014

Multi-Threading Performance on Commodity Multi-Core Processors

Parallel Algorithm Engineering

Design Patterns in C++

Instrumentation Software Profiling

Experiences with HPC on Windows

PAGANTEC: OPENMP PARALLEL ERROR CORRECTION FOR NEXT-GENERATION SEQUENCING DATA

Parallel Debugging with DDT

An Implementation Of Multiprocessor Linux

Oracle Solaris Studio Code Analyzer

Advanced MPI. Hybrid programming, profiling and debugging of MPI applications. Hristo Iliev RZ. Rechen- und Kommunikationszentrum (RZ)

OpenACC 2.0 and the PGI Accelerator Compilers

Chapter 3: Operating-System Structures. System Components Operating System Services System Calls System Programs System Structure Virtual Machines

CUDA Debugging. GPGPU Workshop, August Sandra Wienke Center for Computing and Communication, RWTH Aachen University

HammerDB Metrics. Introduction. Installation

Chapter 11 I/O Management and Disk Scheduling

Xeon Phi Application Development on Windows OS

OpenMP and Performance

OpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware

Performance Analysis and Optimization Tool

Memory Debugging with TotalView on AIX and Linux/Power

Get an Easy Performance Boost Even with Unthreaded Apps. with Intel Parallel Studio XE for Windows*

University of Amsterdam - SURFsara. High Performance Computing and Big Data Course

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Embedded Software Development

Parallel Computing. Parallel shared memory computing with OpenMP

<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization

Chapter 10 Case Study 1: LINUX

Case Study on Productivity and Performance of GPGPUs

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Programming the Intel Xeon Phi Coprocessor

The ROI from Optimizing Software Performance with Intel Parallel Studio XE

AMD CodeXL 1.7 GA Release Notes

Trace-Based and Sample-Based Profiling in Rational Application Developer

Java VM monitoring and the Health Center API. William Smith

Introduction to Embedded Systems. Software Update Problem

Revealing the performance aspects in your code. Intel VTune Amplifier XE Generics. Rev.: Sep 1, 2013

Session 2: MUST. Correctness Checking

Static Analysis. Find the Bug! : Analysis of Software Artifacts. Jonathan Aldrich. disable interrupts. ERROR: returning with interrupts disabled

A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin

MAQAO Performance Analysis and Optimization Tool

Chapter 6, The Operating System Machine Level

OpenMP C and C++ Application Program Interface

High Performance Computing

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.

COMMONWEALTH OF PENNSYLVANIA DEPARTMENT S OF PUBLIC WELFARE, INSURANCE, AND AGING

PrimeRail Installation Notes Version A June 9,

Massachusetts Institute of Technology 6.005: Elements of Software Construction Fall 2011 Quiz 2 November 21, 2011 SOLUTIONS.

Leak Check Version 2.1 for Linux TM

Shared Address Space Computing: Programming

Advanced Error Detection Techniques

CUDA Programming. Week 4. Shared memory and register

Purify User s Guide. Version 4.1 support@rational.com

Developing Parallel Applications with the Eclipse Parallel Tools Platform

An Introduction to Multi-threaded Debugging Techniques

An HPC Application Deployment Model on Azure Cloud for SMEs

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Operating System Engineering: Fall 2005

Building Applications Using Micro Focus COBOL

Multi-core Programming System Overview


Performance Evaluation and Optimization of A Custom Native Linux Threads Library

Scheduling Task Parallelism" on Multi-Socket Multicore Systems"

How To Manage An Sap Solution

Object Oriented Software Design II

HPC Wales Skills Academy Course Catalogue 2015

Monitors, Java, Threads and Processes

The RWTH Compute Cluster Environment

Operating System Structures

Getting Started with CodeXL

A Case Study - Scaling Legacy Code on Next Generation Platforms

What s Cool in the SAP JVM (CON3243)

Transcription:

Using the Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ)

Race Condition Data Race: the typical OpenMP programming error, when: two or more threads access the same memory location, and at least one of these accesses is a write, and the accesses are not protected by locks or critical regions, and the accesses are not synchronized, e.g. by a barrier. Non-deterministic occurrence: e.g. the sequence of the execution of parallel loop iterations is non-deterministic and may change from run to run In many cases private clauses, barriers or critical regions are missing Data races are hard to find using a traditional debugger Use the 2

Detection of Memory Errors Dead Locks Data Races Support for Linux (32bit and 64bit) and Windows (32bit and 64bit) WIN32-Threads, Posix-Threads, Intel Threading Building Blocks and OpenMP New Features (compared to Intel Thread Checker) Binary Instrumentation gives full functionality Independent stand-alone GUI for Windows and Linux memory error detection static security analysis (in combination with the Intel 12.X compiler ) 3

PI Example Code double f(double x) { return (4.0 / (1.0 + x*x)); } double CalcPi (int n) { const double fh = 1.0 / (double) n; double fsum = 0.0; double fx; int i; π = 1 0 4 1 + x 2 #pragma omp parallel for private(fx,i) reduction(+:fsum) for (i = 0; i < n; i++) { fx = fh * ((double)i + 0.5); fsum += f(fx); } return fh * fsum; } 4

PI Example Code double f(double x) { return (4.0 / (1.0 + x*x)); } double CalcPi (int n) { const double fh = 1.0 / (double) n; double fsum = 0.0; double fx; int i; #pragma omp parallel for private(fx,i) reduction(+:fsum) for (i = 0; i < n; i++) { fx = fh * ((double)i + 0.5); fsum += f(fx); } return fh * fsum; } What if we would have forgotten this? 5

Inspector XE Create Project $ module load intelixe ; inspxe-gui 6

Inspector XE Create Project ensure that multiple threads are used choose a real small dataset, execution time can grow 10X 1000X 7

Inspector XE Configure Analysis Threading Error Analysis Modes 1. Detect Deadlocks 2. Detect Deadlocks and Data Races 3. Locate Deadlocks and Data Races more details, more overhead 8

Inspector XE Results 1. Detected Problems 2. Filters 3. Code Location 1 3 2 9

Inspector XE Results 1. Timeline View 1 10

Inspector XE Results 1. Source Code producing the issue double click opens an editor 2. Corresponding Call Stack 1 2 11

Inspector XE Results 1. Source Code producing the issue double click opens an editor 2. Corresponding Call Stack The missing reduction is detected. 12

Inspector XE Memory Errors 1. Detect Leaks Find allocation of memory blocks without de-allocation in the program. Can waste a lot of memory. 2. Detect Memory Problems Memory Leaks Detect invalid/uninitialized accesses read/write to invalid address read an uninitialized address 3. Locate Memory Problems Memory Leaks Detect invalid/uninitialized accesses Memory Accesses after an allocated block (using guard blocks ) Dangling Pointer Accesses 13

Command Line Tool inspxe-cl Memory Error Analysis Modes 1. Detect Leaks (mi1) 2. Detect Memory Problems (mi2) 3. Locate Memory Problems (mi3) Threading Error Analysis Modes 1. Detect Deadlocks (ti1) 2. Detect Deadlocks and Data Races (ti2) 3. Locate Deadlocks and Data Races (ti3) Data collection without GUI allows to use batch jobs. $ inspxe-cl -collect ti3 -- pi.exe < input $ inspxe-cl -report problems 14 Viewing results in text mode is helpful, when remote connections are slow.

Command Line Tool inspxe-cl Create the command line from a project to use it in batch mode. 15

Inspector XE Always use a Thread Checking Tool before putting an OpenMP Code into production. Thank you for your attention! Questions? 16