CUDA Tools for Debugging and Profiling. Jiri Kraus (NVIDIA)

Similar documents
Optimizing Application Performance with CUDA Profiling Tools

NVIDIA Tools For Profiling And Monitoring. David Goodwin

GPU Performance Analysis and Optimisation

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

GPU Tools Sandra Wienke

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

Guided Performance Analysis with the NVIDIA Visual Profiler

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

How To Test Your Code On A Cuda Gdb (Gdb) On A Linux Computer With A Gbd (Gbd) And Gbbd Gbdu (Gdb) (Gdu) (Cuda

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Eddy Integrated Development Environment, LemonIDE for Embedded Software System Development

Experiences with Tools at NERSC

HPC Wales Skills Academy Course Catalogue 2015

XID ERRORS. vr352 May XID Errors

Performance Analysis for GPU Accelerated Applications

Hands-on CUDA exercises

NVIDIA CUDA INSTALLATION GUIDE FOR MICROSOFT WINDOWS

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

CUDA Debugging. GPGPU Workshop, August Sandra Wienke Center for Computing and Communication, RWTH Aachen University

Testing for Security

How To Develop Android On Your Computer Or Tablet Or Phone

Getting Started with CodeXL

Profiler User's Guide

Google Web Toolkit. Introduction to GWT Development. Ilkka Rinne & Sampo Savolainen / Spatineo Oy

Part I Courses Syllabus

Università Degli Studi di Parma. Distributed Systems Group. Android Development. Lecture 1 Android SDK & Development Environment. Marco Picone

RWTH GPU Cluster. Sandra Wienke November Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

Integrating TAU With Eclipse: A Performance Analysis System in an Integrated Development Environment

How To Develop For A Powergen 2.2 (Tegra) With Nsight) And Gbd (Gbd) On A Quadriplegic (Powergen) Powergen Powergen 3

AMD CodeXL 1.7 GA Release Notes

Next Generation GPU Architecture Code-named Fermi

Running a Program on an AVD

GPU Profiling with AMD CodeXL

PAPI - PERFORMANCE API. ANDRÉ PEREIRA ampereira@di.uminho.pt

OpenACC Programming and Best Practices Guide

RA MPI Compilers Debuggers Profiling. March 25, 2009

Nios II IDE Help System

Building Embedded Systems

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Bright Cluster Manager

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015

Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014

Lecture 1: an introduction to CUDA

DS-5 ARM. Using the Debugger. Version 5.7. Copyright 2010, 2011 ARM. All rights reserved. ARM DUI 0446G (ID092311)

Intel Application Software Development Tool Suite 2.2 for Intel Atom processor. In-Depth

Application Development for Mobile and Ubiquitous Computing

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

DS-5 ARM. Using the Debugger. Version Copyright ARM. All rights reserved. ARM DUI 0446M (ID120712)

Developing Web Services with Eclipse and Open Source. Claire Rogers Developer Resources and Partner Enablement, HP February, 2004

Visualization Tool for GPGPU Programming

CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson

Module Title: Software Development A: Mobile Application Development

Lazy OpenCV installation and use with Visual Studio

l What is Android? l Getting Started l The Emulator l Hello World l ADB l Text to Speech l Other APIs (camera, bitmap, etc)

The Top Six Advantages of CUDA-Ready Clusters. Ian Lumb Bright Evangelist

How To Run A Multi Process Powerpoint On A Multi Threaded Cuda (Nvidia) Powerpoint (Powerpoint) On A Single Process (Nvdv) On An Uniden-Cuda) On Multiple Processes (Nvm)

Real-time Debugging using GDB Tracepoints and other Eclipse features

Android Development. Lecture AD 0 Android SDK & Development Environment. Università degli Studi di Parma. Mobile Application Development

1. If we need to use each thread to calculate one output element of a vector addition, what would

CS3600 SYSTEMS AND NETWORKS

Chapter 2 System Structures

FreeForm Designer. Phone: Fax: POB 8792, Natanya, Israel Document2

Introduction to CUDA C

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

Effective Java Programming. efficient software development

Cluster Monitoring and Management Tools RAJAT PHULL, NVIDIA SOFTWARE ENGINEER ROB TODD, NVIDIA SOFTWARE ENGINEER

IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM

Embedded Software development Process and Tools: Lesson-3 Host and Target Machines

End-user Tools for Application Performance Analysis Using Hardware Counters

Case Study on Productivity and Performance of GPGPUs

Installing Eclipse C++ for Windows

The Yocto Project Eclipse plug-in: An Effective IDE Environment for Embedded Application and System Developers

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Developing In Eclipse, with ADT

Workshop for WebLogic introduces new tools in support of Java EE 5.0 standards. The support for Java EE5 includes the following technologies:

A highly configurable and efficient simulator for job schedulers on supercomputers

How To Monitor Performance On A Microsoft Powerbook (Powerbook) On A Network (Powerbus) On An Uniden (Powergen) With A Microsatellite) On The Microsonde (Powerstation) On Your Computer (Power

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

5x in 5 hours Porting SEISMIC_CPML using the PGI Accelerator Model

HPC Software Requirements to Support an HPC Cluster Supercomputer

Eliminate Memory Errors and Improve Program Stability


Java Application Development using Eclipse. Jezz Kelway Java Technology Centre, z/os Service IBM Hursley Park Labs, United Kingdom

Debugging CUDA Applications Przetwarzanie Równoległe CUDA/CELL

For Introduction to Java Programming, 5E By Y. Daniel Liang

Advanced MPI. Hybrid programming, profiling and debugging of MPI applications. Hristo Iliev RZ. Rechen- und Kommunikationszentrum (RZ)

STLinux Software development environment

Monitoring, Tracing, Debugging (Under Construction)

Selection Criteria for ZigBee Development Kits

Waspmote IDE. User Guide

Transcription:

Mitglied der Helmholtz-Gemeinschaft CUDA Tools for Debugging and Profiling Jiri Kraus (NVIDIA) GPU Programming@Jülich Supercomputing Centre Jülich 7-9 April 2014

What you will learn How to use cuda-memcheck to detect invalid memory accesses How to use Nsight EE to debug a CUDA Program How to use the NVIDIA visual profiler

cuda-memcheck cuda-memcheck is a memory correctness tool similar to valgrind memcheck cuda-memcheck provided to tools (select via tool) memcheck: Memory access checking racecheck: Shared memory hazard checking Compile with debugg information (-g -G)

cuda-memcheck

Taks 0: Use cuda-memcheck to identify error Go to CUDATools/exercises/tasks Build Task 0 make task0-cuda-memcheck Run cuda-memcheck cuda-memcheck./task0-cuda-memcheck Identify and fix the error (cuda-memcheck should run with out errors) task0-cuda-memcheck.cu

Nsight Eclipse Edition Nsight Eclipse Edition is an IDE for CUDA development Source Editor with CUDA C and C++ syntax highlighting Project and files management with version control integration Integrated build system GUI for debugging heterogeneous applications Visual profiler integration Nsight EE is part of the CUDA Toolkit

Using Nsight EE to debug a CUDA Program Start Nsight EE nsight

Using Nsight EE to debug a CUDA Program

Using Nsight EE to debug a CUDA Program

Using Nsight EE to debug a CUDA Program

Using Nsight EE to debug a CUDA Program

Using Nsight EE to debug a CUDA Program

Using Nsight EE to debug a CUDA Program

Using Nsight EE to debug a CUDA Program

Using Nsight EE to debug a CUDA Program

Taks 1: Use Nsight EE to debug a program Go to CUDATools/exercises/tasks Build Task 1 make task1-cuda-gdb Start Nsight EE nsight Setup a debug session in Nsight EE Use the variable view to let thread 4 from block 1 print 4 (instead of 0) Do not modify the source code

Why Performance Measurement Tools? You can only improve what you measure Need to identify: Hotspots: Which function takes most of the run time? Bottlenecks: What limits the performance of the Hotspots? Manual timing is tedious and error prone Possible for small application like jacobi and matrix multiplication Impractical for larger/more complex application Access to hardware counters (PAPI, CUPTI)

The command line profiler nvprof Simple launcher to get profiles of your application Profiles CUDA Kernels and API calls > nvprof./jacobi ======== NVPROF is profiling jacobi... ======== Command: jacobi Jacobi (serial) [ ] snip ======== Profiling result: Time(%) Time Calls Avg Min Max Name 72.14 352.65ms 1000 352.65us 350.48us 354.94us Jacobi_86_gpu 26.02 127.23ms 1000 127.23us 93.48us 128.34us Jacobi_74_gpu 0.84 4.09ms 1000 4.09us 4.04us 4.36us Jacobi_96_gpu_red 0.61 3.00ms 1009 2.97us 2.78us 56.16us [CUDA memcpy HtoD] 0.39 1.91ms 1002 1.91us 1.82us 52.41us [CUDA memcpy DtoH]

nvprof interoperability with nvvp nvprof can write the application profile to nvvp compatible file: nvprof -o jacobi.nvprof./jacobi Import in nvvp

nvprof important command-line options Options: -o, --output-profile <filename> Output the result file which can be imported later or opened by the NVIDIA Visual Profiler. --analysis-metrics Collect profiling data that can be imported to Visual Profiler's "analysis" mode. NOTE: Use "--output-profile" to specify an output file. -h, --help Print this help information.

nvvp introduction

Task 2: Analyze Jacobi Timeline Start jacobi with nvprof and write profile to file Import profile into nvvp Compare the profiles with and without data region.

Task 3: Analyze memory movement of unified memory Start matrix multiplication in nvvp with unified memory profiling enabled

Task 4: Analyze matrix multiplication example with nvvp Start new session in nvvp with the matrix multiplication example Run the guided analysis What is the performance limiter?

Cheat Sheet Start nvprof nvprof -o <output-profile>./a.out Start nvvp nvvp profiler users guide http://docs.nvidia.com/cuda/profiler-users-guide/index.html