S Debugging CUDA Kernel Code with NVIDIA Nsight Visual Studio Edition

Similar documents
ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

GPU Tools Sandra Wienke

Optimizing Application Performance with CUDA Profiling Tools

How To Develop For A Powergen 2.2 (Tegra) With Nsight) And Gbd (Gbd) On A Quadriplegic (Powergen) Powergen Powergen 3

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Getting Started with CodeXL

GPU Usage. Requirements

Guided Performance Analysis with the NVIDIA Visual Profiler

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

GPU Profiling with AMD CodeXL

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

L20: GPU Architecture and Models

GPU Parallel Computing Architecture and CUDA Programming Model

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

GPU Performance Analysis and Optimisation

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Using Process Monitor

NVIDIA Tools For Profiling And Monitoring. David Goodwin

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Developer Tools. Tim Purcell NVIDIA

AMD CodeXL 1.7 GA Release Notes

Debugging with TotalView

Basics of VTune Performance Analyzer. Intel Software College. Objectives. VTune Performance Analyzer. Agenda

Stream Processing on GPUs Using Distributed Multimedia Middleware

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA


Introduction to GPGPU. Tiziano Diamanti

serious tools for serious apps

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

RTOS Debugger for ecos

Next Generation GPU Architecture Code-named Fermi

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

GPGPU Computing. Yong Cao

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

CUDA Tools for Debugging and Profiling. Jiri Kraus (NVIDIA)

How To Test Your Code On A Cuda Gdb (Gdb) On A Linux Computer With A Gbd (Gbd) And Gbbd Gbdu (Gdb) (Gdu) (Cuda

CUDA Debugging. GPGPU Workshop, August Sandra Wienke Center for Computing and Communication, RWTH Aachen University

Profiler User's Guide

RWTH GPU Cluster. Sandra Wienke November Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

Introduction to GPU hardware and to CUDA

Hands-on CUDA exercises

Performance Optimization and Debug Tools for mobile games with PlayCanvas

NVIDIA GRID OVERVIEW SERVER POWERED BY NVIDIA GRID. WHY GPUs FOR VIRTUAL DESKTOPS AND APPLICATIONS? WHAT IS A VIRTUAL DESKTOP?

Chapter 6, The Operating System Machine Level

Computer Graphics Hardware An Overview

Manjrasoft Market Oriented Cloud Computing Platform

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

NVIDIA CUDA INSTALLATION GUIDE FOR MICROSOFT WINDOWS

gpus1 Ubuntu Available via ssh

Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014

Performance Testing in Virtualized Environments. Emily Apsey Product Engineer

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Trustworthy Computing

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Getting Started with Tizen SDK : How to develop a Web app. Hong Gyungpyo 洪 競 杓 Samsung Electronics Co., Ltd

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect

Eliminate Memory Errors and Improve Program Stability

10 STEPS TO YOUR FIRST QNX PROGRAM. QUICKSTART GUIDE Second Edition

Amazon EC2 Product Details Page 1 of 5

x86 ISA Modifications to support Virtual Machines

The Fastest, Most Efficient HPC Architecture Ever Built

NVIDIA GeForce GTX 580 GPU Datasheet

Introduction to GPU Programming Languages

Also on the Performance tab, you will find a button labeled Resource Monitor. You can invoke Resource Monitor for additional analysis of the system.

Version: July Windows 7

Reminders. Lab opens from today. Many students want to use the extra I/O pins on

Network Licensing. White Paper 0-15Apr014ks(WP02_Network) Network Licensing with the CRYPTO-BOX. White Paper

Red Hat Linux Internals

WINDOWS PROCESSES AND SERVICES

End-user Tools for Application Performance Analysis Using Hardware Counters

Understand Performance Monitoring

Finding Performance and Power Issues on Android Systems. By Eric W Moore

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Cloud Computing. Up until now

Q N X S O F T W A R E D E V E L O P M E N T P L A T F O R M v Steps to Developing a QNX Program Quickstart Guide

How To Understand And Understand An Operating System In C Programming

VEEAM ONE 8 RELEASE NOTES

theguard! ApplicationManager System Windows Data Collector

CUDA programming on NVIDIA GPUs

NVIDIA VIDEO ENCODER 5.0

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

HPC with Multicore and GPUs

Debugging CUDA Applications Przetwarzanie Równoległe CUDA/CELL

GPU for Scientific Computing. -Ali Saleh

Lab 2-2: Exploring Threads

Advanced Graphics and Animations for ios Apps

SCADA/HMI MOVICON TRAINING COURSE PROGRAM

Java Troubleshooting and Performance

Part I Courses Syllabus

Learn CUDA in an Afternoon: Hands-on Practical Exercises

Introduction. Interoperability & Tools Group. Existing Network Packet Capture Tools. Challenges for existing tools. Microsoft Message Analyzer

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Overview of CS 282 & Android

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Transcription:

S3478 - Debugging CUDA Kernel Code with NVIDIA Nsight Visual Studio Edition

Agenda What is Nsight and how can it help me? Creating projects and CUDA build system live demo Nsight CUDA debugger features helps you find issues live demo Nsight support for CUDA Dynamic Parallelism live demo Summary of Nsight Q&A Overview of Nsight System & application trace and CUDA profiling Supported configurations

NVIDIA Nsight Visual Studio Edition Visual Studio integrated development for GPU and CPU Build Debug Profile

NVIDIA Nsight GPU Debugger Local and single GPU Compute and Graphics debugging GPU breakpoints including complex conditionals GPU memory views and exception reporting Dynamic Shader Editing Graphics Inspector Real-time inspection of 3D API calls and states Investigate GPU pipeline states See contributing fragments with Pixel History Profile frames to find GPU bottlenecks System Analysis View CPU & GPU events on a single timeline Examine workload dependencies, memory transfe CPU/OS, Compute, Direct3D and OpenGL Trace Trace WDDM and I/O ETW events Capture call stack and jump to source

NVIDIA Nsight for Compute Developers CUDA debugger Debug CUDA kernels directly on GPU hardware Info page and Warp watch view Use on-target conditional breakpoints to locate errors CUDA memory checker Out of bounds memory access detection Enables precise error detection Application and system trace Review CUDA activities across CPU and GPU Activity correlation panel CUDA profiler Source code correlation Deep kernel analysis to detect factors limiting maximum performance Unlimited experiments on live kernels

CUDA Build System CUDA Toolkit 4.2 and 5.0 support Visual Studio 2008 SP1 and 2010 SP1 CUDA project wizard Visual C++ and.net project integration Project setting extensions

NVIDIA Nsight CUDA Debugging Native GPU debugging with mixed CUDA-C/PTX/SASS assembly Debug GPU PTX/SASS code without Symbolic info with CUDA-C Debugger attach to running process On device conditional breakpoint evaluation with program variables GPU memory views and data breakpoints CUDA expression engine and stack frame support Massively-threaded GPU kernels navigation and run-control CUDA memory checker CUDA Info Tool-Window shows {all CUDA resources Memory Allocations, Contexts, }

Application to debug: Mandelbrot Mandelbrot Set is the visual representation of an iterated function on the complex plane Z = Z 2 + C

Attaching to a running Kernel with Nsight Set the Nsight monitor to Use this monitor for CUDA attach From the command line, enable Nsight to catch GPU exceptions & memory issues: SET NSIGHT_CUDA_DEBUGGER=2 Setting it to 1 will allow Nsight to catch subset of GPU exceptions

Nsight Debugging Make sure Generate GPU Debug information is set to Yes (-G) Debug Start Debugging (or F5), launches the CPU debugger Choose Nsight Start CUDA Debugging This will launch the Startup Project Working directory = project directory if setting is empty!= CPU directory setting

CUDA info toolwindow Provides a view from the CUDA driver API layer, which sits below the CUDA runtime CUcontext CUmodule CUfunction CUstream

What is Dynamic Parallelism? The ability to launch new grids from the GPU Dynamically Simultaneously Independently CPU GPU CPU GPU Fermi: Only CPU can generate GPU work Kepler: GPU can generate work for itself

What Does It Mean? CPU GPU CPU GPU GPU as Co-Processor Autonomous, Dynamic Parallelism

Dynamic Work Generation Fixed Grid Statically assign conservative worst-case grid Dynamically assign performance where accuracy is required Initial Grid Dynamic Grid

CDP example code global ChildKernel(void *data){ //Operate on data } global ParentKernel(void *data){... ChildKernel<<<16, 1>>>(data); } // In Host Code ParentKernel<<<256, 64>>(data); Recursion is supported, and a kernel may call itself: global RecursiveKernel(void *data){ if (continuerecursion == true) RecursiveKernel<<<64, 16>>>(data); }

Nsight support for Dynamic Parallelism CUDA information toolwindow with parent/child relationship: Positive Grid IDs are kernels launched from host Negative Grid IDs are child kernels launched from device Sleeping status Catches invalid pointer arguments to cudamemcpy*()

Nsight support for Dynamic Parallelism Tracks device to device kernel launches Active Warp time Visual correlation of parent/child in the timeline and graphical call graph

NVIDIA Nsight System Trace Application and system trace OS, CPU, Graphics and Compute APIs, Driver and GPU support Concurrent kernel execution and memory transfer trace Dependency tracking between API and GPU workload NVTX API for source code instrumentation

CUDA runtime and driver calls Memory transfers CUDA kernel execution Dependency tracking Asynchronous memory transfers Concurrent kernel execution

NVIDIA Nsight CUDA Profiling CUDA profiler with live counter reconfiguration Unlimited experiments on live kernels with kernel replay Advanced profiling experiments Achieved occupancy Instruction throughput Full memory hierarchy statistics Kernel profiling filtering

New in NVIDIA Nsight Visual Studio Edition 3.0 Kepler 2 architecture support CUDA Toolkit 5.0 support Debug and trace kernels using CUDA Dynamic Parallelism (CDP) Debug and profile kernel using CUDA Static Linking Ability to debug optimized/release CUDA-C kernels Attach debugger to a kernel paused at a breakpoint or exception Ability to copy, paste and edit expression in the CUDA warp watch Display texture information in CUDA Info page Ability to debug GLSL and CUDA GPU code in the same debug session with Maximus systems

New in NVIDIA Nsight Visual Studio Edition 3.0 CUDA Profiler with source code correlation Instruction Count, Divergent Branch, Memory Transactions Annotated Source Viewer for CUDA-C/PTX/SASS New CUDA profiler experiments System Trace Improvements CDP Trace: Device Launch Trace, Self/Total Active Warp Time CUDA Queue Depth Trace Multi-GPU P2P memory transfers and cudasetgldevice File and Disk I/O ETW events WDDM ETW user mode and kernel mode command queues Custom trace data visualization with NVTXT

Fully Featured Configurations WDDM driver Tesla Compute Cluster driver Application and system trace CUDA profiling CUDA debugger CUDA memory checker 2 GPUs Maximus Tesla +GPU Single GPU Optimus Laptop Remote PC

How can I learn more about Nsight? Download http://www.nvidia.com/nsight Nsight Visual Studio Edition Developer Forums https://devtalk.nvidia.com Development Tools Nsight, Visual Studio Edition Nsight documentation Nsight Help Local (or Online) Help

How can I learn more about Nsight at GTC? Nsight hands-on labs: Tue 5pm: S3523 - Hands-on Lab: CUDA Application Debugging Using Nsight Thu 5pm: S3535 - Hands-on Lab: CUDA Application Optimization Using Nsight Nsight sessions: Wed 9am 10am: S3381 - Developing and Optimizing CPU & GPU Pipelines in Siemens' CUDA Accelerated Solutions Thu 9am 10am: S3376 - Optimizing Siemens' DirectModel Rendering Engine Thu 10am 11am: S3377 - Seamless Compute and OpenGL Graphics Development Thu 4pm 5pm: S3382 - Profiling and Optimizing CUDA Kernel Code Come and visit the Nsight VSE booth at the Exhibition floor: Tue and Wed from 12pm 2pm and 6pm 8pm Thu 12pm 2pm

Thank you! This Session ID: S3478