GPU Profiling with AMD CodeXL



GPU Profiling with AMD CodeXL Software Profiling Course Hannes Würfel

OUTLINE
1. Motivation
2. GPU Recap
3. OpenCL
4. CodeXL Overview
5. CodeXL Internals
6. CodeXL Profiling
7. CodeXL Debugging
8. Sources

Software Profiling AMD CodeXL Hannes Würfel 6/10/2013

1. MOTIVATION

Kernels shown on the slide:
- Vertex Displacement Kernel
- Initialize GL-Buffer Kernel
- Disturb Grid Kernel
- Finite Difference Scheme Kernel

2. GPU RECAP

Source: http://www.amd.com/la/documents/gcn_architecture_whitepaper.pdf

Compute Unit: http://www.amd.com/la/documents/gcn_architecture_whitepaper.pdf

3. OPENCL

Platform Model: http://rastergrid.com/blog/2010/11/texture-and-buffer-access-performance/

Memory Hierarchy: http://www.codeproject.com/articles/122405/part-2-opencl-memory-spaces

Kernel Execution Model: OpenCL Programming Guide (Addison-Wesley)

4. CODEXL OVERVIEW

AMD's unified tool suite for profiling and debugging on AMD CPUs, GPUs and APUs

Predecessor tools:
- gDEBugger
- APP Profiler
- APP Kernel Analyzer

Supported platforms:
- Windows 7/8 (32-bit and 64-bit)
- Red Hat Enterprise Linux (64-bit)
- Ubuntu 12.04 or later (64-bit)

Available as a standalone application or as a Visual Studio 2010/2012 plugin

Features

CPU Profiler:
- CPU Sampling
- Call-Graph Profiling

GPU Profiling:
- Application Trace
- Hardware Performance Counters
- Kernel Occupancy
- Hotspots Analysis

GPU Debugging:
- OpenGL & OpenCL API calls
- OpenCL Kernel Debugging
- DirectCompute Debugging

Static Kernel Analysis:
- Hardware Disassembly
- Kernel Code

5. CODEXL INTERNALS

How does CodeXL profiling work under the hood?

Developers can instrument their source code using the CLPerfMarkerAMD library:
- clBeginPerfMarkerAMD()
- clEndPerfMarkerAMD()

Source: CodeXLHelp.chm

Little information is available:
- Gathers data from the OpenCL API run-time
- Uses the GPU Perf API (AMD)
  - Provides derived counters based on raw hardware performance counters, such as Wavefronts, ALUStalledByLDS and ALUUtilization
  - The API uses a sampling approach
  - Needs a handle to the current graphics context (OpenGL or DirectX context) or a handle to an OpenCL command queue

Static or dynamic binary instrumentation for HW performance counters and the OpenCL API run-time?

Educated guess: the instrumentation happens not at the application level but at the GPU driver library level. Drivers provide callbacks for routines and capture measurements.

Possible methods:
- Synchronous method
- Event queue method
- Callback method

Synchronous method:
- Instrumentation around GPU API calls
- Implementation: wrap the (synchronous) library with the performance tool

Modified slides from TAU GPU Performance Measurement Tutorial
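The wrapping idea can be sketched in a few lines. This is a minimal Python model under stated assumptions, not how CodeXL or TAU is implemented: `blocking_copy` is a hypothetical stand-in for a blocking GPU API call, and `traced` and `timings` are illustrative names.

```python
import functools
import time

timings = []  # (function name, elapsed seconds) per intercepted call


def traced(func):
    """Wrap a blocking call so that its wall-clock duration is recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the wrapped call raises.
            timings.append((func.__name__, time.perf_counter() - start))
    return wrapper


@traced
def blocking_copy(src):
    """Hypothetical stand-in for a blocking GPU API call."""
    return list(src)


result = blocking_copy([1, 2, 3])
```

Timing on the host side like this only gives meaningful durations for calls that block until the device work is finished, which is why the method is tied to the synchronous library.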

Event queue method:
- Utilize OpenCL event support (clGetEventProfilingInfo)
- Instrumentation to create and insert events
- Implementation: driver library wrapping
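For a command enqueued on a profiling-enabled queue, clGetEventProfilingInfo can report four device timestamps in nanoseconds: queued, submit, start and end. The sketch below shows how a tool might turn those into per-phase durations. It is plain Python with made-up sample values, and `command_phases` is a hypothetical helper, not part of OpenCL.

```python
def command_phases(ts):
    """Split one command's lifetime into phases.

    ts: dict with 'queued', 'submit', 'start' and 'end' timestamps in
    nanoseconds, mirroring the four OpenCL profiling queries.
    """
    return {
        "queue_wait_ns": ts["submit"] - ts["queued"],   # waiting on the host queue
        "submit_wait_ns": ts["start"] - ts["submit"],   # submitted, not yet running
        "execution_ns": ts["end"] - ts["start"],        # running on the device
    }


# Made-up timestamps for illustration only.
sample = {"queued": 1_000, "submit": 1_400, "start": 2_000, "end": 7_500}
phases = command_phases(sample)
```

Unlike host-side timing, these values come from the device, so they stay valid for asynchronous commands.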

Callback method:
- Utilize language-level callback support (clSetEventCallback)
- Implementation: instrumentation to register callbacks
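The callback pattern can be modelled without a GPU. In the sketch below, `FakeEvent` is a hypothetical stand-in for an OpenCL event and `set_callback` plays the role that clSetEventCallback plays in the real API; the measurement code only runs when the event reports completion.

```python
import time


class FakeEvent:
    """Hypothetical stand-in for an OpenCL event object."""

    def __init__(self, name):
        self.name = name
        self._callbacks = []

    def set_callback(self, fn):
        """Register fn to be called when the event completes."""
        self._callbacks.append(fn)

    def complete(self):
        """Simulate the runtime signalling completion."""
        for fn in self._callbacks:
            fn(self)


completed = []  # (event name, host timestamp) recorded by the callback


def on_complete(event):
    completed.append((event.name, time.perf_counter()))


ev = FakeEvent("kernel_a")
ev.set_callback(on_complete)
ev.complete()
```

The profiler does not have to poll or block here; the cost is that the callback runs whenever the runtime decides to invoke it, so it should do as little work as possible.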

6. CODEXL PROFILING

Application Trace:
- OpenCL API calls

Summary Pages:
- Context Summary Page
- Top 10 Data Transfer Summary Page
- Top 10 Kernel Summary Page
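A "top 10" summary is essentially an aggregation over trace records. The sketch below shows one way such a table could be derived. It is an assumption about the general technique, not CodeXL's implementation, and the kernel names and timings are made up.

```python
from collections import defaultdict


def top_kernels(records, n=10):
    """Rank kernels by total execution time.

    records: iterable of (kernel name, duration in ms), one per dispatch.
    Returns rows of (name, calls, total ms, average ms), slowest first.
    """
    totals = defaultdict(lambda: [0, 0.0])  # name -> [calls, total ms]
    for name, ms in records:
        totals[name][0] += 1
        totals[name][1] += ms
    rows = [(name, calls, total, total / calls)
            for name, (calls, total) in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


# Made-up trace for illustration only.
trace = [
    ("disturb_grid", 0.5),
    ("finite_difference", 2.0),
    ("finite_difference", 2.5),
    ("vertex_displacement", 1.0),
]
summary = top_kernels(trace)
```

Sorting by total rather than average time puts frequently dispatched kernels at the top, which is usually where optimization pays off first.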

Kernel Occupancy:
- Shows the utilization of a Compute Unit
- Measured as the number of in-flight wavefronts for a given kernel, relative to the maximum number of wavefronts given an ideal kernel dispatch configuration
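The definition above is a simple ratio. A minimal sketch, assuming a wavefront size of 64 work items (as on AMD GCN hardware) and purely illustrative wavefront counts; the function names are hypothetical:

```python
def wavefronts_per_group(work_group_size, wavefront_size=64):
    """Number of wavefronts needed for one work group (rounded up)."""
    return -(-work_group_size // wavefront_size)


def occupancy(in_flight_wavefronts, max_wavefronts):
    """Occupancy as a fraction: in-flight wavefronts over the maximum."""
    if max_wavefronts <= 0:
        raise ValueError("max_wavefronts must be positive")
    if not 0 <= in_flight_wavefronts <= max_wavefronts:
        raise ValueError("in_flight_wavefronts must lie in [0, max_wavefronts]")
    return in_flight_wavefronts / max_wavefronts


# Illustrative numbers only: 16 wavefronts in flight out of a maximum of 40.
ratio = occupancy(16, 40)
groups_256 = wavefronts_per_group(256)
groups_100 = wavefronts_per_group(100)
```

A work group of 100 items still occupies two full wavefronts, so work-group sizes that are not multiples of the wavefront size leave lanes idle regardless of the occupancy figure.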

HW Performance Counters:

7. CODEXL DEBUGGING

- OpenCL and OpenGL objects
- Shared contexts
- Shader and kernel resources
- Ability to show buffer contents

- Kernel code breakpoints
- Stepping through one kernel instance
- Switching between kernel instances

Multi-Watch View:
- Choose a variable to inspect
- View the variable across all work items
- Visualization of the buffer

Source: CodeXLHelp.chm

Static Kernel Analyzer:
- Compiles, analyzes and disassembles OpenCL kernel code for multiple device versions (also DirectCompute kernels)

SUBJECTIVE EVALUATION
- The application trace provides useful information about concurrent activities in the program
- Best-practice hints, such as unnecessary API calls
- Kernel debugging: the Multi-Watch View helps to detect errors in bounds checks
- Stepping through a kernel took too long on my test system
- The documentation gives little insight

8. SOURCES
- OpenCL Programming Guide (Addison-Wesley, 2012)
- CodeXL User Guide
- Mathematics for 3D Game Programming and Computer Graphics (Course Technology PTR, 3rd Edition, 2012)
- http://developer.amd.com/tools-and-sdks/heterogeneouscomputing/codexl/
- http://developer.amd.com/tools-and-sdks/graphicsdevelopment/gpuperfapi/
- http://www.amd.com/la/documents/gcn_architecture_whitepaper.pdf
- http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/10-taugpu-tutorial-part1.pdf
- http://www.nvidia.com/content/nvision2008/tech_presentations/professional_visualization/nvision08-advanced_opengl_debugger.pdf