DirectX 11 Compute Shader. Chas. Boyd Architect Windows Desktop and Graphics Technology Microsoft

Similar documents
Entering the Golden Age of Heterogeneous Computing

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

L20: GPU Architecture and Models

Radeon HD 2900 and Geometry Generation. Michael Doggett

Introduction to GPU Programming Languages

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

GPU Architecture. Michael Doggett ATI

The Future Of Animation Is Games

Computer Graphics Hardware An Overview

Introduction to GPGPU. Tiziano Diamanti

Writing Applications for the GPU Using the RapidMind Development Platform

GPGPU Computing. Yong Cao

NVIDIA GeForce GTX 580 GPU Datasheet

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

Next Generation GPU Architecture Code-named Fermi

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Introduction to GPU hardware and to CUDA

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

Developer Tools. Tim Purcell NVIDIA

GPGPU accelerated Computational Fluid Dynamics

Graphic Processing Units: a possible answer to High Performance Computing?

Choosing a Computer for Running SLX, P3D, and P5

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

QCD as a Video Game?

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

System requirements for Autodesk Building Design Suite 2017

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

How To Teach Computer Graphics

BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games Mathias Schott, Evan Hart NVIDIA

Evaluation of CUDA Fortran for the CFD code Strukti

Stream Processing on GPUs Using Distributed Multimedia Middleware

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Introduction to Computer Graphics

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

GPUs for Scientific Computing

GPU Computing - CUDA

Lecture Notes, CEng 477

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

1. INTRODUCTION Graphics 2

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Overview. NVIDIA Quadro K5200 8GB Graphics J3G90AA

HPC with Multicore and GPUs

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Hardware design for ray tracing

Low power GPUs a view from the industry. Edvard Sørgård

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

SAPPHIRE HD GB GDDR5 PCIE.

Interactive Rendering In The Post-GPU Era. Matt Pharr Graphics Hardware 2006

Optimization for DirectX9 Graphics. Ashu Rege

Part I Courses Syllabus

SUBJECT: SOLIDWORKS HARDWARE RECOMMENDATIONS UPDATE

Texture Cache Approximation on GPUs

GPGPU acceleration in OpenFOAM

OpenGL Performance Tuning

Crosswalk: build world class hybrid mobile apps

GPU Parallel Computing Architecture and CUDA Programming Model

Comp 410/510. Computer Graphics Spring Introduction to Graphics Systems

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

Performance Testing in Virtualized Environments. Emily Apsey Product Engineer

Introduction to GPU Computing

Web Based 3D Visualization for COMSOL Multiphysics

Msystems Ltd. SAPPHIRE HD GB GDDR5 PCIE

Dynamic Resolution Rendering

Autodesk 3ds Max 2010 Boot Camp FAQ

Accelerating CFD using OpenFOAM with GPUs

INSTALLATION GUIDE ENTERPRISE DYNAMICS 9.0

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

High Performance GPGPU Computer for Embedded Systems

Introduction to GPU Architecture

NVFX : A NEW SCENE AND MATERIAL EFFECT FRAMEWORK FOR OPENGL AND DIRECTX. TRISTAN LORACH Senior Devtech Engineer SIGGRAPH 2013

Autodesk Revit 2016 Product Line System Requirements and Recommendations

GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY

INTRODUCTION TO WINDOWS 7

ST810 Advanced Computing

A Powerful solution for next generation Pcs

GPGPU: General-Purpose Computation on GPUs

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

HP Workstations graphics card options

Optimizing AAA Games for Mobile Platforms

Transcription:

DirectX 11 Compute Shader Chas. Boyd Architect Windows Desktop and Graphics Technology Microsoft

Outline Compute Shader Objectives DirectX Data-Parallel Processing Target Applications Design Compute Shader Details Syntax and Features

DirectX DirectX API set shipping since 1995 Direct3D is popular graphics API for PCs Historically for games, but broadening in scope Windows OS components, media apps, etc. GPU performance has grown at faster rate than CPU graphic performance has New customers want that performance

Data Parallel Processing A programming model and hardware architecture Assign processor resources on a per-dataelement basis Scales very well with core-count growth Applications written in DirectX3 for 1 ALU still run on 800 core processors

Introducing: the Compute Shader A new processing model for GPUs Data parallel programming for mass market client apps Integrated with Direct3D For efficient inter-op with graphics in client scenarios Supports more general constructs than before Cross thread data sharing Un-ordered access I/O operations Enables more general data structures Irregular arrays, trees, etc. Enables more general algorithms Far beyond shading

Target: Interactive Graphics/Games Image/Post processing: Image Reduction, Histogram, Convolution, FFT Effect physics Particles, smoke, water, cloth, etc. Advanced renderers: A-Buffer/OIT, Reyes, Ray-tracing, radiosity, etc. Gameplay physics, AI, etc. Production pipelines

Target: Media Processing Video: Transcode, superresolution, etc. Photo/imaging: Consumer applications Non-client scenarios: HPC, server workloads, etc.

Optimized for Client Scenarios Simpler setup syntax Balance between power and complexity Real-time rendering of results Working to reduce cost of transition from compute mode to graphics mode Better integration with media data types: Pixels, samples, text, vs only floats Need consistency between implementations Both across vendors and over time/generations

Component Relationships Applications Media playback or processing, media UI, recognition, etc. Domain Libraries Domain Languages Accelerator, Brook+, Rapidmind, Ct MKL, ACML, cufft, D3DX, etc. Compute Languages DirectX11 Compute, CUDA, CAL, OpenCL, LRB Native, etc. Processors CPU, GPU, Larrabee nvidia, Intel, AMD, S3, etc.

1 Compute Shader Features Predictable Thread Invocation Regular arrays of threads: 1-D, 2-D, 3-D Don t have to draw a quad anymore Shared registers between threads Reduces register pressure Can eliminate redundant compute and i/o Scattered Writes Can read/write arbitrary data structures Enables new classes of algorithms Integrates with Direct3D resources

Integrated with Direct3D Fully supports all Direct3D resources Targets graphics/media data types Evolution of DirectX HLSL Graphics pipeline updated to emit general data structures via addressable writes Which can then be manipulated by compute shader And then rendered by Direct3D again

Integration with Graphics Pipeline Input Assembler Vertex Shader Tessellation Geometry Shader Rasterizer Pixel Shader Output Merger Render scene Write out scene image Use Compute for image post-processing Output final image Data Structure Scene Image Compute Shader Final Image

Pixel Shader Programming Model For imaging or GPGPU Millions of threads Each can only write to it s own destination No write contention No inter-thread communication Pure data-parallel model

Compute Shader Programming 1000s of thread groups Registers shareable within each group Arbitrary access writes to video memory

Memory Objects DXGI Resources Used for textures, images, vertices, hulls, etc. Enables out-of-bounds memory checking Returns 0 on reads Writes are No-Ops Improves security, reliability of shipped code Exposed as HLSL Resource Variables Declared in the language as data objects

Optimized I/O Intrinsics Textures & Buffers RWTexture2D, RWBuffer Act just like existing types Structured I/O RWStructuredBuffer StructuredBuffer (read-only) Template type can be any struct definition Fast Structured I/O AppendStructuredBuffer, ConsumeStructuredBuffer Work like streams Do not preserve ordering

Atomic Operator Intrinsics Enable basic operations w/o lock/contention: InterlockedAdd( rvar, val ); InterlockedMin( rvar, val ); InterlockedMax( rvar, val ); InterlockedOr( rvar, val ); InterlockedXOr( rvar, val ); InterlockedCompareWrite( rvar, val ); InterlockedCompareExchange( rvar, val );

Texture Sampling All 1-D, 2-D, 3-D and cube map resource topologies

Texture Sampling Operations All DirectX11 texture formats Including new compressed HDR format Sizes extended to 2GB, 16k x 16k, Standard HLSL sampling intrinsics Sample() Load() Gather()

More DirectX11 Language Features SIMD-optimized method support Facilitates SIMD version of OOP Minimizes register utilization of method instances Enables combinatoric shaders to be specialized Arbitrarily addressable writes in Pixel Shader Optional double precision New double and long types

DirectX 11 Foundation Support for runtime compilation Very nice during prototyping and development Support for runtime data binding Consequence of above Compiler provided for off-line use as well

Reduction Compute Code Buffer<uint> Values; OutputBuffer<uint> Result; ImageAverage() { groupshared uint Total; groupshared uint Count; // Total so far // Count added float3 vpixel = load( sampler, sv_threadid ); float fluminance = dot( vpixel, LUM_VECTOR ); uint value = fluminance*65536; InterlockedAdd( Count, 1 ); InterlockedAdd( Total, value ); GroupMemoryBarrier(); // Let all threads in group complete

FFT Performance Evolution 160 140 120 100 80 60 40 20 3x 6x 12x 0 CPU Direct3D9 DirectX11 CS New Hardware New Algorithm

Additional Algorithms New rendering methods Ray-tracing, collision detection, etc. Rendering elements at different resolutions Non-rendering algorithms IK, Physics, AI, simulation, fluid simulation, radiosity More general data structures Quad/octrees, irregular arrays, sparse arrays Linear Algebra

Summary DirectX 11 Compute Shader delivers the performance of 3-D games to new applications Demonstrates tight integration between computation and rendering Supported by all processor vendors Scalable parallel processing model Code should scale for several generations