Introduction to the Direct3D 11 Graphics Pipeline



Similar documents
Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

GPU Architecture. Michael Doggett ATI

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Radeon HD 2900 and Geometry Generation. Michael Doggett

Optimizing AAA Games for Mobile Platforms

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

NVIDIA GeForce GTX 580 GPU Datasheet

Computer Graphics Hardware An Overview

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Introduction to Computer Graphics

Autodesk 3ds Max 2010 Boot Camp FAQ

L20: GPU Architecture and Models

Developer Tools. Tim Purcell NVIDIA

OpenGL Performance Tuning

A Short Introduction to Computer Graphics

Advanced Visual Effects with Direct3D

Introduction to GPGPU. Tiziano Diamanti

QCD as a Video Game?

Real-Time BC6H Compression on GPU. Krzysztof Narkowicz Lead Engine Programmer Flying Wild Hog

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Dynamic Resolution Rendering

Using Photorealistic RenderMan for High-Quality Direct Volume Rendering

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

Lecture Notes, CEng 477

INTRODUCTION TO RENDERING TECHNIQUES

GPGPU Computing. Yong Cao

Grid Computing for Artificial Intelligence

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

Choosing a Computer for Running SLX, P3D, and P5

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Optimization for DirectX9 Graphics. Ashu Rege

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

SAPPHIRE HD GB GDDR5 PCIE.

The Future Of Animation Is Games

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS

Low power GPUs a view from the industry. Edvard Sørgård

Intel s Next Generation Integrated Graphics Architecture Intel Graphics Media Accelerator X3000 and 3000

Computer Applications in Textile Engineering. Computer Applications in Textile Engineering

Image Synthesis. Transparency. computer graphics & visualization

Web Based 3D Visualization for COMSOL Multiphysics

Plug-in Software Developer Kit (SDK)

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Sample Exam Questions 2007

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Hardware design for ray tracing

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Computer Graphics CS 543 Lecture 12 (Part 1) Curves. Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Writing Applications for the GPU Using the RapidMind Development Platform

NVFX : A NEW SCENE AND MATERIAL EFFECT FRAMEWORK FOR OPENGL AND DIRECTX. TRISTAN LORACH Senior Devtech Engineer SIGGRAPH 2013

Midgard GPU Architecture. October 2014

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

How To Teach Computer Graphics

Water Flow in. Alex Vlachos, Valve July 28, 2010

GPU Christmas Tree Rendering. Evan Hart

Windows Presentation Foundation: What, Why and When

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

VA (Video Acceleration) API. Jonathan Bian 2009 Linux Plumbers Conference

A Crash Course on Programmable Graphics Hardware

How To Make A Texture Map Work Better On A Computer Graphics Card (Or Mac)

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Overview. NVIDIA Quadro K5200 8GB Graphics J3G90AA

Workstation Applications for Windows. NVIDIA MAXtreme User s Guide

Programming 3D Applications with HTML5 and WebGL

SkillsUSA 2014 Contest Projects 3-D Visualization and Animation

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Touchstone -A Fresh Approach to Multimedia for the PC

Image Synthesis. Fur Rendering. computer graphics & visualization

Intel Graphics Media Accelerator 900

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

3D Computer Games History and Technology

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

Interactive Level-Set Deformation On the GPU

GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

About the Render Gallery

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

Interactive Level-Set Segmentation on the GPU

Intel 845G/GL Chipset Dynamic Video Memory Technology

Making natural looking Volumetric Clouds In Blender 2.48a

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. M.Sc. in Advanced Computer Science. Friday 18 th January 2008.

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Outline. srgb DX9, DX10, XBox 360. Tone Mapping. Motion Blur

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

Performance Optimization Guide

Transcription:

Introduction to the Direct3D 11 Graphics Pipeline Kevin Gee - XNA Developer Connection Microsoft Corporation 2008 NVIDIA Corporation.

Key Takeaways Direct3D 11 focuses on Increasing scalability, Improving the development experience, Extending the reach of the GPU, Improving Performance. Direct3D 11 is a strict superset of D3D 10 & 10.1 Adds support for new features Start developing on Direct3D 10/10.1 today Available on Windows Vista & future Windows operating systems Supports 10 / 10.1 level hardware

Drilldown Outline Tessellation Compute Shader Multithreading Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Character Authoring Pipeline (Rocket Frog Taken From Loop &Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches ) Sub-D Modeling Animation Displacement Map Polygon Mesh Generate LODs

Character Authoring (Cont d) Trends Denser meshes, more detailed characters ~5K triangles -> 30-100K triangles Complex animations Result Animations on polygon mesh vertices more costly Integration in authoring pipeline painful Larger memory footprints causing painful I/O issues Solution Use the higher-level surface representation longer Animate control cage (~5K vertices) Generate displacement & normal maps

Direct3D 11 Pipeline Input Assembler Vertex Shader Hull Shader Tessellator Direct3D 10 pipeline Plus Three new stages for Tessellation Domain Shader Geometry Shader Stream Output Rasterizer Pixel Shader Output Merger

Hull Shader (HS) HS input: patch control pts Hull Shader One Hull Shader invocation per patch HS output: Patch control pts after Basis conversion HS output: TessFactors (how much to tessellate) fixed tessellator mode declarations Tessellator Domain Shader

Fixed-Function Tessellator (TS) Note: Tessellator does not see control points Hull Shader TS input: TessFactors (how much to tessellate) fixed tessellator mode declarations Tessellator Tessellator operates per patch TS output: U V {W} domain points Domain Shader TS output: topology (to primitive assembly)

Domain Shader (DS) Hull Shader DS input: control points TessFactors Tessellator DS input: U V {W} domain points Domain Shader One Domain Shader invocation per point from Tessellator DS output: one vertex

Direct3D 11 Pipeline Input Assembler Vertex Shader Hull Shader Tessellator Domain Shader Geometry Shader Rasterizer Pixel Shader Output Merger Stream Output D3D11 HW Feature D3D11 Only Fundamental primitive is patch (not triangle) Superset of Xbox 360 tessellation

Example Surface Processing Pipeline Single-pass process! vertex shader hull shader tessellator domain shader Animate/skin Control Points Transform basis, Determine how much to tessellate Tess Factors Tessellate! Evaluate surface including displacement patch control points transformed control points control points in Bezier patch U V {W} domain points displacement map Sub-D Patch Bezier Patch

New Authoring Pipeline (Rocket Frog Taken From Loop &Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches ) Sub-D Modeling Animation Displacement Map Optimally Tessellated Mesh GPU

Provides Tessellation: Summary Smooth silhouettes Richer animations for less Scale visual quality across hardware configurations Supports performance improvements Coarse model = compression, faster I/0 to GPU Cheaper skinning and simulation Improve pixel shader quad utilization Scalable rendering for each end user s hardware Render content as artists intend it!

Drilldown Outline Tessellation Compute Shader Multithreading Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

GPGPU & Data Parallel Computing GPU performance continues to grow Many applications scale well to massive parallelism without tricky code changes Direct3D is the API for talking to GPU How do we expand Direct3D to GPGPU?

Direct3D 11 Pipeline Input Assembler Vertex Shader Hull Shader Tessellator Domain Shader Geometry Shader Rasterizer Pixel Shader Output Merger Stream Output Direct3D 10 pipeline Plus Three new stages for Tessellation Data Structure Plus Compute Shader Compute Shader

Integration with Direct3D Fully supports all Direct3D resources Targets graphics/media data types Evolution of DirectX HLSL Graphics pipeline updated to emit general data structures which can then be manipulated by compute shader And then rendered by Direct3D again

Example Scenario Input Assembler Vertex Shader Hull Shader Tessellator Domain Shader Render scene Write out scene image Use Compute for image post-processing Output final image Geometry Shader Stream Output Rasterizer Pixel Shader Output Merger Data Structure Compute Shader

Target Applications Image/Post processing: Image Reduction Image Histogram Image Convolution Image FFT A-Buffer/OIT Ray-tracing, radiosity, etc. Physics AI

Compute Shader: Summary Enables much more general algorithms Transparent parallel processing model Full cross-vendor support Broadest possible installed base

Overview Drilldown Outline Tessellation Compute Shader Multithreading Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

D3D11 Multithreading Goals Asynchronous resource loading Upload resources, create shaders, create state objects in parallel Concurrent with rendering Multithreaded draw & state submission Spread out render work across many threads Limited support for per-object display lists

Devices and Contexts D3D device functionality now split into three separate interfaces Device, Immediate Context, Deferred Context Device has free threaded resource creation Immediate Context is your single primary device for state, draws, and queries Deferred Contexts are your per-thread devices for state & draws

D3D11 Interfaces Render Thread Load Thread 1 Load Thread 2 Immediate Context DrawPrim DrawPrim DrawPrim DrawPrim Device CreateTexture CreateShader CreateVB CreateTexture CreateShader CreateShader CreateIB CreateVB DrawPrim

Async Resources Use the Device interface for resource creation All functions are free threaded Uses fine-grained sync primitives Resource upload and shader compilation can happen concurrently

State & Draw Submission First priority: multithreaded submission Single-use display lists Lower priority: per-object display lists Multiple reuse D3D11 display lists are immutable

D3D11 Interfaces Immediate Context DrawPrim DrawPrim DrawPrim Execute Execute Deferred Context Display List DrawPrim DrawPrim DrawPrim Deferred Context Display List DrawPrim DrawPrim DrawPrim

Deferred Contexts Can create many deferred contexts Each one is single threaded (thread unsafe) Deferred context generates a Display List Display List is consumed by Immediate or Deferred contexts No read-backs or downloads from the GPU Queries Resource locking Lock with DISCARD is supported on deferred contexts

D3D11 on D3D10 H/W Deferred contexts are implemented at an API-capture level Async resource creation uses coarse sync primitives No longer free threaded; thread safe though D3D10 drivers can be updated to better support D3D11 features Will work on Windows Vista as well as future Windows releases

Overview Drilldown Outline Tessellation Compute Shader Multithreading Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Shader Issues Today Shaders getting bigger, more complex Shaders need to target wide range of hardware Optimization of different shader configurations drives shader specialization

Options: Über-shader Über-shader foo ( ) { if (m == 1) { // do material 1 } else if (m == 2) { // do material 2 } if (l == 1) { // do light model 1 } else if (l == 2) { // do light model 2 } }

Options: Über-shader One Shader to Rule them All Good: All functionality in one place Reduces state changes at runtime One compile step Seems to be most popular coding method Bad: Complex, unorganized shaders Register usage is always worst case path

Options: Specialization Multiple specialized shaders for each combination of settings Good: Always optimal register usage Easier to target optimizations Bad: Huge number of resulting shaders Pain to manage at runtime

Combinatorial Explosion Number of Lights Number of Materials

Solution: Dynamic Shader Linkage & OOP Introducing new OOP features to HLSL Interfaces Classes Can be used for static code Also used as the mechanism for linking specific functionality at runtime

Interfaces interface Light { float3 GetDirection(float3 eye); }; float3 GetColor();

Classes class DirectionalLight : Light { float3 GetDirection(float3 eye) { return m_direction; } float3 GetColor() { return m_color; } }; float3 m_direction; float3 m_color;

Dynamic Shader Linkage Über-shader foo ( ) { if (m == 1) { // do material 1 } else if (m == 2) { // do material 2 } if (l == 1) { // do light model 1 } else if (l == 2) { // do light model 2 } } Dynamic Subroutine Material1( ) { } Material2( ) { } Light1( ) { } Light2( ) { } foo( ) { mymaterial.evaluate( ); mylight.evaluate( ); }

In the Runtime Select specific class instances you want Runtime will inline class methods Equivalent register usage to a specialized shader Inlining is done in the native assembly Fast operation Applies to all subsequent Draw( ) calls

Overview Drilldown Outline Tessellation Compute Shader Multithreading Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Why New Texture Formats? Existing block palette interpolations too simple Results often rife with blocking artifacts No high dynamic range (HDR) support NB: All are issues we heard from developers

Two New BC s for Direct3D11 BC6 (aka BC6H) High dynamic range 6:1 compression (16 bpc RGB) Targeting high (not lossless) visual quality BC7 LDR with alpha 3:1 compression for RGB or 4:1 for RGBA High visual quality

New BC s: Compression Block compression (unchanged) Each block independent Fixed compression ratios Multiple block types (new) Tailored to different types of content Smooth gradients vs. noisy normal maps Varied alpha vs. constant alpha Also new: decompression results must be bit-accurate with spec

Multiple Block Types Different numbers of color interpolation lines Less variance in one block means: 1 color line Higher-precision endpoints More variance in one block means: 2 (BC6 & 7) or 3 (BC7 only) color lines Lower-precision endpoints and interpolation bits Different numbers of index bits 2 or 3 bits to express position on color line Alpha Some blocks have implied 1.0 alpha Others encode alpha

Partitions When using multiple color lines, each pixel needs to be associated with a color line Individual bits to choose is expensive For a 4x4 block with 2 color lines 2 16 possible partition patterns 16 to 64 well-chosen partition patterns give a good approximation of the full set BC6H: 32 partitions BC7: 64 partitions, shares first 32 with BC6H

Example Partition Table A 32-partition table for 2 color lines

Comparisons Orig BC3 Orig BC7 Abs Error

Comparisons Orig BC3 Orig BC7 Abs Error

Comparisons HDR Original at given exposure Abs Error BC6 at given exposure

Overview Drilldown Outline Tessellation Compute Shader Multithreading Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Lots of Other Features Addressable Stream Out Draw Indirect Pull-model attribute eval Improved Gather4 Min-LOD texture clamps 16K texture limits Required 8-bit subtexel, submip filtering precision Conservative odepth 2 GB Resources Geometry shader instance programming model Optional double support Read-only depth or stencil views

Overview Drilldown Outline Tessellation Compute Shader Multithreading Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Questions?

2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.