Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group



Similar documents
Optimization for DirectX9 Graphics. Ashu Rege

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Computer Graphics Hardware An Overview

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

GPU Architecture. Michael Doggett ATI

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

OpenGL Performance Tuning

Radeon HD 2900 and Geometry Generation. Michael Doggett

Optimizing AAA Games for Mobile Platforms

Introduction to GPGPU. Tiziano Diamanti

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Advanced Visual Effects with Direct3D

Introduction to Computer Graphics

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Programming 3D Applications with HTML5 and WebGL

INTRODUCTION TO RENDERING TECHNIQUES

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Game Development in Android Disgruntled Rats LLC. Sean Godinez Brian Morgan Michael Boldischar

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Path Tracing Overview

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

3D Computer Games History and Technology

GPGPU Computing. Yong Cao

Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Sample Exam Questions 2007

Deferred Shading & Screen Space Effects

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

Plug-in Software Developer Kit (SDK)

Hardware design for ray tracing

Beginning Android 4. Games Development. Mario Zechner. Robert Green

GPU Christmas Tree Rendering. Evan Hart

Workstation Applications for Windows. NVIDIA MAXtreme User s Guide

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

NVIDIA GeForce GTX 580 GPU Datasheet

Dynamic Resolution Rendering

Writing Applications for the GPU Using the RapidMind Development Platform

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Computer Applications in Textile Engineering. Computer Applications in Textile Engineering

Water Flow in. Alex Vlachos, Valve July 28, 2010

Computer Graphics CS 543 Lecture 12 (Part 1) Curves. Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Touchstone -A Fresh Approach to Multimedia for the PC

L20: GPU Architecture and Models

Color correction in 3D environments Nicholas Blackhawk

GPGPU: General-Purpose Computation on GPUs

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

A Crash Course on Programmable Graphics Hardware

GPU Shading and Rendering: Introduction & Graphics Hardware

Low power GPUs a view from the industry. Edvard Sørgård

OpenGL Insights. Edited by. Patrick Cozzi and Christophe Riccio

NVFX : A NEW SCENE AND MATERIAL EFFECT FRAMEWORK FOR OPENGL AND DIRECTX. TRISTAN LORACH Senior Devtech Engineer SIGGRAPH 2013

Image Synthesis. Fur Rendering. computer graphics & visualization

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

New NVIDIA OpenGL Extensions. Simon Green NVIDIA Developer Technology

Lezione 4: Grafica 3D*(II)

Image Synthesis. Transparency. computer graphics & visualization

The Future Of Animation Is Games

Volume visualization I Elvins

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Volume Rendering on Mobile Devices. Mika Pesonen

Course Overview. CSCI 480 Computer Graphics Lecture 1. Administrative Issues Modeling Animation Rendering OpenGL Programming [Angel Ch.

Procedural Shaders: A Feature Animation Perspective

Introduction to Computer Graphics. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

High Dynamic Range and other Fun Shader Tricks. Simon Green

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

Computer Graphics on Mobile Devices VL SS ECTS

1. INTRODUCTION Graphics 2

Blender Notes. Introduction to Digital Modelling and Animation in Design Blender Tutorial - week 9 The Game Engine

Introduction Computer stuff Pixels Line Drawing. Video Game World 2D 3D Puzzle Characters Camera Time steps

Far Cry and DirectX. Carsten Wenzel

Introduction to Game Programming. Steven Osman

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS

A Short Introduction to Computer Graphics

Real-Time BC6H Compression on GPU. Krzysztof Narkowicz Lead Engine Programmer Flying Wild Hog

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Advanced Graphics and Animations for ios Apps

Vulkan on NVIDIA GPUs. Piers Daniell, Driver Software Engineer, OpenGL and Vulkan

Introduction to Computer Graphics

GPU Point List Generation through Histogram Pyramids

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

How To Teach Computer Graphics

An Introduction to Modern GPU Architecture Ashu Rege Director of Developer Technology

BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games Mathias Schott, Evan Hart NVIDIA

Next Generation GPU Architecture Code-named Fermi

Comp 410/510. Computer Graphics Spring Introduction to Graphics Systems

CS 378: Computer Game Technology

Using Photorealistic RenderMan for High-Quality Direct Volume Rendering

Transcription:

Shader Model 3.0 Ashu Rege NVIDIA Developer Technology Group

Talk Outline Quick Intro GeForce 6 Series (NV4X family) New Vertex Shader Features Vertex Texture Fetch Longer Programs and Dynamic Flow Control Vertex Frequency Stream Divider (Instancing) New Pixel Shader Features Longer Programs and Dynamic Flow Control Multiple Render Targets FloatingPoint Blending and Filtering Final Thoughts

GeForce 6 Series Shader Model 3.0 at all price points Full support for shader model 3.0 Vertex Texture Fetch / Long programs / Pixel Shader flow control Full speed fp32 shading OpenEXR High Dynamic Range Rendering Floating point frame buffer blending Floating point texture filtering Except 6200 6800 Ultra/GT specs 222M xtors / 0.13um 6 vertex units / 16 pixel pipelines PCI Express and AGP

GeForce 6800 (NV40)

Complete Native Shader Model 3.0 Support Vertex Shader Model Vertex Shader Instructions Displacement Mapping Vertex Texture Fetch Geometry Instancing Dynamic Flow Control Pixel Shader Model Required Shader Precision Pixel Shader Instructions Subroutines Loops & Branches Dynamic Flow Control Shader Model 2.0 2.0 256 2.0 fp24 96 Shader Model 3.0 2 16 2 16 3.0 16 (65,535) 3.0 fp32 16 (65,535)

Vertex Shader 3.0

Detail of a Single Vertex Shader Pipeline Input Vertex Data Vertex Texture Fetch FP32 Scalar Unit FP32 Vector Unit Branch Unit Texture Cache Primitive Assembly Viewport Processing To Setup

Vertex Shader Version Summary 2.0 2.0a 3.0 # of instruction slots 256 256 >= 512 Max # of instructions executed 65535 65535 2 16 16 (65,535) Instruction Predication Temp Registers 12 13 32 # constant registers >= 256 >= 256 >= 256 Static Flow Control Dynamic Flow Control Dynamic Flow Control depth 24 24 Vertex Texture Fetch # of texture samplers 4 Geometry Instancing Support Note: Note: There There is is no no vertex vertex shader shader 2.0b 2.0b

Flow Control: Static vs. Dynamic void Shader(... // Input per vertex or per pixel in float3 normal, // Input per batch of triangles uniform float3 lightdirection, uniform bool computelight, Static Flow Control (condition constant for each batch of triangles) Dynamic Flow Control (data dependent, so condition can vary per vertex or pixel) ) { }...... if (computelight) {... if (dot(lightdirection, normal)) {... }... }...

Static v. Dynamic Flow Control Static Flow Control Based on uniform variables, a.k.a. constants Same code executed for every vertex in draw call Dynamic Flow Control Based on pervertex attributes Each vertex can take a different code path

Using Flow Control Subroutines, loops, and conditionals simplify programming [if, else, endif] [loop, endloop] [rep, endrep] call, callnz, ret Conditionals can be nested Fewer vertex shaders to manage Dynamic branches only have ~2 cycle overhead Even if vertices take different branches Use this to avoid unnecessary vertex work (e.g., skinning, N.L<0,...) If you can branch to skip more than 2 cycles of work, do it!

Geometry Instancing

DirectX 9 Instancing What is instancing? Allows a single draw call to draw multiple instances of the same model Allows you to minimize draw primitive calls and reduce CPU overhead What is required to use it? Microsoft DirectX 9.0c VS 3.0 hardware API is layered on top of IDirect3DDevice9::SetStreamSourceFreq

Why Use Instancing? Speed Single biggest perf sink is # of draw calls We all know draw calls are bad But world matrices and other state changes force us to make multiple draw calls Instancing API pushes per instance draws down to hardware/driver Eliminates API and driver overhead

How does it work? Stream 0 Stream 1 Vertex Data Per instance data VS_3_0 Primary stream is a single copy of the model geometry Secondary stream(s) contain perinstance data Transform matrices, colors, texture indices Vertex shader does matrix transformations based on vertex attributes Pointer is advanced each time an instance of the primary stream is rendered.

Instancing Demo Space scene with 500+ ships, 4000+ rocks Complex lighting, postprocessing Some simple CPU collision work as well Dramatically faster with instancing

Some Test Results Test scene draws 1 million diffuse shaded polygons Changing the batch size changes # of drawn instances For small batch sizes, can provide extreme win due to PER DRAW CALL savings There is a fixed overhead from adding the extra data into the vertex stream Sweet spot depends on many factors (CPU/GPU speed, engine overhead, etc.) Instancing versus Single Daw Calls Instancing Frame Rate No Instancing Batch Size

When To Use Instancing Many instances of the same model Forest of trees, particle systems, sprites Can encode per instance data in aux stream Colors, texture coordinates, perinstance constants Not as useful is batching overhead is low Fixed overhead to instancing

Vertex Texture Fetch

An Example of Vertex Texturing: Displacement Mapping Displacement Texture Flat Tessellated Mesh Displaced Mesh

Vertex Texture Examples Without Vertex Textures With Vertex Textures Images used with permission from Pacific Fighters. 2004 Developed by 1C:Maddox Games. All rights reserved. 2004 Ubi Soft Entertainment.

More Vertex Texture Examples Without Vertex Textures With Vertex Textures Images used with permission from Pacific Fighters. 2004 Developed by 1C:Maddox Games. All rights reserved. 2004 Ubi Soft Entertainment.

Vertex Texture Multiple vertex texture units DX9: 4 samplers (D3DVERTEXTEXTURESAMPLERn) OGL: glgetintegerv(max_vertex_texture_image_units_arb) 4 units on GeForce 6 Series hardware Supports point filtering only (currently) Supports mipmapping Need to calculate LOD yourself Uses standard 2D texture samplers DX9: R32F and R32G32B32A32F formats OGL: LUMINANCE_FLOAT32_ATI or RGBA_FLOAT32_ATI formats Arbitrary number of fetches

Vertex Texture Applications Simple displacement mapping Note not adaptive displacement mapping Hardware doesn t tessellate for you Terrain, ocean surfaces Render to vertex texture Provides feedback path from fragment program to vertex program Particle systems Calculate particle positions using fragment program, read positions from texture in vertex program, render as points Character animation Can do arbitrarily complex character animation using fragment programs, read final result as vertex texture Not limited by vertex attributes can use lots of bones, lots of blend shapes

GPU Particle System

Pixel Shader 3.0

Pixel Shader Version Summary 2.0 2.0a 2.0b 3.0 Dependent Texture Limit 4 No limit 4 No limit Texture Instruction Limit 32 unlimited unlimited unlimited Position Register Instruction Slots 32 + 64 512 512 >= 512 Executed Instructions 32 + 64 512 512 2 16 (65,535) Interpolated Registers 2 + 8 2 + 8 2 + 8 10 Instruction Predication Indexed Input Registers Temp Registers 12 22 32 32 Constant Registers 32 32 32 224 Arbitrary Swizzling Gradient Instructions Loop Count Register Face Register (2sided lighting) Dynamic Flow Control Depth 24

PS3.0 Branching Performance Static branching is fast But still may not be worth it for short branches (less than ~5 instructions) Can use conditional execution instead Divergent (datadependent) branching is more expensive Depends on which pixels take which branches

Branch Overhead Pixel shader flow control instruction costs: Instruction if / endif if / else / endif call ret loop / endloop Cost (Cycles) 4 6 2 2 4 Not free, but certainly usable and can save a ton of work!

Multiple Lights Demo Available at http://developer.nvidia.com/object/sdk_samples.html

Pixel Shader Ray Tracer Available at http://developer.nvidia.com/object/sdk_effects.html

Pixel Shader Looping Example Single Pass Volume Rendering Application only renders a single quad Pixel shader calculates intersection between view ray and bounding box, discards pixels outside Marches along ray between far and near intersection points, accumulating color and opacity Looks up in 3D texture, or evaluates procedural function at each sample Compiles to REP/ENDREP loop Allows us to exceed the 512 instruction PS2.0 limit All blending is done at fp32 precision in the shader 100 steps is interactive on 6800 Ultra

1 Pass Volume Rendering Examples

Extra Full Precision Interpolators 10 full precision interpolators (texcoords) Compared to 8 in earlier pixel shader versions More inputs for lighting parameters,... Multiple lights in one long shader Compared to rerendering for each light Doesn t work well with stencil shadows

Early Outs Early out is a dynamic branch in the shader to bypass computation Some obvious examples: If in shadow, don t do lighting computations If out of range (attenuation zero), don t light These apply to vs.3.0 as well Next a novel example for softedged shadows

SoftEdged Shadows with ps 3.0 Available at http://developer.nvidia.com/object/sdk_samples.html

SoftEdged Shadows with ps 3.0 Works by taking 8 test samples from shadow map If all 8 in shadow or all 8 in the light we re done If we re on the edge (some are in shadow some are in light), do 56 more samples for additional quality 64 samples at much lower cost! Quickanddirty adaptive sampling

ps.3.0 Soft Shadows This demo on GeForce 6 Series GPUs Dynamic sampling > 2x faster vs. 64 samples everywhere Completely orthogonal to other parts of the HW (for example, stencil is still usable) Can do even more complex decisionmaking if necessary Combine with hardware shadow maps Highquality realtime soft shadows are a reality

Summary Shader Model 3.0 provides a nice collection of useful features Looping/branching/conditional constructs allow greater programming flexibility Must watch out for performance gotchas Don t make everything a nail for the SM3.0 hammer

References Tons of resources at http://developer.nvidia.com NVIDIA SDK http://developer.nvidia.com/object/sdk_home.html Individual Standalone Samples (.zip) http://developer.nvidia.com/object/sdk_samples.html Individual FX Composer Effects (.fx) http://developer.nvidia.com/object/sdk_effects.html Documentation NVIDIA GPU Programming Guide http://developer.nvidia.com/object/gpu_programming_guide.html Recent Conference Presentations http://developer.nvidia.com/object/presentations.html

Questions? Support email: devrelfeedback@nvidia.com [Technical Questions] sdkfeedback@nvidia.com [Tools Questions]