Far Cry and DirectX. Carsten Wenzel


Far Cry uses the latest DX9 features
- Shader Models 2.x / 3.0
  - Except for vertex textures and dynamic flow control
- Geometry Instancing
- Floating-point render targets

Dynamic flow control in PS
To consolidate multiple lights into one pass, we ideally would want to do something like this:

    float3 finalcol = 0;
    float3 diffusecol = tex2D( diffusemap, IN.diffuseUV.xy );
    float3 normal = mul( IN.tangentToWorldSpace, tex2D( normalmap, IN.bumpUV.xy ).xyz );

    for( int i = 0; i < cnumlights; i++ )
    {
        float3 lightcol = LightColor[ i ];
        float3 lightvec = normalize( clightpos[ i ].xyz - IN.pos.xyz );

        // ...
        // Attenuation, Specular, etc. calculated via if( const_boolean )
        // ...

        float ndotl = saturate( dot( lightvec.xyz, normal ) );
        finalcol += lightcol.xyz * diffusecol.xyz * ndotl * atten;
    }

    return( float4( finalcol, 1 ) );

Dynamic flow control in PS
Welcome to the real world...
- Dynamic indexing is only allowed on input registers; this prevents passing light data via constant registers and indexing them in a loop
- Passing light info via input registers is not feasible as there are not enough of them (only 10)
- Dynamic branching is not free

Loop unrolling
- We chose not to use dynamic branching and loops
- Used static branching and unrolled loops instead
  - Works well with Far Cry's existing shader framework
  - Shaders are precompiled for different light masks
    - 0-4 dynamic light sources per pass
    - 3 different light types (spot, omni, directional)
    - 2 modification types per light (specular only, occlusion map)
- Can result in over 160 instructions after loop unrolling when using 4 lights
  - Too long for ps_2_0
  - Just fine for ps_2_a, ps_2_b and ps_3_0!
- To avoid run time stalls, use a pre-warmed shader cache

How the shader cache works
Specific shader depends on:
1) Material type (e.g. skin, phong, metal)
2) Material usage flags (e.g. bump-mapped, specular)
3) Specific environment (e.g. light mask, fog)

How the shader cache works
- Cache access:
  - Object to render already has shader handles? Use those!
  - Otherwise try to find the shader in memory
  - If that fails, load from hard disk
  - If that fails, generate VS/PS and store a backup on hard disk
  - Finally, save shader handles in object
- Not the ideal solution, but
  - Works reasonably well on existing hardware
  - Was easy to integrate without changing assets
- For the cache to be efficient
  - All used combinations of a shader should exist as pre-cached files on HD
    - On-the-fly update causes stalls due to the time required for shader compilation!
  - However, maintaining the cache can become cumbersome
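The lookup order described on this slide can be sketched on the CPU side as follows. This is a minimal Python sketch under stated assumptions, not Far Cry's actual code: `ShaderKey`, `RenderObject`, `ShaderCache`, the `compile_shader` callback and the file naming scheme are all hypothetical names introduced for illustration, and a byte string stands in for the real shader handles.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ShaderKey:
    """Hypothetical key built from the three inputs a shader depends on."""
    material_type: str   # e.g. "skin", "phong", "metal"
    usage_flags: int     # e.g. bit 0 = bump-mapped, bit 1 = specular
    environment: int     # e.g. light mask and fog bits packed together

    def filename(self) -> str:
        return f"{self.material_type}_{self.usage_flags:04x}_{self.environment:08x}.bin"


@dataclass
class RenderObject:
    key: ShaderKey
    shader: Optional[bytes] = None   # stands in for the cached shader handles


class ShaderCache:
    def __init__(self, cache_dir: str, compile_shader: Callable[[ShaderKey], bytes]):
        self.cache_dir = cache_dir
        self.compile_shader = compile_shader   # slow path: this is what stalls
        self.in_memory: Dict[ShaderKey, bytes] = {}
        self.compile_count = 0

    def get(self, obj: RenderObject) -> bytes:
        # 1) The object already has its handles: use those.
        if obj.shader is not None:
            return obj.shader
        # 2) Otherwise try to find the shader in memory.
        blob = self.in_memory.get(obj.key)
        if blob is None:
            path = os.path.join(self.cache_dir, obj.key.filename())
            if os.path.exists(path):
                # 3) Load the precompiled shader from disk.
                with open(path, "rb") as f:
                    blob = f.read()
            else:
                # 4) Generate the shader and store a backup on disk.
                blob = self.compile_shader(obj.key)
                self.compile_count += 1
                with open(path, "wb") as f:
                    f.write(blob)
            self.in_memory[obj.key] = blob
        # 5) Finally, save the handles in the object.
        obj.shader = blob
        return blob
```

A pre-warmed cache in this sketch is simply a `cache_dir` that already contains a file for every key the game will request, so step 4 never runs during gameplay.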

Loop unrolling Pros/Cons
- Pros:
  - Speed! Not branching dynamically saves quite a few cycles
  - At the time, we found switching shaders to be more efficient than dynamic branching
- Cons:
  - Needs sophisticated shader caching, due to the number of shader combinations per light mask (244 after presorting of combinations)
  - Shader pre-compilation takes time
  - Shader cache for Far Cry 1.3 requires about 430 MB (compressed down to ~23 MB in the patch exe)
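Why presorting reduces the number of shader combinations can be illustrated with a small counting sketch. It is deliberately simplified: it only considers the three light types and up to four lights per pass, and it ignores the per-light modification flags, so it does not reproduce the figure of 244 quoted above (the slides do not describe the real light mask encoding). The function names are hypothetical.

```python
from itertools import combinations_with_replacement, product

LIGHT_TYPES = ("spot", "omni", "directional")
MAX_LIGHTS = 4


def ordered_masks(max_lights: int = MAX_LIGHTS) -> int:
    """Count light lists where the order of lights matters (no presorting)."""
    return sum(len(list(product(LIGHT_TYPES, repeat=n)))
               for n in range(max_lights + 1))


def presorted_masks(max_lights: int = MAX_LIGHTS) -> int:
    """Count light lists after sorting, so permutations share one shader."""
    return sum(len(list(combinations_with_replacement(LIGHT_TYPES, n)))
               for n in range(max_lights + 1))


def canonical_mask(lights):
    """Sort the active lights so equivalent sets map to the same cache key."""
    return tuple(sorted(lights))


print(ordered_masks())    # 121 unrolled variants without presorting
print(presorted_masks())  # 35 variants once the lights are sorted
```

Sorting the lights before building the mask means that, for example, (spot, omni) and (omni, spot) select the same precompiled shader.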

Geometry Instancing
- Potentially saves the cost of n-1 draw calls when rendering n instances of an object
- Far Cry uses it mainly to speed up vegetation rendering
- Per instance attributes:
  - Position
  - Size
  - Bending info
  - Rotation (only if needed)
- Reduce the number of instance attributes!
- Two methods:
  - Vertex shader constants
    - Use for objects having more than 100 polygons
  - Attribute streams
    - Use for smaller objects (sprites, impostors)

Instance Attributes in VS Constants
- Best for objects with large numbers of polygons; prevents the GPU from becoming attribute bound (see Cem's talk)
- Put instance data in VS constants and the instance index into an additional stream
  - WGF 2.0 will support an automatically generated instance index!
- Large batches need to be split up to fit attributes in VS constants (try to fit attributes for at least eight instances to amortize startup cost!)
- Use SetStreamSourceFrequency to set up geometry instancing as follows:

    SetStreamSourceFrequency( geomstream, D3DSTREAMSOURCE_INDEXEDDATA | numinstances );
    SetStreamSourceFrequency( inststream, D3DSTREAMSOURCE_INSTANCEDATA | 1 );

- Be sure to reset the vertex stream frequency once you're done: SSSF( strnum, 1 )!
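Splitting a large batch so that each draw call fits into the available constant registers could look roughly like this. It is a sketch only: the helper names are hypothetical, and the number of free constant registers used in the usage note is an assumed value, not one taken from the talk.

```python
from typing import List, Sequence, Tuple

# One packed float4 per instance: (x, y, z, size), as in the VS snippets.
Instance = Tuple[float, float, float, float]


def max_instances(free_constants: int, constants_per_instance: int = 1) -> int:
    """How many instances fit into the free float4 constant registers."""
    return free_constants // constants_per_instance


def split_into_batches(instances: Sequence[Instance],
                       max_per_batch: int) -> List[List[Instance]]:
    """Split instances into draw calls that fit the VS constant budget."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    return [list(instances[i:i + max_per_batch])
            for i in range(0, len(instances), max_per_batch)]
```

With an assumed budget of 40 free registers and one register per instance, 100 instances would be submitted as three draw calls of 40, 40 and 20 instances, each well above the suggested minimum of eight.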

VS Snippet to unpack attributes (position & size) from VS constants to create matmvp and transform vertex

    const float4x4 cmatviewproj;
    const float4 cpackedinstancedata[ numinstances ];

    float4x4 matworld;
    float4x4 matmvp;

    int i = IN.InstanceIndex;

    matworld[ 0 ] = float4( cpackedinstancedata[ i ].w, 0, 0, cpackedinstancedata[ i ].x );
    matworld[ 1 ] = float4( 0, cpackedinstancedata[ i ].w, 0, cpackedinstancedata[ i ].y );
    matworld[ 2 ] = float4( 0, 0, cpackedinstancedata[ i ].w, cpackedinstancedata[ i ].z );
    matworld[ 3 ] = float4( 0, 0, 0, 1 );

    matmvp = mul( cmatviewproj, matworld );
    OUT.HPosition = mul( matmvp, IN.Position );

Instance Attribute Streams
- Original geometry instancing approach
- Only pay the cost for 1 draw call when rendering n instances
- Best for objects with few polygons; less likely to become attribute bound
- Put per instance data into an additional stream
- Set up the vertex stream frequency as before and reset it when you're done

VS Snippet to unpack attributes (position & size) from attribute stream to create matmvp and transform vertex

    const float4x4 cmatviewproj;

    float4x4 matworld;
    float4x4 matmvp;

    matworld[ 0 ] = float4( IN.PackedInstData.w, 0, 0, IN.PackedInstData.x );
    matworld[ 1 ] = float4( 0, IN.PackedInstData.w, 0, IN.PackedInstData.y );
    matworld[ 2 ] = float4( 0, 0, IN.PackedInstData.w, IN.PackedInstData.z );
    matworld[ 3 ] = float4( 0, 0, 0, 1 );

    matmvp = mul( cmatviewproj, matworld );
    OUT.HPosition = mul( matmvp, IN.Position );
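The matrix math in both unpacking snippets can be checked on the CPU with a few lines of Python. This is a plain reference sketch, assuming the same packing as the snippets (xyz = position, w = uniform size) and the column-vector convention implied by `mul( matmvp, IN.Position )`; the function names are hypothetical.

```python
def world_matrix(packed):
    """Build the 4x4 world matrix from one packed (x, y, z, size) attribute."""
    x, y, z, size = packed
    return [
        [size, 0.0, 0.0, x],
        [0.0, size, 0.0, y],
        [0.0, 0.0, size, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transform(m, v):
    """Multiply a 4x4 matrix by a column vector (x, y, z, w)."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]


IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

# An instance at (10, 20, 30) with size 2, using an identity view-projection.
mvp = mat_mul(IDENTITY, world_matrix((10.0, 20.0, 30.0, 2.0)))
print(transform(mvp, [1.0, 1.0, 1.0, 1.0]))  # [12.0, 22.0, 32.0, 1.0]
```

Each vertex is scaled by the instance size and then moved to the instance position, which is all that position and size need; rotation and bending would require further attributes.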

Geometry Instancing Results
- Depending on the amount of vegetation, rendering speed increases by up to 40% (when heavily draw call limited)
- Allows us to increase the sprite distance ratio, a nice visual improvement with only a moderate rendering speed hit

Scene drawn normally

Batches visualized
Vegetation objects tinted the same way get submitted in one draw call!

High Dynamic Range Rendering

High Dynamic Range Rendering
- Uses A16B16G16R16F render target format
- Alpha blending and filtering are essential
- Unified solution for post-processing
- Glare, flares, etc. can be added more naturally

HDR Implementation
- HDR in Far Cry follows standard approaches
  - Kawase's bloom filters
  - Reinhard's tone mapping operator
  - See DXSDK sample
- Performance hint
  - For post-processing, try splitting your color into rg and ba and writing them into two MRTs of format G16R16F. That's more cache efficient on some cards.

Bloom from [Kawase03]
- Repeatedly apply small blur filters
- Composite bloom with original image
  - Ideally in HDR space, followed by tone mapping

Increase Filter Size Each Pass
[Figure: texture sampling points around the pixel being rendered for the 1st, 2nd and 3rd pass. From [Kawase03]]
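The idea of a small filter whose footprint grows with each pass can be sketched on a tiny grayscale image. The slides do not give the sampling offsets; the sketch below assumes the form in which this filter is commonly described, with four bilinear taps at diagonal offsets of (pass index + 0.5) texels, so treat the offsets and all function names as assumptions rather than Far Cry's implementation.

```python
import math


def kawase_offsets(pass_index):
    """Four diagonal tap offsets in texels; they move outwards each pass."""
    d = pass_index + 0.5
    return [(-d, -d), (d, -d), (-d, d), (d, d)]


def sample_bilinear(img, x, y):
    """Bilinear fetch at texel coordinates, clamped at the image borders."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def kawase_pass(img, pass_index):
    """One blur pass: average four bilinear taps around every pixel."""
    h, w = len(img), len(img[0])
    offsets = kawase_offsets(pass_index)
    return [[sum(sample_bilinear(img, x + dx, y + dy) for dx, dy in offsets) / 4.0
             for x in range(w)]
            for y in range(h)]


def bloom(img, passes=3):
    """Repeatedly apply the small filter, widening it on each pass."""
    for i in range(passes):
        img = kawase_pass(img, i)
    return img
```

Because every tap falls between texels, each pass reads 16 source texels with only four fetches, which is why a few cheap passes can approximate a wide blur.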

No HDR

HDR (tone mapped scene + bloom + stars)

[Reinhard02] Tone Mapping
1) Calculate scene luminance
   - On the GPU, done by sampling the log() values, scaling them down to 1x1 and calculating the exp()
2) Scale to target average luminance α
3) Apply tone mapping operator
- To simulate light adaptation, replace Lum_avg in steps 2 and 3 by an adapted luminance value which slowly converges towards Lum_avg
- For further information, attend Reinhard's session "Tone Reproduction In Interactive Applications" this Friday, March 11 at 10:30am
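The three steps can be written out as a short CPU reference. The formulas follow the simple global operator from [Reinhard02]: log-average luminance, scaling by the key value α, then L / (1 + L). The adaptation step is an assumption, since the slide only says that the adapted value slowly converges towards Lum_avg; the exponential form, the `rate` parameter and the default values are illustrative choices.

```python
import math


def log_average_luminance(lum, delta=1e-4):
    """Step 1: exp of the mean log luminance (delta avoids log(0))."""
    return math.exp(sum(math.log(delta + l) for l in lum) / len(lum))


def tone_map(lum, lum_avg, alpha=0.18):
    """Steps 2 and 3: scale to the target key alpha, then compress to [0, 1)."""
    scaled = [alpha / lum_avg * l for l in lum]
    return [l / (1.0 + l) for l in scaled]


def adapt(lum_adapted, lum_avg, dt, rate=1.0):
    """Assumed adaptation model: move exponentially towards lum_avg."""
    return lum_adapted + (lum_avg - lum_adapted) * (1.0 - math.exp(-dt * rate))


# Per frame: measure the scene, adapt, then tone map with the adapted value.
scene = [0.05, 0.5, 2.0, 40.0]
adapted = 1.0
adapted = adapt(adapted, log_average_luminance(scene), dt=1.0 / 60.0)
display = tone_map(scene, adapted)
```

On the GPU, the mean in step 1 corresponds to the repeated downscaling of the log() image to 1x1 that the slide describes.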

HDR Watch out
- Currently no FSAA
- Extremely fill rate hungry
- Needs support for float buffer blending
- HDR-aware production (1):
  - Light maps
  - Skybox

(1) For prototyping, we actually modified our light map generator to generate HDR maps and tried HDR skyboxes. They look great. However, we didn't include them in the patch because:
- Compressing HDR light map textures is challenging
- Bandwidth requirements would have been even bigger
- Far Cry patch size would have been huge
- No time to adjust and test all levels

Conclusion
- Dynamic flow control in ps_3_0
- Geometry Instancing
- High Dynamic Range Rendering

References
[Kawase03] Masaki Kawase, "Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L (Wreckless)", Game Developers Conference 2003.
[Reinhard02] Erik Reinhard, Michael Stark, Peter Shirley and James Ferwerda, "Photographic Tone Reproduction for Digital Images", SIGGRAPH 2002.

Questions?
carsten@crytek.de
Thanks to
- Martin Mittring & Andrey Khonich @ Crytek
- Jason Mitchell & Richard Huddy @ ATI