Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university



Similar documents
GPU Architecture. Michael Doggett ATI

GPGPU Computing. Yong Cao

Computer Graphics Hardware An Overview

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Radeon HD 2900 and Geometry Generation. Michael Doggett

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Introduction to GPGPU. Tiziano Diamanti

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

Introduction to GPU Architecture

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

NVIDIA GeForce GTX 580 GPU Datasheet

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

L20: GPU Architecture and Models

Next Generation GPU Architecture Code-named Fermi

Introduction to GPU Programming Languages

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

The Future Of Animation Is Games

Console Architecture. By: Peter Hood & Adelia Wong

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Introduction to Computer Graphics

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

A Crash Course on Programmable Graphics Hardware

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

Dynamic Resolution Rendering

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

An Introduction to Modern GPU Architecture Ashu Rege Director of Developer Technology

Real-Time Graphics Architecture

Introduction to GPU hardware and to CUDA

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

Texture Cache Approximation on GPUs

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

OpenGL Performance Tuning

Optimizing AAA Games for Mobile Platforms

GPU architecture II: Scheduling the graphics pipeline

GPU Parallel Computing Architecture and CUDA Programming Model

1. INTRODUCTION Graphics 2

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

GPUs for Scientific Computing

Low power GPUs a view from the industry. Edvard Sørgård

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

Midgard GPU Architecture. October 2014

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

How To Teach Computer Graphics

SAPPHIRE HD GB GDDR5 PCIE.

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

3D Computer Games History and Technology

Msystems Ltd. SAPPHIRE HD GB GDDR5 PCIE

Writing Applications for the GPU Using the RapidMind Development Platform

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

Introduction to GPU Computing

GPGPU: General-Purpose Computation on GPUs

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010

Advanced Visual Effects with Direct3D

ST810 Advanced Computing

An Extremely Inexpensive Multisampling Scheme

QCD as a Video Game?

GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY

Optimization for DirectX9 Graphics. Ashu Rege

Image Synthesis. Transparency. computer graphics & visualization

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Lecture 15: Hardware Rendering

GPU for Scientific Computing. -Ali Saleh

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

Interactive Rendering In The Post-GPU Era. Matt Pharr Graphics Hardware 2006

Choosing a Computer for Running SLX, P3D, and P5

White Paper AMD GRAPHICS CORES NEXT (GCN) ARCHITECTURE

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

HP Workstations graphics card options

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

Stream Processing on GPUs Using Distributed Multimedia Middleware

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Getting Started with CodeXL

GPU Usage. Requirements

IP Video Rendering Basics

INTRODUCTION TO RENDERING TECHNIQUES

GPU Computing - CUDA

Configuring Memory on the HP Business Desktop dx5150

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Developer Tools. Tim Purcell NVIDIA

CUDA programming on NVIDIA GPUs

NVIDIA GeForce GTX 750 Ti

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Transcription:

Real-Time Realistic Rendering Michael Doggett Docent Department of Computer Science Lund university 30-5-2011

Visually realistic goal force[d] us to completely rethink the entire rendering process. Cook et al., 1987 The Reyes Image Rendering Architecture Image courtesy and copyright Pixar Slide courtesy Jonathan Ragan-Kelley

Outline GPU architecture 2001-2009 ATI/AMD - Boston, U.S.A. XBOX360, Radeon 2xxx-6xxx Decoupled Sampling Analytical Motion Blur

Graphics Group Tomas Akenine-Möller Michael Doggett Lennart Ohlsson Magnus Andersson Rasmus Barringer Per Ganestam Carl Johan Gribel Björn Johnsson Jim Rasmusson Philip Buchanan

Core Core Core Core CPU L2 L2 L2 L2 Shared L3 cache GPU cache cache cache cache cache cache cache cache

128 fragments in parallel 16 cores = 128 ALUs, 16 simultaneous instruction streams Slide courtesy Kayvon Fatahalian 6

vertices/fragments 128 [ primitives ] in parallel OpenCL work items CUDA threads vertices primitives fragments Slide courtesy Kayvon Fatahalian 7

GPU design parameters Competition Currently 2 strong competitors AMD (ATI) and nvidia Performance/Dollar Moore s law Number of transistors on a chip doubles every two years

GPU design parameters RTL design nvidia ALUs from DX10 on are full custom Architecture changes hidden by API Fixed function uses programmable hardware Many custom units Backwards compatible

Before programmable GPUs ATI Founded 1985 started nvidia Rasterization Founded 1993 DirectX 6 Textures Texturing Z & Alpha Image FrameBuffer

Hardware Transform, Clipping and Lighting Transformation requires 32bit float 4x4 matrix multiplication Texturing for 4 component (RGBA) 8bit pixels Low precision math DirectX 7 NV10 99 (Nvidia GeForce 256) R100 00 (ATI Radeon 7500) Vertices Textures Image Transform Rasterization Texturing Z & Alpha FrameBuffer

Programmable GPUs 1st Generation Shaders run programs that do what fixed function hardware did Makes GPUs simpler than CPUs Vertices Shaders Textures Shaders Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha Image FrameBuffer

Programmable GPUs 1st Generation DirectX8 Multiple versions of Pixel shaders, 1.1, 1.3, 1.4 13-22 instructions assembler language Vertices Shaders Textures Shaders Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha Image FrameBuffer

Programmable GPUs 1st Generation NV20 01 Nvidia GeForce 3 [Lindholm01] R200 01 ATI Radeon 8500 2-wide Vertex shader 4-wide SIMD Pixel shader Vertices Shaders Textures Shaders Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha Fixed point ~16bits Image FrameBuffer

Programmable GPUs 2nd Generation R300 02 ATI Radeon 9700 4-wide Vertex shader 8-wide SIMD Pixel shader 24bit floating point math Vertices Shaders Textures Shaders Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha Image FrameBuffer

Programmable GPUs NV30 03 2nd Generation Nvidia GeForce FX 5800 16 and 32 bit float Cg (C for graphics) NV40 04 GeForce 6800 [Montrym05] DirectX 9 High Level Shading Language (HLSL) Vertices Shaders Textures Shaders Image Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha FrameBuffer

XBox360 GPU architecture - 05 GPU Transform Texturing Vertex Shaders Pixel Shaders Rasterizer Unified Shader Index Stream Generator Tessellator Texture/Vertex Fetch Vertex Pipeline Pixel Pipeline Display Pixels UNIFIED MEMORY Output Buffer Memory Export DAUGHTER DIE 17

ATI Radeon 4870(R770) Die 2008 260mm2 956 MTransistors

ATI Radeon 4870(R770) Die 260mm2 956 MTransistors Red 10 SIMDs Orange 64 z/stencil 40 texture

Multi-Graphics core NVIDIA Fermi GeForce GTX 480 (GF100) Released March, 2010 Shader Load/Store L1/L2 cache 4x Triangle rate 4 rasterization engines 529mm2, core i7 263mm 2

NVIDIA GeForce GTX 480 SM ATI Radeon HD 5870 SIMD-engine Fetch/ Decode Fetch/ Decode Fetch/ Decode Execution contexts (128 KB) Shared memory (16+48 KB) Execution contexts (256 KB) Shared memory (32 KB) 15 Streaming 20 SIMD-engines Multiprocessors 21

Intel s Knights Corner/ Ferry (Larrabee) 32 Pentium Cores 16 wide SIMD with 32 bit floating point MAD Programmable and debuggable Images courtesy [Seiler08]

Analytical Motion Blur Rasterization with Compression Carl Johan Gribel, Michael Doggett, Tomas Akenine-Möller High-Performance Graphics 2010

Motion Blur Motivation Human visual system designed to detect motion This fails when presented too few images, or too much motion per image - motion gets jumpy Motion blur aids the motion detection of the visual system

Analytical Motion Blur Rasterization Compute Edge Equations and exact exposure intervals - analytic inside-test - visibility management e 1 (t) =(p 2 (t) p 0 (t)) (x 0,y 0, 1) =(((1 t)q 2 + tr 2 ) ((1 t)q 0 + tr 0 )) (x 0,y 0, 1) =(ft 2 + gt + h) (x 0,y 0, 1) Compute the time integral Store and compress the intervals

Moving triangle edge functions depth function t=0.0 t=1.0 0.33 1.0 t t=0.0 t=1.0

Results Ground Truth Stochastic 12 spp Analytical, compression 8

Decoupled Sampling for Real-Time Graphics Pipelines Jonathan Ragan-Kelley, Jiawen Chen, Jaakko Lehtinen, Michael Doggett, Fredo Durand (collab. with MIT) ACM TOG 2011, to be presented at SIGGRAPH 2011 http://bit.ly/decoupledsampling

Decoupled Shading Rendering : - Visibility - compute what is visible - Shading - compute color for each pixel Complex visibility - many stochastic point samples in 5D (space, time, lens aperture) Complex shading - expensive evaluation can be prefiltered Modern GPUs use multisample AA for decoupling

Our technique: Post-visibility Decoupled Sampling 1. Separate visibility from shading samples. 2. Define an explicit mapping from visibility to shading space. 3. Use a cache to manage irregular shading-visibility relationships, without precomputation. Slide courtesy Jonathan Ragan-Kelley

Decoupled Sampling with motion blur t = 0.0 0.25 0.5 0.75 visibility samples (screen space) shading grid foreach primitive: foreach vis sample: skip if not visible map to shading sample if not in cache: shade and cache else: use cached value Slide courtesy Jonathan Ragan-Kelley

Decoupled Sampling with motion blur t = 0.0 0.25 0.5 0.75 visibility samples (screen space) shading grid foreach primitive: foreach vis sample: skip if not visible map to shading sample if not in cache: shade and cache else: use cached value Slide courtesy Jonathan Ragan-Kelley

Decoupled Sampling with motion blur t = 0.0 0.25 0.5 0.75 visibility samples (screen space) shading grid foreach primitive: foreach vis sample: skip if not visible map to shading sample if not in cache: shade and cache else: use cached value Slide courtesy Jonathan Ragan-Kelley

Decoupled Sampling with motion blur t = visibility samples (screen space) shading grid foreach primitive: foreach vis sample: skip if not visible map to shading sample if not in cache: shade and cache else: use cached value Slide courtesy Jonathan Ragan-Kelley

Decoupled Sampling with motion blur t = 0.0 0.25 0.5 0.75 visibility samples (screen space) shading grid foreach primitive: foreach vis sample: skip if not visible map to shading sample if not in cache: shade and cache else: use cached value Slide courtesy Jonathan Ragan-Kelley

Decoupled Sampling with motion blur visibility samples (screen space) shading grid foreach primitive: foreach vis sample: skip if not visible map to shading sample if not in cache: shade and cache else: use cached value Slide courtesy Jonathan Ragan-Kelley

Decoupled Sampling transformed primitives covered subpixels visible subpixels colored subpixels Xform Rast Depth- Stencil Map FB Cache shading requests Shade shaded results Slide courtesy Jonathan Ragan-Kelley

Decoupled Sampling transformed primitives covered subpixels visible subpixels colored subpixels Xform Rast Depth- Stencil Map FB Cache shading requests Shade shaded results Slide courtesy Jonathan Ragan-Kelley

Results Slide courtesy Jonathan Ragan-Kelley

shading rate Blur vs. shading rate: defocus moderate blur no blur heavy blur Sort Last 5 4 64 samples/pixel 27 samples/pixel 8 samples/pixel 3 2 1 0 0 50 100 150 200 250 0 blur area (pixels) 4.5-43x less shading than ideal supersampling Half-Life 2, Episode 2 1280x720, 27 samples/pixel Slide courtesy Jonathan Ragan-Kelley

shading rate Blur vs. shading rate: motion heavy blur moderate blur no blur Sort Last 15 64 samples/pixel 27 samples/pixel 8 samples/pixel 10 5 0 0 20 40 60 80 100 120 blur area (pixels) 3-40x less shading than ideal supersampling 0 Team Fortress 2 1280x720, 27 samples/pixel Slide courtesy Jonathan Ragan-Kelley

Blur vs. Shading Rate Sort Last blur area (pixels) shading rate 4 3 2 1 0 7000 6000 5000 4000 3000 2000 1000 64 samples/pixel 27 samples/pixel 8 samples/pixel 0 10 20 30 40 50 0 10 20 30 40 50 50 40 30 20 10 0 0 frame Slide courtesy Jonathan Ragan-Kelley

Depth of field results Valve s Half-Life 2 : Episode 2

Graphics Hardware Hardware on which 3D graphics is created in real-time Traditionally custom hardware Increasing programmability More complex rendering algorithms Programmable hardware available on a wide range Mobile - PowerVR, Desktop - AMD Radeon/Nvidia GeForce Heterogeneous processors - CPU with integrated graphics Sandy Bridge Create new algorithms that make use of architectural features

Device properties Number and type of cores Task parallel OOO/In-order Data parallel SIMD/SIMT, width, VLIW/scalar Caches Size, bandwidths, hierarchy, coherency Adaptable to new varieties of architectures

Challenges Wide range of changing hardware Quest for ever increasing realism Analytical visibility Decoupled sampling Efficient programming and utilization CUDA, OpenCL, DirectCompute Fixed custom scheduling (currently)

Summary GPUs have evolved into massively parallel processors Analytical Motion Blur improves quality leading to more realistic images Decoupled shading enables real-time realistic rendering Need new approaches to adapt to wide variation and complex scheduling

Thanks for listening and Questions