Programming Graphics Hardware. Randy Fernando, Cyril Zeller

Similar documents
Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

OpenGL Performance Tuning

Computer Graphics Hardware An Overview

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

GPU Architecture. Michael Doggett ATI

Introduction to GPGPU. Tiziano Diamanti

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

Optimization for DirectX9 Graphics. Ashu Rege

3D Computer Games History and Technology

GPGPU Computing. Yong Cao

Advanced Visual Effects with Direct3D

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Touchstone -A Fresh Approach to Multimedia for the PC

Dynamic Resolution Rendering

Optimizing AAA Games for Mobile Platforms

NVIDIA GeForce GTX 580 GPU Datasheet

Radeon HD 2900 and Geometry Generation. Michael Doggett

Image Synthesis. Transparency. computer graphics & visualization

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

How PCI Express Works (by Tracy V. Wilson)

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Msystems Ltd. SAPPHIRE HD GB GDDR5 PCIE

A Crash Course on Programmable Graphics Hardware

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Lecture 15: Hardware Rendering

SAPPHIRE HD GB GDDR5 PCIE.

Lecture Notes, CEng 477

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

INTRODUCTION TO RENDERING TECHNIQUES

L20: GPU Architecture and Models

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

Introduction to Computer Graphics

The Future Of Animation Is Games

Advanced Rendering for Engineering & Styling

GPU Shading and Rendering: Introduction & Graphics Hardware

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

High Dynamic Range and other Fun Shader Tricks. Simon Green

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS

An Introduction to Modern GPU Architecture Ashu Rege Director of Developer Technology

Msystems Ltd. SAPPHIRE HD GB GDDR5 PCIE. Specification

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Sample Exam Questions 2007

Technical Brief. Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output. April 2008 TB _v01

Writing Applications for the GPU Using the RapidMind Development Platform

Graphics and Computing GPUs

How To Teach Computer Graphics

Intel Graphics Media Accelerator 900

Workstation Applications for Windows. NVIDIA MAXtreme User s Guide

Midgard GPU Architecture. October 2014

How To Make A Texture Map Work Better On A Computer Graphics Card (Or Mac)

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Comp 410/510. Computer Graphics Spring Introduction to Graphics Systems

Software Virtual Textures

3D Math Overview and 3D Graphics Foundations

AMD Radeon HD 8000M Series GPU Specifications AMD Radeon HD 8870M Series GPU Feature Summary

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

OctaVis: A Simple and Efficient Multi-View Rendering System

A Short Introduction to Computer Graphics

RADEON 9700 SERIES. User s Guide. Copyright 2002, ATI Technologies Inc. All rights reserved.

Graphical displays are generally of two types: vector displays and raster displays. Vector displays

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

QCD as a Video Game?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Low power GPUs a view from the industry. Edvard Sørgård

OpenEXR Image Viewing Software

In the early 1990s, ubiquitous

================================================================== CONTENTS ==================================================================

GPU Christmas Tree Rendering. Evan Hart

Configuring Memory on the HP Business Desktop dx5150

Introduction to Game Programming. Steven Osman

Boundless Security Systems, Inc.

SkillsUSA 2014 Contest Projects 3-D Visualization and Animation

Interactive Rendering In The Post-GPU Era. Matt Pharr Graphics Hardware 2006

GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY

Intel 810 and 815 Chipset Family Dynamic Video Memory Technology

Course Overview. CSCI 480 Computer Graphics Lecture 1. Administrative Issues Modeling Animation Rendering OpenGL Programming [Angel Ch.

Why You Need the EVGA e-geforce 6800 GS

1. INTRODUCTION Graphics 2

Volume Rendering on Mobile Devices. Mika Pesonen

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Data Sheet. Desktop ESPRIMO. General

Introduction to Graphics Software Development for OMAP 2/3

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Console Architecture. By: Peter Hood & Adelia Wong

PCI vs. PCI Express vs. AGP

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Introduction to Computer Graphics

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Overview. NVIDIA Quadro K5200 8GB Graphics J3G90AA

Intel s Next Generation Integrated Graphics Architecture Intel Graphics Media Accelerator X3000 and 3000

Transcription:

Randy Fernando, Cyril Zeller

Overview of the Tutorial 10:45 Introduction to the Hardware Graphics Pipeline Cyril Zeller 12:00 Lunch 14:00 High-Level Shading Languages Randy Fernando 15:15 break 15:45 GPU Applications Cyril Zeller / Randy Fernando 17:00 End

Introduction to the Hardware Graphics Pipeline Cyril Zeller

Overview Concepts: Real-time rendering Hardware graphics pipeline Evolution of the PC hardware graphics pipeline: 1995: Texture mapping and z-buffer 1998: Multitexturing 1999: Transform and lighting 2001: Programmable vertex shader 2002: Programmable pixel shader 2004: Shader model 3.0 and 64-bit color support PC graphics software architecture Optimization

Real-Time Rendering Graphics hardware enables real-time rendering Real-time means display rate at more than 10 images per second 3D Scene = Collection of 3D primitives (triangles, lines, points) Image = Array of pixels

Hardware Graphics Pipeline Application Stage Geometry Stage Rasterization Stage 3D Triangles 2D Triangles Pixels For each triangle vertex: Transform 3D position into screen position Compute attributes For each triangle: Rasterize triangle Interpolate vertex attributes across triangle Shade pixels Resolve visibility

PC Architecture Motherboard Central Processor Unit (CPU) System Memory Bus Port (PCI, AGP, PCIe) Video Board Video Memory Graphics Processor Unit (GPU)

1995: Texture Mapping and Z-Buffer CPU GPU Application / Geometry Stage Rasterization Stage Rasterizer Texture Unit Raster Operations Unit 2D Triangles Textures Bus (PCI) 2D Triangles Textures Frame Buffer System Memory Video Memory PCI: Peripheral Component Interconnect 3dfx s Voodoo

Texture Mapping Triangle Mesh textured with Base Texture + =

Texture Mapping: Texture Coordinates Interpolation Screen Space Texture Space Texel y x (x 1, y 1 ) v u (u1, v1) (x 0, y 0 ) (x, y) (x 2, y 2 ) (u 0, v 0 ) (u, v) (u 2, v 2 )

Texture Mapping: Perspective-Correct Interpolation Perspective-Incorrect Perspective-Correct

Texture Mapping: Magnification Screen Space Texture Space x Pixel u Pixel s texture footprint y v or Nearest-Point Sampling Bilinear Filtering

Texture Mapping: Minification Screen Space Texture Space x Pixel u Pixel s texture footprint y v

Texture Mapping: Mipmapping or Bilinear Filtering Trilinear Filtering

Texture Mapping: Anisotropic Filtering or Bilinear Filtering Trilinear Filtering

Texture Mapping: Addressing Modes Wrap Mirror

Texture Mapping: Addressing Modes Clamp Border

Raster Operations Unit (ROP) Rasterizer Texture Unit Fragments Raster Operations Unit Scissor Test stencil buffer value at (x, y) tested against reference value Fragment Screen Position (x, y) Alpha Value a tested against scissor rectangle tested against reference value Alpha Test Stencil Test Frame Buffer Stencil Buffer tested against Z Test Z-Buffer Depth z z-buffer value at (x, y) (visibility test) blended with Alpha Blending Color Buffer Pixels color buffer value at (x, y): Color (r, g, b) K src * Color src + K dst * Color src (src = fragment dst = color buffer)

1998: Multitexturing CPU GPU Application / Geometry Stage Rasterization Stage Rasterizer Multitexture Unit Raster Operations Unit 2D Triangles Textures Bus (AGP) 2D Triangles Textures Frame Buffer System Memory Video Memory AGP: Accelerated Graphics Port NVIDIA s TNT, ATI s Rage

AGP PCI uses a parallel connection AGP uses a serial connection Fewer pins, simpler protocol Cheaper, more scalable PCI uses a shared-bus protocol AGP uses a point-to-point protocol Bandwidth is not shared among devices AGP uses a dedicated system memory called AGP memory or non-local video memory The GPU can lookup textures that resides in AGP memory Its size is called the AGP aperture Bandwidth: AGP = 2 x PCI (AGP2x = 2 x AGP, etc.)

Multitexturing Base Texture modulated by Light Map X = from UT2004 (c) Epic Games Inc. Used with permission

1999: Transform and Lighting CPU Application Stage GPU Fixed Function Pipeline Geometry Stage Transform and Lighting Unit Rasterization Stage Rasterizer Texture Unit Register Combiner Raster Operations Unit 3D Triangles Textures Bus (AGP) 3D Triangles Textures Frame Buffer System Memory Video Memory Register Combiner: Offers many more texture/color combinations NVIDIA s GeForce 256 and GeForce2, ATI s Radeon 7500, S3 s Savage3D

Transform and Lighting Unit (TnL) Transform Transform and Lighting Unit Model or Object Space World Matrix World Space Model-View Matrix View Matrix Camera or Eye Space Projection Matrix Projection or Clip Space Perspective Division and Viewport Matrix Screen or Window Space Lighting Material Properties Light Properties Vertex Diffuse and Specular Color Vertex Color

Bump Mapping Bump mapping is about fetching the normal from a texture (called a normal map) instead of using the interpolated normal to compute lighting at a given pixel + = Normal Map Diffuse light without bump Diffuse light with bumps

Cube Texture Mapping Cubemap (covering of a cube) Environment Mapping z the six faces (x,y, z) y x Cubemap lookup (with direction (x, y, z)) (the reflection vector is used to lookup the cubemap)

Projective Texture Mapping Projected Texture (x, y, z, w) (x/w, y/w) Texture Projection Projective Texture lookup

2001: Programmable Vertex Shader CPU GPU Application Stage Geometry Stage Vertex Shader (no flow control) Rasterization Stage Rasterizer (with Z-Cull) Register Combiner Raster Operations Unit Texture Shader 3D Triangles Textures System Memory Bus (AGP) 3D Triangles Video Memory Textures Z-Cull: Predicts which fragments will fail the Z test and discards them Texture Shader: Offers more texture addressing and operations NVIDIA s GeForce3 and GeForce4 Ti, ATI s Radeon 8500 Frame Buffer

Vertex Shader A programmable processor for per-vertex computation void VertexShader( // Input per vertex in float4 positioninmodelspace, in float2 texturecoordinates, in float3 normal, ) { // Input per batch of triangles uniform float4x4 modeltoprojection, uniform float3 lightdirection, // Output per vertex out float4 positioninprojectionspace, out float2 texturecoordinatesoutput, out float3 color // Vertex transformation positioninprojectionspace = mul(modeltoprojection, positioninmodelspace); // Texture coordinates copy texturecoordinatesoutput = texturecoordinates; } // Vertex color computation color = dot(lightdirection, normal);

Volume Texture Mapping Volume Texture (3D Noise) (x,y,z) x z y Noise Perturbation Volume Texture lookup (with position (x, y, z))

Hardware Shadow Mapping Shadow Map Computation The shadow map contains the depth z/w of the 3D points visible from the light s point of view Store z/w Spot light (x, y, z, w) (x/w, y/w) Shadow Shadow Rendering Lookup map A 3D point (x, y, z, w) is in shadow if: z/w < value of shadow map at (x/w, y/w) A hardware shadow map lookup returns the value of this comparison value Spot light (x/w, y/w) (x, y, z, w) between 0 and 1

Antialiasing: Definition Aliasing: Undesirable visual artifacts due to insufficient sampling of: Primitives (triangles, lines, etc.) jagged edges Textures or shaders pixelation, moiré patterns Those artifacts are even more noticeable on animated images Antialiasing: Method to reduce aliasing Texture antialiasing is largely handled by proper mipmapping and anisotropic filtering Shader antialiasing can be tricky (especially with conditionals)

Antialiasing: Supersampling and Multisampling Supersampling: Compute color and Z at higher resolution and display averaged color to smooth out the visual artifacts Pixel Center Sample Multisampling: Same thing except only Z is computed at higher resolution As a result, multisampling performs antialiasing on primitive edges only

2002: Programmable Pixel Shader CPU GPU Application Stage Geometry Stage Vertex Shader (static and dynamic flow control) Rasterization Stage Rasterizer (with Z-Cull) Pixel Shader (static flow control only) Raster Operations Unit Texture Unit 3D Triangles Textures Bus (AGP) 3D Triangles Textures Frame Buffer System Memory Video Memory MRT: Multiple Render Target NVIDIA s GeForce FX, ATI s Radeon 9600 to 9800 and X600 to X800

Pixel Shader A programmable processor for per-pixel computation void PixelShader( // Input per pixel in float2 texturecoordinates, in float3 normal, ) { // Input per batch of triangles uniform sampler2d basetexture, uniform float3 lightdirection, // Output per pixel out float3 color // Texture lookup float3 basecolor = tex2d(basetexture, texturecoordinates); // Light computation float light = dot(lightdirection, normal); } // Pixel color computation color = basecolor * light;

Shader: Static vs. Dynamic Flow Control void Shader(... // Input per vertex or per pixel in float3 normal, Static Flow Control (condition varies per batch of triangles) Dynamic Flow Control (condition varies per vertex or pixel) ) { } // Input per batch of triangles uniform float3 lightdirection, uniform bool computelight,...... if (computelight) {... if (dot(lightdirection, normal)) {... }... }...

2004: Shader Model 3.0 and 64-Bit Color Support CPU GPU Application Stage Geometry Stage Vertex Shader (static and dynamic flow control) Rasterization Stage Rasterizer (with Z-Cull) Pixel Shader (static and dynamic flow control) Raster Operations Unit Texture Unit 64-Bit 3D Triangles Color Textures Bus (PCIe) 3D Triangles Textures Frame Buffer System Memory Video Memory PCIe: Peripheral Component Interconnect Express NVIDIA s GeForce 6 Series (6800, 6600 and 6200)

PCIe Like AGP: Uses a serial connection Cheap, scalable Uses a point-to-point protocol No shared bandwidth Unlike AGP: General-purpose (not only for graphics) Dual-channels: Bandwidth is available in both directions Bandwidth: PCIe = 2 x AGP8x

Shader Model 3.0 Shader Model 3.0 means: Longer shaders More complex shading Pixel shader: Dynamic flow control Better performance Derivative instructions Shader antialiasing Support for 32-bit floating-point precision Fewer artifacts Face register Faster two-sided lighting Vertex shader: Texture access (Vertex Texture Fetch) Simulation on GPU, displacement mapping Vertex buffer frequency Efficient geometry instancing

Shader Model 3.0 Unleashed Image used with permission from Pacific Fighters. 2004 Developed by 1C:Maddox Games. All rights reserved. 2004 Ubi Soft Entertainment.

64-Bit Color Support 64-bit color means one 16-bit floating-point value per channel (R, G, B, A) Alpha blending works with 64-bit color buffer (as opposed to 32-bit fixed-point color buffer only) Texture filtering works with 64-bit textures (as opposed to 32-bit fixed-point textures only) Applications: High-precision image compositing High dynamic range imagery

High Dynamic Range Imagery The dynamic range of a scene is the ratio of the highest to the lowest luminance Real-life scenes can have high dynamic ranges of several millions Display and print devices have a low dynamic range of around 100 Tone mapping is the process of displaying high dynamic range images on those low dynamic range devices High dynamic range images use floating-point colors OpenEXR is a high dynamic range image format that is compatible with NVIDIA s 64-bit color format

Real-Time Tone Mapping The image is entirely computed in 64-bit color and tone-mapped for display From low to high exposure image of the same scene

2005: Multi-GPU NVIDIA s Scalable Link Interface multi-gpu technology takes advantage of the increased bandwidth of the PCI Express to automatically accelerates applications through a combination of intelligent hardware and software solutions SLI Connector

PC Graphics Software Architecture CPU Application BUS GPU Vertex Program Pixel Program 3D API (OpenGL or DirectX) Commands Programs Vertex Shader Pixel Shader Driver System Memory Geometry (triangles, vertices, normals, etc...) Textures Video Memory The application, 3D API and driver are written in C or C++ The vertex and pixel programs are written in a high-level shading language (Cg, DirectX HLSL, OpenGL Shading Language)

Basic Structure of a Graphics Application - Initialize API Initialization - Check hardware capabilities - Create resources: render targets, shaders, textures, geometry - For every frame: Rendering loop - Draw to back buffer: - Clear frame buffer - For each draw call: - Set index and vertex buffers - Set vertex and pixel shaders and their parameters - Set texture sampling parameters - Set render states - Set render target - Issue draw command - Swap back buffer and front buffer

workload Optimization: Bottleneck CPU GPU Bus Vertex Shader Rasterizer Texture Unit Pixel Shader Raster Operations Unit Frame Buffer A multi-processor pipeline architecture means that the overall throughput is determined by the bottleneck The bottleneck varies from one draw call to another

Optimization: Working on the Bottleneck 1. Find bottleneck by selectively: modifying workload of stages under-clocking various domains (CPU, bus, GPU) 2. Decrease workload of bottleneck: workload workload 3. Or increase workload of nonbottleneck stages: workload 4. Go back to 1.

Optimization: Finding the Bottleneck Run App Vary FB b/w FPS varies? Yes FB b/w limited No Vary texture size/filtering Vary resolution Vary vertex instructions FPS varies? No FPS varies? No FPS varies? No Yes Yes Yes Texture b/w limited Vary fragment instructions Vertex limited Yes FPS varies? No Fragment limited Raster limited Vary vertex size/ bus rate FPS varies? No Yes Bus transfer limited CPU limited

Evolution of Performance 10 000 Mpixels/s 1000 100 Mvertices/s Mtransistors 10 Bus Video memory 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 PCI (133 MB/s) AGP (266 MB/s) DirectX 1 API DirectX 2 DirectX 3 OpenGL 1.1 AGP2x (533 MB/s) DirectX 5 OpenGL 1.2 AGP4x (1.06 GB/s) AGP8x (2.1 GB/s) DirectX 8 DirectX 9 DirectX 6 DirectX 7 OpenGL 1.4 OpenGL 1.3 OpenGL 1.5 PCIe (4 GB/s) 4 MB 32 MB 64 MB 128 MB 256 MB 512 MB

The Future Unified general programming model at primitive, vertex and pixel levels Scary amounts of: Floating point horsepower Video memory Bandwidth between system and video memory Lower chip costs and power requirements to make 3D graphics hardware ubiquitous: Automotive (gaming, navigation, heads-up displays) Home (remotes, media center, automation) Mobile (PDAs, cell phones)

References Tons of resources at http://developer.nvidia.com: Code samples Programming guides Recent conference presentations A good website and book on real-time rendering: http://www.realtimerendering.com

Questions Support e-mail: devrelfeedback@nvidia.com [Technical Questions] sdkfeedback@nvidia.com [Tools Questions]