Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005



Similar documents
GPU Architecture. Michael Doggett ATI

Radeon HD 2900 and Geometry Generation. Michael Doggett

Computer Graphics Hardware An Overview

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

NVIDIA GeForce GTX 580 GPU Datasheet

Image Synthesis. Transparency. computer graphics & visualization

Hardware design for ray tracing

Introduction to GPGPU. Tiziano Diamanti

Optimizing AAA Games for Mobile Platforms

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

Touchstone -A Fresh Approach to Multimedia for the PC

Introduction to Computer Graphics

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

OpenGL Performance Tuning

L20: GPU Architecture and Models

SAPPHIRE HD GB GDDR5 PCIE.

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

GPGPU Computing. Yong Cao

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

Midgard GPU Architecture. October 2014

Advanced Visual Effects with Direct3D

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Dynamic Resolution Rendering

How To Make A Texture Map Work Better On A Computer Graphics Card (Or Mac)

The Future Of Animation Is Games

3D Computer Games History and Technology

Lecture Notes, CEng 477

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

Real-Time BC6H Compression on GPU. Krzysztof Narkowicz Lead Engine Programmer Flying Wild Hog

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

A Short Introduction to Computer Graphics

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Msystems Ltd. SAPPHIRE HD GB GDDR5 PCIE. Specification

GPU-Driven Rendering Pipelines

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS

GPU Shading and Rendering: Introduction & Graphics Hardware

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University

GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

AMD Radeon HD 8000M Series GPU Specifications AMD Radeon HD 8870M Series GPU Feature Summary

Software Virtual Textures

Computer Graphics CS 543 Lecture 12 (Part 1) Curves. Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

INTRODUCTION TO RENDERING TECHNIQUES

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect

Water Flow in. Alex Vlachos, Valve July 28, 2010

Advanced Rendering for Engineering & Styling

Procedural Shaders: A Feature Animation Perspective

Low power GPUs a view from the industry. Edvard Sørgård

Volume Rendering on Mobile Devices. Mika Pesonen

Optimization for DirectX9 Graphics. Ashu Rege

An Introduction to Modern GPU Architecture Ashu Rege Director of Developer Technology

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010

Interactive Level-Set Deformation On the GPU

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

Computer Graphics. Anders Hast

Image Synthesis. Fur Rendering. computer graphics & visualization

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Workstation Applications for Windows. NVIDIA MAXtreme User s Guide

Computer Applications in Textile Engineering. Computer Applications in Textile Engineering

Texture Cache Approximation on GPUs

Architecture Guide. VideoCore IV 3D. VideoCore IV 3D Architecture Reference Guide

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

Volume visualization I Elvins

Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture

Why You Need the EVGA e-geforce 6800 GS

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

Approval Sheet. Interactive Illumination Using Large Sets of Point Lights

Getting Started with RemoteFX in Windows Embedded Compact 7

Beginning Android 4. Games Development. Mario Zechner. Robert Green

White Paper AMD GRAPHICS CORES NEXT (GCN) ARCHITECTURE

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Programming 3D Applications with HTML5 and WebGL

1. INTRODUCTION Graphics 2

Catalyst Software Suite Version 9.12 Release Notes

Interactive Level-Set Segmentation on the GPU

How To Use An Amd Graphics Card In Greece And (Amd) With Greege (Greege) With An Amd Greeper 2.2.

How To Teach Computer Graphics

Writing Applications for the GPU Using the RapidMind Development Platform

Introduction to Game Programming. Steven Osman

Transcription:

Recent Advances and Future Trends in Graphics Hardware Michael Doggett Architect November 23, 2005

Overview XBOX360 GPU : Xenos Rendering performance GPU architecture Unified shader Memory Export Texture/Vertex Fetch HDR rendering Displaced subdivision surfaces RADEON X1800 Graphics Hardware Graphics APIs GPU Research 2

ATI - Driving the Visual Experience Everywhere Products from cell phones to super computers Gaming Console Integrated Embedded Display Gaming Notebook Color Phone Display Digital TV Multimedia Workstation Multi Monitor Display 3

System architecture CPU 2x 10.8 GB/s Southbridge 2x PCIE 500MB/s GPU Northbridge 22.4GB/s 700MHz 128bit GDDR3 UNIFIED MEMORY 32GB/s DAUGHTER DIE 4

Rendering performance GPU to Daughter Die interface 8 pixels/clk 32BPP color 4 samples Z - Lossless compression 16 pixels/clk Double Z 4 samples Z - Lossless compression GPU 32GB/s DAUGHTER DIE 5

Rendering performance Alpha and Z logic to EDRAM interface 256GB/s Color and Z - 32 samples 32bit color, 24bit Z, 8bit stencil Double Z - 64 samples 24bit Z, 8bit stencil DAUGHTER DIE 8pix/clk, 4x MSAA, Stencil and Z test, Alpha blending 256GB/s 10MB EDRAM 6

GPU architecture GPU Primitive Setup Clipper Rasterizer Hierarchical Z/S Index Stream Generator Tessellator Vertex Pipeline Pixel Pipeline Display Pixels Unified Shader Texture/Vertex Fetch Output Buffer Memory Export UNIFIED MEMORY DAUGHTER DIE 7

Unified Shader A revolutionary step in Graphics Hardware One hardware design that performs both Vertex and Pixel shaders Vertex processing power Vertices Vertices Pixels Vertex Shader Unified Shader Pixels Pixel Shader 8

Unified Shader GPU based vertex and pixel load balancing Better vertex and pixel resource usage Union of features E.g. Control flow, indexable constant, DX9 Shader Model 3.0+ Vertices Pixels 3 SIMDs Unified Shader 48 ALUs Vec4 Scalar 9

Memory Export Shader output to a computed address Virtualize shader resources - multipass Shader debug Randomly update data structures from Vertex or Pixel Shader Scatter write GPU Memory Export UNIFIED MEMORY EDRAM 10

Texture/Vertex Fetch Shader fetch can be either: Texture fetch (16 units) LOD computation Linear, Bi-linear, Tri-linear Filtering Uses cache optimized for 2D, 3D texture data with varying pixel sizes Unified texture cache Vertex fetch (16 units) Uses cache optimized for vertex-style data 11

Texture Arrays Generalization of 6 faced cube maps to 64 faces Each face is a 2D mip mapped surface Not volume texture Applications Animation frames Varying skins for instanced characters / objects Character shadow texture flipbook animations 0 1 63 12

Texture array application : Unique seeds for instanced shading 13

Texture array application : Hundreds of instanced characters 14

Texture compression All of the old DXT formats DXT1, DXT2/3, DXT4/5 Several new formats (variations on above formats) DXT3A 4 bit scalar replicated into four channels in shader DXT3A as 1111 1 bit per channel pixel DXT5A 3bit selection between 2 8bit endpoints DXN 3Dc normal compression, 2-channel version of DXT5A CTX1 2bit selection between 2 8.8bit endpoints 15

CTX1 vs. DXN Upper hemisphere of normals DXN CTX1 Both have 8.8 anchor points CTX1 x and y share 2-bit interpolation value 4 representable normals per 4 4 block of texels 4 bits per texel DXN Independent 3-bit interpolation values for x and y 64 representable normals per 4 4 block of texels 8 bits per texel 16

High Dynamic Range Rendering Special compact HDR render target format: Just 32 bits: 7e3 7e3 7e3 2 Compatible with multisample antialiasing R, G and B are unsigned floating point numbers 7 bits of mantissa 3 bits of exponent Range of 0..16 2 bits of alpha channel 16-bit fixed point at half speed With full blending 17

Displaced subdivision surfaces Prototype algorithm Vineet Goel, ATI research Orlando 18

Displaced subdivision surface algorithm Tessellator: Generates 64 vertices for each patch that are fed into the VS. Vertex Shader: Reads in one-ring, computes Stam s method using precomputed table lookup Adds Displacement map Pixel Shader Adds bump mapping and surface color 19

Displaced subdivision surfaces Base mesh Used by Tessellator to generate vertices Subdivision surface Displaced subdivision surface 20

RADEON X1800 21

22

RADEON X1800 DirectX 9.0 Shader Model 3.0 Pixel Shader 32bit IEEE Single Precision float 4 quad based, threaded units 512 threads of 16 pixels allow efficient dynamic branching 23

Graphics Hardware 24

Graphics APIs Windows Vista Virtual Memory Improved state change efficiency Vista DirectX9 ClearType DirectX10 Unified shader programming model Geometry Shader Access to entire triangle and adjacent vertices Output to array of render targets, cube maps Stream output from Geometry Shader 25

GPU research Improved multi-gpu performance and antialiasing CrossFire Multi-chip, multi-core Higher Order Surfaces Subdivision surfaces, NURBS General Purpose GPU (GPGPU) Driving improved 32bit float performance Xenos shader scatter write 26

GPU research - Shadows Performance enhancements for stencil shadows and shadow buffers Hierarchical rendering User low resolution shadow map to find areas that require detail rasterization Chan, Durand EGSR04 Stencil shadow volumes using 8x8 pixel tiles Aila, Akenine-Möller GH04 Improving Z, stencil performance 27

GPU research Ray Tracing Accelerating ray tracing on GPUs Uniform grid Purcell et. al. SIG02 KD-Tree, performance improvement greater than hardware improvement Foley et. al. GH05 EarlyZ Volume rendering ray casting J. Krüger, R. Westermann VIS03 Build and update acceleration structures Xenos shader scatter write FPGA based ray tracing design RPU, Woop et. al. SIG05 GPU evolution affected by ray tracing 28

GPU research Anti-Aliasing Multisample efficient, effective solution Custom filtering of multisampled images Use Multisample sample mask to control transparency Alpha to coverage Foliage, Chicken wire Order independent transparency 29

RADEON X1800 Demos The Assassin Parthenon Toyshop 30

Questions? 31