The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA
Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques use every trick imaginable build monster hardware
What is a Graphics Processing Unit? Massively Parallel - 1000s of processors (today) Hardware managed parallelism achievable performance Specialized processing order of magnitude more efficient Latency Tolerant throughput oriented Memory Bandwidth saturate 100 s GB/sec Not cache dependent ALU heavy architecture
Or, we could do it by hand Perspective study of a chalice Paolo Uccello, circa 1450
What is reality? Reality is 80 million polygons. Alvy Ray Smith 1999
Image and courtesy Digital Domain
Image and courtesy Digital Domain
Evolution of GPU Game Computing
GPU Architecture Progression Test Drive 6 Far Cry 1999 Multi-texture, 32b rendering 2001 Programmable vertex, 3D textures, shadow maps, multisampling 2003 Fragment programs, Color and depth compression 2005 Transparency antialiasing 2007 Double Precision 1998 16-bit depth, Color, and textures 2000 T&L, cube maps, Texture compression, Anisotropic filtering 2002 Early z-cull, Dual-monitor 2004 Flow control, FP textures, VTF 2006 Unified shader, geometry shader, CUDA/C Ballistics Crysis
Graphics 1998
GPU 2003 Dawn of programmable shading Vertex & Pixel Shaders More (8) pixels per clock Faster/Wider/Larger memory 32-bit and higher precision
GPU 2003 GeForce FX 5800 Shader Model 2 4 ppc / 8tpc @ 500 Mhz Single precision AGP8x 128b 16GB/sec memory system 4 processors @ 500 MHz Single threaded massive SIMD width Programmed in OpenGL, DirectX or assembly
GPU 2003 Riva 128ZX 1998 GeForce FX5800 2003 Texture Performance (tex/s) 0.1G 4 G Antialiasing Performance (sam/s) 0.1G 2 G Depth Performance (pix/s) 0.1G 4 G Floating Point Performance (FLOPS) n/a 12 G Memory Bandwidth (B/s) 1.6G 16 G
The Graphics Pipeline Vertex Transform & Lighting Key abstraction of real-time graphics Triangle Setup & Rasterization Hardware used to look like this Texturing & Pixel Shading Distinct chips/boards/units per stage Depth Test & Blending Fixed data flow through pipeline Framebuffer
GPU Architecture 2003 Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Triangle Setup Depth Cull Texture Texture Texture Texture Texture Texture Texture Texture Pixel Shader Pixel Shader Pixel Shader Pixel Shader Pixel Shader Pixel Shader Pixel Shader Pixel Shader Blend / Depth Blend / Depth Blend / Depth Blend / Depth Blend / Depth Blend / Depth Blend / Depth Blend / Depth Memory Memory Memory Memory Memory Memory Memory Memory
Graphics 2003
Performance Trends Texture Antialiasing Depth Floating Point Bandwidth 3200000 160000 8000 400 20 1 Riva TNT2 GeForce GTS GeForce3 GeForce4 Ti4600 GeForce FX (5800?) GeForce 6 800 Ultra GeForce 7800 GTX GeForce 7900 GTX GeForce 8800 GTX GPU 2008 GPU 2009 GPU 20010 GPU 2011 GPU 2012 GPU 2013 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
GPU Processing CAGR 1998-2007 Compound Annual Growth Rate Texture Performance 1.9 Antialiasing Performance 2.2 Depth Performance 1.9 Floating Point Performance 2.3 Memory Bandwidth 1.6
GPU 2008 Projection CAGR 1998 2007 GeForce FX (2003) Projection GPU 2008 GeForce GTX 200 Texture Performance (tex/s) 1.9 4 G 71 G 48 G Antialiasing Performance (sam/s) 2.2 2 G 252 G 153 G Depth Performance (pix/s) 1.9 4 G 52 G 38 G Floating Point Performance (FLOPS) 2.3 12 G 752 G 1080 G Memory Bandwidth (B/s) 1.6 16 G 133 G 141 G Within 10%
Era of Visual Computing Programmable Graphics CUDA DX11 Future Programmable Shaders DX8 DX9 DX10 Fixed-Function Pipelines 3D Accelerators RIVA 128 GeForce 3 GeForce 6 G80 CUDA GTX200 Next Gen 1997 August 2001 2005 2007 2008 2009
GPU 2008 Dawn of fully programmable generation Unified architectures New graphics functionality geometry shading Programmable in C
The Graphics Pipeline Vertex Remains a useful abstraction Rasterize Hardware used to look like this Pixel Test & Blend Framebuffer
Modern GPU s: Unified Architecture Vertex shaders, pixel shaders, etc. become threads running different programs on flexible cores
Why unify? Vertex Shader Pixel Shader Idle hardware Heavy Geometry Workload Perf = 4 Vertex Shader Idle hardware Pixel Shader Heavy Pixel Workload Perf = 8
Why unify? Unified Shader Vertex Workload Pixel Heavy Geometry Workload Perf = 11 Unified Shader Pixel Workload Vertex Heavy Pixel Workload Perf = 11
GPU 2008 GeForce GTX 200 Shader Model 4 Double precision PCI-Express Gen2 512b 140GB/sec memory system 32ppc / 80 tpc @ 600 Mhz 240 processors @ 1.5 GHz Many threaded scalar Programmed in OpenGL, DirectX, CUDA TM C
GPU Processing CPU GFLOPS GPU GFLOPS
Memory & I/O Processing Cores GPU Architecture 2008 Dataflow Manager... Special Purpose Processor Special Purpose Processor Special Purpose Processor Special Purpose Processor L1 L1 L1 L1 L1 L2 Memory & I/O Processor Communication Fabric
State of the art 2008 Grid - Codemasters Far Cry 2 - Ubisoft Call of Duty 5 - Activision Bionic Commando - Capcom
GPU 2013 2003 Fragment programs, Color and depth compression 2005 Transparency, antialiasing 2008 Double precision Preemption Virtual Pipeline 2004 Flow control, FP textures, VTF 2006 Unified shader, geometry shader, CUDA/C C++ Complete pointer support Adaptive Workload Partitioning
GPU 2013 Arbitrary dataflow General purpose programming model Special purpose hardware Hardware managed threading and pipelining Freely intermingle graphics & computation
Performance Trends Texture Antialiasing Depth Floating Point Bandwidth 3200000 160000 8000 400 20 1 Riva TNT2 GeForce GTS GeForce3 GeForce4 Ti4600 GeForce FX (5800?) GeForce 6 800 Ultra GeForce 7800 GTX GeForce 7900 GTX GeForce 8800 GTX GPU 2008 GPU 2009 GPU 20010 GPU 2011 GPU 2012 GPU 2013 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
GPU 2013 CAGR 1998-2007 GeForce GTX 200 GPU 2013 Texture Performance (tex/s) 1.9 48 G 1+ T Antialiasing Performance (sam/s) 2.2 153 G 10+ T Depth Performance (pix/s) 1.9 38 G 1+ T Floating Point Performance (FLOPS) 2.3 1080 G 20+ T Memory Bandwidth (B/s) 1.6 141 G 1+ T
Rich Environments Fracture Soft Shadows Detailed Characters Ambient Occlusion Indirect Lighting Subsurface Scatter Turbulence Participating Media Simulations Fluids
The Future of Graphics Processing Programmable and Specialized Processing Monster performance and power efficiency Graphics and Arbitrary C/C++ Programming Global Illumination and Rasterization Rendering and Simulation Evolution
The Evolution of Computer Graphics