GPGPU Computing. Yong Cao

Size: px

Start display at page:

Download "GPGPU Computing. Yong Cao"

Cornelia Logan
9 years ago
Views:

1 GPGPU Computing Yong Cao

3 Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power High End Qual-Core CPU 16 (4 per core) MHz 96 GFLOPs NVIDIA GTX MHz 1063 GFLOPs ATI Radeon HD MHz 1200 GFLOPs Copyright 2009 by Yong Cao

Power High End Qual-Core CPU 16 (4 per core) 2 3000 MHz 96 GFLOPs

4 Billion Transistors Memory width: 512bit

6 Why Graphics Card NOW? Before: f Two years ago, everyone is using Graphics API (Cg, GLSL, HLSL) for GPGPU programming. Restrict random-read (using Texture), NOT be able to random-write. (No pointer!) Copyright 2009 by Yong Cao

7 Why Graphics Card NOW? Now: NIVIDA released CUDA two years ago, since then Thousands d of CUDA software engineers New job title CUDA programmer Around 200 CUDA based technical publications! Why? Standard C language Support Pointer! Random read and write on GPU memory. Work with C++, Fortran Copyright 2009 by Yong Cao

software engineers New job title CUDA programmer Around 200 CUDA based

12 Execution Mode Graphics Host Input Assembler Vtx Thread Issue Geom Thread Issue Setup / Rstr / ZCull Pixel Thread Issue SP TF L1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP TF TF TF TF TF TF TF L1 L1 L1 L1 L1 L1 L1 Thread Pr rocessor L2 L2 L2 L2 L2 L2 FB FB FB FB FB FB NVIDIA G80 GPU Copyright 2009 by Yong Cao, Referencing UIUC ECE498AL Course Notes 13

TF TF TF TF TF TF L1 L1 L1 L1 L1 L1 L1 Thread Pr rocessor L2 L2 L2 L2 L2 L2 FB FB FB FB

13 Execution Mode General Computing Host Input Assembler Thread Execution Manager Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Texture Texture Texture Texture Texture Texture Texture Texture Load/store Load/store Load/store Load/store Load/store Load/store Global Memory NVIDIA G80 GPU 14 Copyright 2009 by Yong Cao, Referencing UIUC ECE498AL Course Notes

Data Cache Texture Texture Texture Texture Texture Texture Texture Texture Load/store Load/store Load/store

14 GPGPU from Graphics Point of View Graphics Processing Pipeline (on GPU) Fixed function pipeline Programmable pipeline with Shaders GPU Processing Model Stream computing model

15 3D Graphics Applications Demos.

16 GPU Fundamentals: The Graphics Pipeline Graphics State Application Vertices (3 3D) Transform & Light Xformed, Lit Ve ertices (2D) Assemble Primitives Screenspace tr iangles (2D) Rasterize Fragments (pre e-pixels) Fragment Processing Final Pixels (Co olor, Depth) Video Memory (Textures) CPU GPU Render-to-texture A simplified graphics pipeline Copyright 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke

e-pixels) Fragment Processing Final Pixels (Co olor, Depth) Video Memory (Textures) CPU GPU

17 Programmable Graphics Pipeline - Shaders Graphics State Application Vertices (3 3D) Vertex Processor Transform Xformed, Lit Ve ertices (2D) Assemble Primitives Screenspace tr iangles (2D) Fragments (pre e-pixels) Fragment Final Pixels (Co olor, Depth) Rasterizee Shade Video Processor Memory (Textures) CPU GPU Render-to-texture Programmable vertex processor! Programmable fragment processor! Copyright 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke

olor, Depth) Rasterizee Shade Video Processor Memory (Textures) CPU GPU Render-to-texture Programmable vertex processor!

18 GPU Pipeline: Transform Vertex processor (multiple in parallel) Transform from world space to image space Compute per-vertex lighting Copyright 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke

Compute per-vertex lighting Copyright 2009 by Yong

19 GPU Pipeline: Rasterize Rasterizer Convert geometric rep. (vertex) to image rep. (fragment) Fragment = image fragment Pixel + associated data: color, depth, stencil, etc. Interpolate per-vertex quantities across pixels Copyright 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke

(fragment) Fragment = image fragment Pixel + associated data: color, depth,

20 Pixel / Fragment Processor Fragment processors (multiple in parallel) Compute a color for each pixel Optionally read colors from textures (images) Copyright 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke

read colors from textures (images) Copyright 2009 by Yong

21 GPU Programming g Model: Stream Stream Programming g Model Streams: An array of data units Kernels: Stream Kernel Take streams as input, produce streams at output Perform computation on streams Kernels can be linked together Stream

22 Why Streams? GPGPU Computing Ample computation by exposing parallelism Stream expose data parallelism Multiple stream elements can be processed in parallel Pipeline (task) parallelism li Multiple tasks can be processed in parallel Efficient communication Producer-consumer locality Predictable memory access pattern Optimize for throughput of all elements, not latency of one Processing many elements at once allows latency hiding

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis