GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

Size: px
Start display at page:

Download "GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology"

Transcription

1 GPUs Under the Hood Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

2 Bandwidth Gravity of modern computer systems The bandwidth between key components ultimately dictates system performance Especially true for massively parallel systems processing massive amount of data Tricks like buffering, reordering, caching can temporarily defy the rules in some cases Ultimately, the performance falls back to what the feeds and speeds dictate PCIe replaced AGP (Advanced Graphics Port) Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 6, Fall 2007; used with permission See 2

3 3D buzzwords Fill Rate how fast the GPU can generate pixels, often a strong predictor for application frame rate Performance Metrics Mtris/sec - Triangle Rate Mverts/sec - Vertex Rate Mpixels/sec - Pixel Fill (Write) Rate Mtexels/sec - Texture Fill (Read) Rate Msamples/sec - Antialiasing Fill (Write) Rate Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission See 3

4 Adding programmability to the pipeline 3D Application or Game 3D API Commands 3D API: OpenGL or Direct3D CPU GPU Boundary GPU Command & Data Stream GPU Front End Vertex Index Stream Primitive Assembly Assembled Polygons, Lines, and Points Rasterization & Interpolation Pixel Location Stream Raster Operations Pixel Updates Framebuffer Pre-transformed Vertices Programmable Vertex Processor Transformed Vertices Rasterized Pre-transformed Fragments Programmable Fragment Processor Transformed Fragments Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission See 4

5 Shader data Typically floats, and vectors/matrices of floats Fixed size arrays Three main types: Per-instance data, e.g., per-vertex position Per-pixel interpolated data, e.g., texture coordinates Per-batch data, e.g., light position Data are tightly bound to the GPU

6 Shader flow control Very simple No recursion Fixed size loops for Shader Model 2.0 or earlier Simple if-then-else statements allowed in the latest APIs Texkill (asm) or clip (HLSL) or discard (GLSL) allows you to abort a write to a pixel (form of flow control)

7 Specialized instructions (GeForce 6) Dot products Exponential instructions: EXP, EXPP, LOG, LOGP LIT (Blinn specular lighting model calculation!) Reciprocal instructions: RCP (reciprocal) RSQ (reciprocal square root!) Trignometric functions SIN, COS Swizzling (swapping xyzw), write masking (only some xyzw get assigned), and negation is free From GPU Gems 2, p

8 Vertex shader Transform to clip-space (i.e., screen space) Inputs: Common inputs: Vertex position (x, y, z, w) Texture coordinate Constant inputs Can also have fog, color as input, but usually passes them untouched to the pixel shader Output to a pixel (fragment) shader Vertex shader is executed once per vertex, could be less expensive than pixel shader

9 Vertex shader data flow (3.0) Vertex stream 32 Temporary registers al Loop Register r0 r1 r2 r31 a0 Address Register v0 v1 v2 16 Vertex data registers Vertex Shader v15 C0 C1 C2 Cn 12 output registers Constant float registers (at least 256) 16 Constant Integer Registers opos otn ofog od0 od1 opts position texture fog Diff. color Spec. color Output Pt size Each register is a 4-component vector register except al

10 Vertex shader: logical view Per-vertex Input Data Register File r0 r1 r2 r3... Shader Start Addr Bound Textures Bound Samplers Bound Consants Vertex Processing Unit Swizzle / Mask Unit.rgba.xyzw.zzzz.xxyz... cosine log Math/Logic sine sub Unit add... Per-vertex Output Data Transformed and Lit vertices Shader Resources (bound by application) Sampler Unit Texture Memory Shader Constants Input Data Output Data State Information Memory Architectural State Control Logic

11 Some uses of vertex shaders Transform vertices to clip-space Pass normal, texture coordinates to PS Transform vectors to other spaces (e.g., texture space) Calculate per-vertex lighting (e.g., Gouraud shading) Distort geometry (waves, fish-eye camera) Adapted from Mart Slot s presentation

12 Easy cross products and normalization From Stanford CS448A: Real-Time Graphics Architectures See graphics.stanford.edu/courses/cs448a-01-fall 12

13 Blinn lighting in one instruction From Stanford CS448A: Real-Time Graphics Architectures See graphics.stanford.edu/courses/cs448a-01-fall 13

14 Simple graphics pipeline From Stanford CS448A: Real-Time Graphics Architectures See graphics.stanford.edu/courses/cs448a-01-fall 14

15 Pixel (or fragment) shader (1) Determine each fragment s color Custom (sophisticated) pixel operations Texture sampling Inputs Interpolated output from vertex shader Typically vertex position, vertex normals, texture coordinates, etc. These registers could be reused for other purpose Output Color (including alpha) Depth value (optional)

16 Pixel (or fragment) shader (2) Executed once per pixel, hence typically executed many more times than a vertex shader It is advantageous to compute stuff on a per-vertex basis to improve performance

17 Pixel shader data flow (3.0) Temporary registers r0 r1 r31 v0 Pixel stream v1 Color (diff/spec) and texture coord. registers oc0 Pixel Shader v9 odepth C0 C1 Cn s0 s1 s15 Constant registers (16 INT, 224 Float) Sampler Registers (Up to 16 texture surfaces can be read in a single pass) color Depth

18 Pixel shader: logical view Interpolator Per-pixel Input Data Register File r0 r1 r2 r3... Shader Start Addr Bound Textures Bound Samplers Bound Consants Pixel Processing Unit Swizzle / Mask Unit.rgba.xyzw.zzzz.xxyz... cosine log Math/Logic sine sub Unit add... Per-pixel Output Data Pixel Color Depth Info Stencil Info Shader Resources (bound by application) Sampler Unit Texture Memory Color buffer Depth Buffer Stencil Buffer Shader Constants Input Data Output Data State Information Memory Architectural State Control Logic

19 Some uses of pixel shaders Texturing objects Per-pixel lighting (e.g., Phong shading) Normal mapping (each pixel has its own normal) Shadows (determine whether a pixel is shadowed or not) Environment mapping Adapted from Mart Slot s presentation

20 Old GeForce graphics pipeline Host Vertex Control VS/T&L Vertex Cache Triangle Setup Raster Shader ROP FBI Texture Cache Frame Buffer Memory Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission See 20

21 Vertex cache Host Vertex Control VS/T&L Vertex Cache Re-using vertices between primitives saves PCIe bus bandwidth and GPU computational resources Triangle Setup Raster Shader ROP FBI Texture Cache Frame Buffer Memory A vertex cache attempts to exploit commonality between triangles to generate vertex reuse Unfortunately, many applications do not use efficient triangular ordering Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission See 21

22 Texture cache Host Vertex Control T&L Vertex Cache Stores temporally local texel values to reduce bandwidth requirements Triangle Setup Raster Shader ROP FBI Texture Cache Frame Buffer Memory Due to nature of texture filtering high degrees of efficiency are possible (75% or better hit rates) Reduces texture (memory) bandwidth by a factor of four for bilinear filtering Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission See 22

23 Built-in texture filtering (GeForce 6) Pixel texturing Hardware supports 2D, 3D, and cube map Non power-of-2 textures OK Hardware handles addressing and interpolation Bilinear, trilinear (3D or mipmap), anisotropic Vertex texturing Vertex processors can access texture memory too Only nearest-neighbor filtering supported in G60 hardware 23

24 Host ROP (from Raster Operations) Vertex Control T&L Vertex Cache Triangle Setup Raster C-ROP performs frame buffer blending Combinations of colors and transparency Shader ROP FBI Texture Cache Frame Buffer Memory Antialiasing Read/Modify/Write the Color Buffer Z-ROP performs the Z operations Determine the visible pixels Discard the occluded pixels Read/Modify/Write the Z-Buffer ROP on GeForce also performs Coalescing of transactions Z-Buffer compression/decompression Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission See 24

25 The frame buffer Host Vertex Control T&L Vertex Cache The primary determinant of graphics performance other than the GPU The most expensive component of a graphics product other than the GPU Memory bandwidth is the key Frame buffer size also determines Local texture storage Maximum resolutions Anitaliasing resolution limits Triangle Setup Raster Shader ROP FBI Texture Cache Frame Buffer Memory Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission See 25

26 Frame Buffer Interface (FBI) Host Vertex Control Surface Engine T&L Vertex Cache Manages reading from and writing to frame buffer Perhaps the most performance-critical component of a GPU GeForce s FBI is a crossbar Independent memory controllers for 4+ independent memory banks for more efficient access to frame buffer Triangle Setup Raster Shader ROP FBI Texture Cache Frame Buffer Memory Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission See 26

27 GeForce 7800 GTX board details SLI Connector Single slot cooling svideo TV Out DVI x 2 16x PCI-Express Slide by David Kirk/NVIDIA and Wen-mei. W. Hwu, 2007, from UIUC ECE498 Lecture 6, Fall 2007; used with permission See 256MB/256-bit DDR3 600 MHz 8 pieces of 8Mx32 27

28 From G70 Architecture NVIDIA 7800 GTX Vertex Processors Pixel Processors ROPs (Raster Op. Units) 28

29 NVIDIA 7800 GTX NVIDIA 7800 GTX Vertex processors 7800 GTX has G70 Architecture 8 of these Vertex Processors G70 Architecture 29

30 NVIDIA 7800 GTX NVIDIA 7800 GTX Pixel processors 8 MADD (multiply/add) instructions in a single cycle G70 Architecture 7800 GTX has 24 of these From 30

31 NVIDIA 7800 GTX Modern GPUs: unified design G70 Architecture Vertex Processors Slide by David Luebke from G70 Architecture 31

32 GeForce 8 architecture Slide by David Luebke from 32

33 Why unify? (1) Slide by David Luebke from 33

34 Why unify? (2) Slide by David Luebke from 34

35 Dynamic load balancing Company of Heroes Slide by David Luebke from 35

36 Motivation for shader languages Programming powerful hardware with assembly code is hard Programmers need the benefits of a high-level language: Easier programming Easier code reuse Easier debugging Portability Assembly DP3 R0, c[11].xyzx, c[11].xyzx; RSQ R0, R0.x; MUL R0, R0.x, c[11].xyzx; MOV R1, c[3]; MUL R1, R1.x, c[0].xyzx; DP3 R2, R1.xyzx, R1.xyzx; RSQ R2, R2.x; MUL R1, R2.x, R1.xyzx; ADD R2, R0.xyzx, R1.xyzx; DP3 R3, R2.xyzx, R2.xyzx; RSQ R3, R3.x; MUL R2, R3.x, R2.xyzx; DP3 R2, R1.xyzx, R2.xyzx; MAX R2, c[3].z, R2.x; MOV R2.z, c[3].y; MOV R2.w, c[3].y; LIT R2, R2;... float3 cspecular = pow(max(0, dot(nf, H)), phongexp).xxx; float3 cplastic = Cd * (cambient + cdiffuse) + Cs * cspecular; From The Cg Tutorial

37 Shader languages HLSL/Cg most common Both are more-or-less compatible Other alternatives: GLSL (for OpenGL) Assembly?

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg Image Processing and Computer Graphics Rendering Pipeline Matthias Teschner Computer Science Department University of Freiburg Outline introduction rendering pipeline vertex processing primitive processing

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series By: Binesh Tuladhar Clay Smith Overview History of GPU s GPU Definition Classical Graphics Pipeline Geforce 6 Series Architecture Vertex

More information

GPGPU Computing. Yong Cao

GPGPU Computing. Yong Cao GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power

More information

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques

More information

OpenGL Performance Tuning

OpenGL Performance Tuning OpenGL Performance Tuning Evan Hart ATI Pipeline slides courtesy John Spitzer - NVIDIA Overview What to look for in tuning How it relates to the graphics pipeline Modern areas of interest Vertex Buffer

More information

Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it

Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it t.diamanti@cineca.it Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate

More information

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group Shader Model 3.0 Ashu Rege NVIDIA Developer Technology Group Talk Outline Quick Intro GeForce 6 Series (NV4X family) New Vertex Shader Features Vertex Texture Fetch Longer Programs and Dynamic Flow Control

More information

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005 Recent Advances and Future Trends in Graphics Hardware Michael Doggett Architect November 23, 2005 Overview XBOX360 GPU : Xenos Rendering performance GPU architecture Unified shader Memory Export Texture/Vertex

More information

A Crash Course on Programmable Graphics Hardware

A Crash Course on Programmable Graphics Hardware A Crash Course on Programmable Graphics Hardware Li-Yi Wei Abstract Recent years have witnessed tremendous growth for programmable graphics hardware (GPU), both in terms of performance and functionality.

More information

Touchstone -A Fresh Approach to Multimedia for the PC

Touchstone -A Fresh Approach to Multimedia for the PC Touchstone -A Fresh Approach to Multimedia for the PC Emmett Kilgariff Martin Randall Silicon Engineering, Inc Presentation Outline Touchstone Background Chipset Overview Sprite Chip Tiler Chip Compressed

More information

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University GPU Generations CSE 564: Visualization GPU Programming (First Steps) Klaus Mueller Computer Science Department Stony Brook University For the labs, 4th generation is desirable Graphics Hardware Pipeline

More information

GPU Architecture. Michael Doggett ATI

GPU Architecture. Michael Doggett ATI GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super

More information

Radeon HD 2900 and Geometry Generation. Michael Doggett

Radeon HD 2900 and Geometry Generation. Michael Doggett Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command

More information

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August Agenda Introduction The author and ARM Preliminary knowledge Unity Pro, OpenGL ES 3.0 Identify

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Programming the GPU: High-Level Shading Languages

Programming the GPU: High-Level Shading Languages Programming the GPU: High-Level Shading Languages Randy Fernando Developer Technology Group Talk Overview The Evolution of GPU Programming Languages GPU Programming Languages and the Graphics Pipeline

More information

L20: GPU Architecture and Models

L20: GPU Architecture and Models L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.

More information

INTRODUCTION TO RENDERING TECHNIQUES

INTRODUCTION TO RENDERING TECHNIQUES INTRODUCTION TO RENDERING TECHNIQUES 22 Mar. 212 Yanir Kleiman What is 3D Graphics? Why 3D? Draw one frame at a time Model only once X 24 frames per second Color / texture only once 15, frames for a feature

More information

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro) Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Brains Eden, 28 th June 2013 Agenda Introduction The author ARM Ltd. What do you need to have What do you need to know Identify

More information

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT COMP27112 Computer Graphics and Image Processing 2: Introducing image synthesis Toby.Howard@manchester.ac.uk 1 Introduction In these notes we ll cover: Some orientation how did we get here? Graphics system

More information

Image Synthesis. Transparency. computer graphics & visualization

Image Synthesis. Transparency. computer graphics & visualization Image Synthesis Transparency Inter-Object realism Covers different kinds of interactions between objects Increasing realism in the scene Relationships between objects easier to understand Shadows, Reflections,

More information

Dynamic Resolution Rendering

Dynamic Resolution Rendering Dynamic Resolution Rendering Doug Binks Introduction The resolution selection screen has been one of the defining aspects of PC gaming since the birth of games. In this whitepaper and the accompanying

More information

3D Computer Games History and Technology

3D Computer Games History and Technology 3D Computer Games History and Technology VRVis Research Center http://www.vrvis.at Lecture Outline Overview of the last 10-15 15 years A look at seminal 3D computer games Most important techniques employed

More information

Developer Tools. Tim Purcell NVIDIA

Developer Tools. Tim Purcell NVIDIA Developer Tools Tim Purcell NVIDIA Programming Soap Box Successful programming systems require at least three tools High level language compiler Cg, HLSL, GLSL, RTSL, Brook Debugger Profiler Debugging

More information

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas

More information

Introduction to Computer Graphics

Introduction to Computer Graphics Introduction to Computer Graphics Torsten Möller TASC 8021 778-782-2215 torsten@sfu.ca www.cs.sfu.ca/~torsten Today What is computer graphics? Contents of this course Syllabus Overview of course topics

More information

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010 GPU Architecture An OpenCL Programmer s Introduction Lee Howes November 3, 2010 The aim of this webinar To provide a general background to modern GPU architectures To place the AMD GPU designs in context:

More information

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university Real-Time Realistic Rendering Michael Doggett Docent Department of Computer Science Lund university 30-5-2011 Visually realistic goal force[d] us to completely rethink the entire rendering process. Cook

More information

Introduction to GPU Architecture

Introduction to GPU Architecture Introduction to GPU Architecture Ofer Rosenberg, PMTS SW, OpenCL Dev. Team AMD Based on From Shader Code to a Teraflop: How GPU Shader Cores Work, By Kayvon Fatahalian, Stanford University Content 1. Three

More information

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions Module 4: Beyond Static Scalar Fields Dynamic Volume Computation and Visualization on the GPU Visualization and Computer Graphics Group University of California, Davis Overview Motivation and applications

More information

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Performance Optimization and Debug Tools for mobile games with PlayCanvas Performance Optimization and Debug Tools for mobile games with PlayCanvas Jonathan Kirkham, Senior Software Engineer, ARM Will Eastcott, CEO, PlayCanvas 1 Introduction Jonathan Kirkham, ARM Worked with

More information

Optimization for DirectX9 Graphics. Ashu Rege

Optimization for DirectX9 Graphics. Ashu Rege Optimization for DirectX9 Graphics Ashu Rege Last Year: Batch, Batch, Batch Moral of the story: Small batches BAD What is a batch Every DrawIndexedPrimitive call is a batch All render, texture, shader,...

More information

An Introduction to Modern GPU Architecture Ashu Rege Director of Developer Technology

An Introduction to Modern GPU Architecture Ashu Rege Director of Developer Technology An Introduction to Modern GPU Architecture Ashu Rege Director of Developer Technology Agenda Evolution of GPUs Computing Revolution Stream Processing Architecture details of modern GPUs Evolution of GPUs

More information

AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009

AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009 AMD GPU Architecture OpenCL Tutorial, PPAM 2009 Dominik Behr September 13th, 2009 Overview AMD GPU architecture How OpenCL maps on GPU and CPU How to optimize for AMD GPUs and CPUs in OpenCL 2 AMD GPU

More information

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1 Welcome to GUI! Mechanics 26/02/2014 1 Requirements Info If you don t know C++, you CAN take this class additional time investment required early on GUI Java to C++ transition tutorial on course website

More information

Introduction to Game Programming. Steven Osman sosman@cs.cmu.edu

Introduction to Game Programming. Steven Osman sosman@cs.cmu.edu Introduction to Game Programming Steven Osman sosman@cs.cmu.edu Introduction to Game Programming Introductory stuff Look at a game console: PS2 Some Techniques (Cheats?) What is a Game? Half-Life 2, Valve

More information

Optimizing AAA Games for Mobile Platforms

Optimizing AAA Games for Mobile Platforms Optimizing AAA Games for Mobile Platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Epic Games, Unreal Engine 15 years in the industry 30 years of programming C64 demo

More information

Lecture 15: Hardware Rendering

Lecture 15: Hardware Rendering Lecture 15: Hardware Rendering Fall 2004 Kavita Bala Computer Science Cornell University Announcements Project discussion this week Proposals: Oct 26 Exam moved to Nov 18 (Thursday) Bounding Volume vs.

More information

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation sim.dietrich@nvidia.com

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation sim.dietrich@nvidia.com Modern Graphics Engine Design Sim Dietrich NVIDIA Corporation sim.dietrich@nvidia.com Overview Modern Engine Features Modern Engine Challenges Scene Management Culling & Batching Geometry Management Collision

More information

Lecture Notes, CEng 477

Lecture Notes, CEng 477 Computer Graphics Hardware and Software Lecture Notes, CEng 477 What is Computer Graphics? Different things in different contexts: pictures, scenes that are generated by a computer. tools used to make

More information

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008 Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer

More information

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines

More information

Writing Applications for the GPU Using the RapidMind Development Platform

Writing Applications for the GPU Using the RapidMind Development Platform Writing Applications for the GPU Using the RapidMind Development Platform Contents Introduction... 1 Graphics Processing Units... 1 RapidMind Development Platform... 2 Writing RapidMind Enabled Applications...

More information

GPU Shading and Rendering: Introduction & Graphics Hardware

GPU Shading and Rendering: Introduction & Graphics Hardware GPU Shading and Rendering: Introduction & Graphics Hardware Marc Olano Computer Science and Electrical Engineering University of Maryland, Baltimore County SIGGRAPH 2005 Schedule Shading Technolgy 8:30

More information

QCD as a Video Game?

QCD as a Video Game? QCD as a Video Game? Sándor D. Katz Eötvös University Budapest in collaboration with Győző Egri, Zoltán Fodor, Christian Hoelbling Dániel Nógrádi, Kálmán Szabó Outline 1. Introduction 2. GPU architecture

More information

To sit in the shade on a fine day and look upon verdure is the most perfect refreshment. Jane Austen: Mansfield Park, IX, 1814

To sit in the shade on a fine day and look upon verdure is the most perfect refreshment. Jane Austen: Mansfield Park, IX, 1814 Chapter 13 Pixel Shaders The tree casts its shade upon all, even upon the woodcutter. Sanskrit Proverb To sit in the shade on a fine day and look upon verdure is the most perfect refreshment. Jane Austen:

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

Advanced Visual Effects with Direct3D

Advanced Visual Effects with Direct3D Advanced Visual Effects with Direct3D Presenters: Mike Burrows, Sim Dietrich, David Gosselin, Kev Gee, Jeff Grills, Shawn Hargreaves, Richard Huddy, Gary McTaggart, Jason Mitchell, Ashutosh Rege and Matthias

More information

Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Sample Exam Questions 2007

Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Sample Exam Questions 2007 Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Questions 2007 INSTRUCTIONS: Answer all questions. Spend approximately 1 minute per mark. Question 1 30 Marks Total

More information

GPGPU: General-Purpose Computation on GPUs

GPGPU: General-Purpose Computation on GPUs GPGPU: General-Purpose Computation on GPUs Randy Fernando NVIDIA Developer Technology Group (Original Slides Courtesy of Mark Harris) Why GPGPU? The GPU has evolved into an extremely flexible and powerful

More information

Console Architecture. By: Peter Hood & Adelia Wong

Console Architecture. By: Peter Hood & Adelia Wong Console Architecture By: Peter Hood & Adelia Wong Overview Gaming console timeline and evolution Overview of the original xbox architecture Console architecture of the xbox360 Future of the xbox series

More information

Computer Graphics. Anders Hast

Computer Graphics. Anders Hast Computer Graphics Anders Hast Who am I?! 5 years in Industry after graduation, 2 years as high school teacher.! 1996 Teacher, University of Gävle! 2004 PhD, Computerised Image Processing " Computer Graphics!

More information

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 AGENDA The Kaveri Accelerated Processing Unit (APU) The Graphics Core Next Architecture and its Floating-Point Arithmetic

More information

Hardware design for ray tracing

Hardware design for ray tracing Hardware design for ray tracing Jae-sung Yoon Introduction Realtime ray tracing performance has recently been achieved even on single CPU. [Wald et al. 2001, 2002, 2004] However, higher resolutions, complex

More information

Workstation Applications for Windows. NVIDIA MAXtreme User s Guide

Workstation Applications for Windows. NVIDIA MAXtreme User s Guide Workstation Applications for Windows NVIDIA MAXtreme User s Guide Software Version: 6.00.xx NVIDIA Corporation February 2004 NVIDIA MAXtreme Published by NVIDIA Corporation 2701 San Tomas Expressway Santa

More information

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality Hardware Announcement ZG09-0170, dated March 31, 2009 NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality Table of contents 1 At a glance 3

More information

Comp 410/510. Computer Graphics Spring 2016. Introduction to Graphics Systems

Comp 410/510. Computer Graphics Spring 2016. Introduction to Graphics Systems Comp 410/510 Computer Graphics Spring 2016 Introduction to Graphics Systems Computer Graphics Computer graphics deals with all aspects of creating images with a computer Hardware (PC with graphics card)

More information

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1 Silverlight for Windows Embedded Graphics and Rendering Pipeline 1 Silverlight for Windows Embedded Graphics and Rendering Pipeline Windows Embedded Compact 7 Technical Article Writers: David Franklin,

More information

Computer Applications in Textile Engineering. Computer Applications in Textile Engineering

Computer Applications in Textile Engineering. Computer Applications in Textile Engineering 3. Computer Graphics Sungmin Kim http://latam.jnu.ac.kr Computer Graphics Definition Introduction Research field related to the activities that includes graphics as input and output Importance Interactive

More information

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University A little about me http://graphics.stanford.edu/~mhouston Education: UC San Diego, Computer Science BS Stanford

More information

3D Math Overview and 3D Graphics Foundations

3D Math Overview and 3D Graphics Foundations Freescale Semiconductor Application Note Document Number: AN4132 Rev. 0, 05/2010 3D Math Overview and 3D Graphics Foundations by Multimedia Applications Division Freescale Semiconductor, Inc. Austin, TX

More information

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With SAPPHIRE R9 270X 4GB GDDR5 WITH BOOST & OC Specification Display Support Output GPU Video Memory Dimension Software Accessory 3 x Maximum Display Monitor(s) support 1 x HDMI (with 3D) 1 x DisplayPort 1.2

More information

Graphics and Computing GPUs

Graphics and Computing GPUs C A P P E N D I X Imagination is more important than knowledge. Albert Einstein On Science, 1930s Graphics and Computing GPUs John Nickolls Director of Architecture NVIDIA David Kirk Chief Scientist NVIDIA

More information

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Technology, Computer Engineering by Amol

More information

How To Make A Texture Map Work Better On A Computer Graphics Card (Or Mac)

How To Make A Texture Map Work Better On A Computer Graphics Card (Or Mac) Improved Alpha-Tested Magnification for Vector Textures and Special Effects Chris Green Valve (a) 64x64 texture, alpha-blended (b) 64x64 texture, alpha tested (c) 64x64 texture using our technique Figure

More information

How To Teach Computer Graphics

How To Teach Computer Graphics Computer Graphics Thilo Kielmann Lecture 1: 1 Introduction (basic administrative information) Course Overview + Examples (a.o. Pixar, Blender, ) Graphics Systems Hands-on Session General Introduction http://www.cs.vu.nl/~graphics/

More information

Programmable Graphics Hardware

Programmable Graphics Hardware Programmable Graphics Hardware Alessandro Martinelli alessandro.martinelli@unipv.it 6 November 2011 Rendering Pipeline (6): Programmable Graphics Hardware Rendering Architecture First Rendering Pipeline

More information

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy

More information

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER RAMA HOETZLEIN, DEVELOPER TECHNOLOGY, NVIDIA Data Visualizations assist humans with data analysis by representing information

More information

Advanced Rendering for Engineering & Styling

Advanced Rendering for Engineering & Styling Advanced Rendering for Engineering & Styling Prof. B.Brüderlin Brüderlin,, M Heyer 3Dinteractive GmbH & TU-Ilmenau, Germany SGI VizDays 2005, Rüsselsheim Demands in Engineering & Styling Engineering: :

More information

GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY

GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY IV. Évfolyam 4. szám - 2009. december Fürjes János furjes.janos@chello.hu GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY Absztrakt/Abstract Jelen írás egy modern technológiát elemez, amely

More information

Introduction to Computer Graphics

Introduction to Computer Graphics Introduction to Computer Graphics Version 1.1, January 2016 David J. Eck Hobart and William Smith Colleges This is a PDF version of a free, on-line book that is available at http://math.hws.edu/graphicsbook/.

More information

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get

More information

Volume Rendering on Mobile Devices. Mika Pesonen

Volume Rendering on Mobile Devices. Mika Pesonen Volume Rendering on Mobile Devices Mika Pesonen University of Tampere School of Information Sciences Computer Science M.Sc. Thesis Supervisor: Martti Juhola June 2015 i University of Tampere School of

More information

The Future Of Animation Is Games

The Future Of Animation Is Games The Future Of Animation Is Games 王 銓 彰 Next Media Animation, Media Lab, Director cwang@1-apple.com.tw The Graphics Hardware Revolution ( 繪 圖 硬 體 革 命 ) : GPU-based Graphics Hardware Multi-core (20 Cores

More information

A Performance-Oriented Data Parallel Virtual Machine for GPUs

A Performance-Oriented Data Parallel Virtual Machine for GPUs A Performance-Oriented Data Parallel Virtual Machine for GPUs Mark Segal Mark Peercy ATI Technologies, Inc. Abstract Existing GPU programming interfaces require that applications employ a graphics-centric

More information

GPU Christmas Tree Rendering. Evan Hart ehart@nvidia.com

GPU Christmas Tree Rendering. Evan Hart ehart@nvidia.com GPU Christmas Tree Rendering Evan Hart ehart@nvidia.com February 2007 Document Change History Version Date Responsible Reason for Change 0.9 2/20/2007 Ehart Betarelease February 2007 ii Beta Release This

More information

Scan-Line Fill. Scan-Line Algorithm. Sort by scan line Fill each span vertex order generated by vertex list

Scan-Line Fill. Scan-Line Algorithm. Sort by scan line Fill each span vertex order generated by vertex list Scan-Line Fill Can also fill by maintaining a data structure of all intersections of polygons with scan lines Sort by scan line Fill each span vertex order generated by vertex list desired order Scan-Line

More information

Msystems Ltd. www.msystems.gr SAPPHIRE HD 6870 1GB GDDR5 PCIE

Msystems Ltd. www.msystems.gr SAPPHIRE HD 6870 1GB GDDR5 PCIE SAPPHIRE HD 6870 1GB GDDR5 PCIE The SAPPHIRE HD 6870 has a new architecture with a total of 1120 stream processors and 56 texture units delivering massively parallel computing power for graphics and other

More information

Beginning Android 4. Games Development. Mario Zechner. Robert Green

Beginning Android 4. Games Development. Mario Zechner. Robert Green Beginning Android 4 Games Development Mario Zechner Robert Green Contents Contents at a Glance About the Authors Acknowledgments Introduction iv xii xiii xiv Chapter 1: Android, the New Kid on the Block...

More information

Interactive Information Visualization using Graphics Hardware Študentská vedecká konferencia 2006

Interactive Information Visualization using Graphics Hardware Študentská vedecká konferencia 2006 FAKULTA MATEMATIKY, FYZIKY A INFORMATIKY UNIVERZITY KOMENSKHO V BRATISLAVE Katedra aplikovanej informatiky Interactive Information Visualization using Graphics Hardware Študentská vedecká konferencia 2006

More information

NVIDIA Corporation 2701 San Tomas Expressway Santa Clara, CA 95050 www.nvidia.com

NVIDIA Corporation 2701 San Tomas Expressway Santa Clara, CA 95050 www.nvidia.com Release 1.1 February 2003 Cg Language Toolkit ALL DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED

More information

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS ICCVG 2002 Zakopane, 25-29 Sept. 2002 Rafal Mantiuk (1,2), Sumanta Pattanaik (1), Karol Myszkowski (3) (1) University of Central Florida, USA, (2) Technical University of Szczecin, Poland, (3) Max- Planck-Institut

More information

WinFast 3D S320V. User s Manual. Leadtek Research Inc.

WinFast 3D S320V. User s Manual. Leadtek Research Inc. WinFast 3D S320V User s Manual WinFast 3D S320V User s Manual i Table Of Contents Getting Started... I Accessories... I System Requirements... I Connection Guide...II Chapter 1 Welcome to WinFast 3D S320V...

More information

A Short Introduction to Computer Graphics

A Short Introduction to Computer Graphics A Short Introduction to Computer Graphics Frédo Durand MIT Laboratory for Computer Science 1 Introduction Chapter I: Basics Although computer graphics is a vast field that encompasses almost any graphical

More information

High Dynamic Range and other Fun Shader Tricks. Simon Green

High Dynamic Range and other Fun Shader Tricks. Simon Green High Dynamic Range and other Fun Shader Tricks Simon Green Demo Group Motto If you can t make it good, make it big. If you can t make it big, make it shiny. Overview The OpenGL vertex program and texture

More information

CUDA Basics. Murphy Stein New York University

CUDA Basics. Murphy Stein New York University CUDA Basics Murphy Stein New York University Overview Device Architecture CUDA Programming Model Matrix Transpose in CUDA Further Reading What is CUDA? CUDA stands for: Compute Unified Device Architecture

More information

Several tips on how to choose a suitable computer

Several tips on how to choose a suitable computer Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and postprocessing of your data with Artec

More information

In the early 1990s, ubiquitous

In the early 1990s, ubiquitous How GPUs Work David Luebke, NVIDIA Research Greg Humphreys, University of Virginia In the early 1990s, ubiquitous interactive 3D graphics was still the stuff of science fiction. By the end of the decade,

More information

Intel Graphics Media Accelerator 900

Intel Graphics Media Accelerator 900 Intel Graphics Media Accelerator 900 White Paper September 2004 Document Number: 302624-003 INFOMATION IN THIS DOCUMENT IS POVIDED IN CONNECTION WITH INTEL PODUCTS. NO LICENSE, EXPESS O IMPLIED, BY ESTOPPEL

More information

Data Sheet. Desktop ESPRIMO. General

Data Sheet. Desktop ESPRIMO. General Data Sheet Graphic Cards for FUJITSU Desktop ESPRIMO FUJITSU Desktop ESPRIMO are used for common office applications. To fulfill the demands of demanding applications, ESPRIMO Desktops can be ordered with

More information

GPU Programming Guide. GeForce 8 and 9 Series

GPU Programming Guide. GeForce 8 and 9 Series GPU Programming Guide GeForce 8 and 9 Series December 19, 2008 1 Notice ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY,

More information

Deconstructing Hardware Usage for General Purpose Computation on GPUs

Deconstructing Hardware Usage for General Purpose Computation on GPUs Deconstructing Hardware Usage for General Purpose Computation on GPUs Budyanto Himawan Dept. of Computer Science University of Colorado Boulder, CO 80309 Manish Vachharajani Dept. of Electrical and Computer

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo

More information

Vulkan on NVIDIA GPUs. Piers Daniell, Driver Software Engineer, OpenGL and Vulkan

Vulkan on NVIDIA GPUs. Piers Daniell, Driver Software Engineer, OpenGL and Vulkan Vulkan on NVIDIA GPUs Piers Daniell, Driver Software Engineer, OpenGL and Vulkan Who am I? Piers Daniell @piers_daniell Driver Software Engineer - OpenGL, OpenGL ES, Vulkan NVIDIA Khronos representative

More information

Interactive Rendering In The Post-GPU Era. Matt Pharr Graphics Hardware 2006

Interactive Rendering In The Post-GPU Era. Matt Pharr Graphics Hardware 2006 Interactive Rendering In The Post-GPU Era Matt Pharr Graphics Hardware 2006 Last Five Years Rise of programmable GPUs 100s/GFLOPS compute 10s/GB/s bandwidth Architecture characteristics have deeply influenced

More information

Instructor. Goals. Image Synthesis Examples. Applications. Computer Graphics. Why Study 3D Computer Graphics?

Instructor. Goals. Image Synthesis Examples. Applications. Computer Graphics. Why Study 3D Computer Graphics? Computer Graphics Motivation: Why do we study 3D Graphics? http://www.cs.ucsd.edu/~ravir Instructor http://www.cs.ucsd.edu/~ravir PhD Stanford, 2002. PhD thesis developed Spherical Harmonic Lighting widely

More information

CUDA. Multicore machines

CUDA. Multicore machines CUDA GPU vs Multicore computers Multicore machines Emphasize multiple full-blown processor cores, implementing the complete instruction set of the CPU The cores are out-of-order implying that they could

More information

Interactive visualization of multi-dimensional data in R using OpenGL

Interactive visualization of multi-dimensional data in R using OpenGL Interactive visualization of multi-dimensional data in R using OpenGL 6-Monats-Arbeit im Rahmen der Prüfung für Diplom-Wirtschaftsinformatiker an der Universität Göttingen vorgelegt am 09.10.2002 von Daniel

More information