Next-Gen Tile-Based GPUs. Maurice Ribble

Similar documents
GPU Architecture. Michael Doggett ATI

Optimizing AAA Games for Mobile Platforms

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Computer Graphics Hardware An Overview

Performance Optimization and Debug Tools for mobile games with PlayCanvas

OpenGL Performance Tuning

Touchstone -A Fresh Approach to Multimedia for the PC

How To Teach Computer Graphics

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Radeon HD 2900 and Geometry Generation. Michael Doggett

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

Low power GPUs a view from the industry. Edvard Sørgård

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Introduction to GPGPU. Tiziano Diamanti

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Developer Tools. Tim Purcell NVIDIA

Introduction to Computer Graphics

Image Synthesis. Transparency. computer graphics & visualization

L20: GPU Architecture and Models

Lecture Notes, CEng 477

Advanced Graphics and Animations for ios Apps

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Intel Graphics Media Accelerator 900

Dynamic Resolution Rendering

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Introduction to Game Programming. Steven Osman

Game Development in Android Disgruntled Rats LLC. Sean Godinez Brian Morgan Michael Boldischar

Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Sample Exam Questions 2007

iz3d Stereo Driver Description (DirectX Realization)

Advanced Rendering for Engineering & Styling

Advanced Visual Effects with Direct3D

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

INTRODUCTION TO RENDERING TECHNIQUES

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Remote Desktop Protocol Performance

GPGPU Computing. Yong Cao

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

SAPPHIRE HD GB GDDR5 PCIE.

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

QCD as a Video Game?

3D Stereoscopic Game Development. How to Make Your Game Look

Web Based 3D Visualization for COMSOL Multiphysics

Color correction in 3D environments Nicholas Blackhawk

Midgard GPU Architecture. October 2014

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

Computer Graphics. Anders Hast

OpenGL Insights. Edited by. Patrick Cozzi and Christophe Riccio

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

Introduction to Graphics Software Development for OMAP 2/3

BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games Mathias Schott, Evan Hart NVIDIA

Performance Testing in Virtualized Environments. Emily Apsey Product Engineer

Catalyst Software Suite Version 9.12 Release Notes

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

Computer Graphics on Mobile Devices VL SS ECTS

Optimization for DirectX9 Graphics. Ashu Rege

Interactive Level-Set Deformation On the GPU

NVFX : A NEW SCENE AND MATERIAL EFFECT FRAMEWORK FOR OPENGL AND DIRECTX. TRISTAN LORACH Senior Devtech Engineer SIGGRAPH 2013

The Future Of Animation Is Games

Writing Applications for the GPU Using the RapidMind Development Platform

Real-Time Graphics Architecture

COMP175: Computer Graphics. Lecture 1 Introduction and Display Technologies

Choosing a Computer for Running SLX, P3D, and P5

Plug-in Software Developer Kit (SDK)

CMSC 427 Computer Graphics Programming Assignment 1: Introduction to OpenGL and C++ Due Date: 11:59 PM, Sept. 15, 2015

White Paper AMD PROJECT FREESYNC

Beyond 2D Monitor NVIDIA 3D Stereo

A Crash Course on Programmable Graphics Hardware

Comp 410/510. Computer Graphics Spring Introduction to Graphics Systems

Tutorial 9: Skeletal Animation

GPU Christmas Tree Rendering. Evan Hart

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

MICROSOFT. Remote Desktop Protocol Performance Improvements in Windows Server 2008 R2 and Windows 7

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture

Multiresolution 3D Rendering on Mobile Devices

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

Outline. srgb DX9, DX10, XBox 360. Tone Mapping. Motion Blur

GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

Examples. Pac-Man, Frogger, Tempest, Joust,

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

Transcription:

Next-Gen Tile-Based GPUs Maurice Ribble

The Agenda Introduction to tile based rendering Tiling is most common in mobile systems List of common tiling hardware features Resolves Explanation of the different types Optimizing code for resolves OpenGL ES 2.0 emulators Conclusion

The Agenda Introduction to tiling based rendering Tiling is most common in mobile systems List of common tiling hardware features Resolves Explanation of the different types Optimizing code for resolves OpenGL ES 2.0 emulators Conclusion

The Need in the Mobile Market Solution to the limited bandwidth problem Low power (better battery life) Small size (cheap) Good performance Flexible shaders

Traditional Graphics Pipeline vs TBR Pipeline TBR = Tile-Based Rendering Traditional GPUs render full scene in one pass Tiling GPUs render scene in multiple passes

Traditional Graphics Pipeline Vertex Data VS Triangle Setup & Rasterization FS

TBR Graphics Pipeline Vertex Data VS Triangle Setup & Rasterization FS Tile 1

TBR Graphics Pipeline Vertex Data VS Triangle Setup & Rasterization FS Tile 2

TBR Graphics Pipeline Vertex Data VS Triangle Setup & Rasterization FS Tile 3

TBR Graphics Pipeline Vertex Data VS Triangle Setup & Rasterization FS Tile 9

The Agenda Introduction to tiling based rendering Tiling is most common in mobile systems List of common tiling hardware features Resolves Explanation of the different types Optimizing code for resolves OpenGL ES 2.0 emulators Conclusion

Who Uses TBR? Microsoft Talisman Imagination Technologies KYRO and KYRO II (Desktop PC) PowerVR CLX2 (Sega Dreamcast) PowerVR MBX (OpenGL ES 1.x) PowerVR SGX (targets OpenGL ES 2.0) AMD Imageon 2380 (OpenGL ES 1.x) Xenos (Xbox 360) Z430 and Z460 (targets OpenGL ES 2.0)

Why is TBR so Popular in Embedded Devices? Reduced bus bandwidth Saves power Allows for simpler system designs Desktop PC s brute force approach doesn t work as well in the mobile space Lower polygon counts in mobile games are an ideal match for TBR

The Agenda Introduction to tiling based rendering Tiling is most common in mobile systems List of common tiling hardware features Resolves Explanation of the different types Optimizing code for resolves OpenGL ES 2.0 emulators Conclusion

Fast Local Memory Tile-based GPUs have a small amount of fast memory on chip Each tile gets rendered here then resolved to the final buffer in system memory Very high bandwidth Very low latency Eliminates need for many caches and complex compression algorithms found on desktop GPUs

Geometry Binning Could just draw the scene to each tile and let the vertices get clipped, but In the real world this costs too much vertex shader performance TBR hardware has ways of sorting triangles into bins for each tile Each hardware vendor does this differently Don t forget the driver/hardware has to batch up draw calls

Geometry Binning Bin 1 / Tile 1 Bin 2 / Tile 2 Bin 3 / Tile 3 Bin 4 / Tile 4

External Bandwidth Usage Example Draw two triangles with depth testing Each triangle is 100 pixels and there is 50 pixels of overlap Each triangle has a single texture fetch Depth and color buffers are 32 bits Textures are 32 bits (easy math)

External Bandwidth Usage Example Traditional Rendering Tile Based Rendering Texture Reads Depth Reads Depth Writes Color Writes 150*4 bytes 200*4 bytes 150*4 bytes 150*4 bytes 150*4 bytes 0 bytes 0 bytes 0 bytes Total Bandwidth 2600 bytes 600 bytes This is just for the actual rendering TBR has constant bandwidth overhead from resolves

The Agenda Introduction to tiling based rendering Tiling is most common in mobile systems List of common tiling hardware features Resolves Explanation of the different types Optimizing code for resolves OpenGL ES 2.0 emulators Conclusion

Resolves Resolves are the copies between a GPU s fast internal memory and the system s slow external memory Light weight resolve Copies from fast internal memory to slow external memory Heavy weight resolve Restores a tile with a copy from slow external memory to fast internal memory, and then does a light weight resolve

Heavy Weight Resolves ** This is repeated for each tile ** GPU Fast Internal Memory (Tiles Rendered Here) Heavy Resolve Light Resolve Slow External Memory

Resolve Cases eglswapbuffers Light weight resolve glbindframebuffer Light weight resolve if you start drawing with glclear Heavy weight resolve if you don t glteximage2d, gltexsubimage2d, glbufferdata, and glbuffersubdata Drivers should prevent the resolve Mid-frame calls force driver to create an extra copy of the data Starting a frame with these calls prevents that extra copy

Resolve Cases glcopyteximage2d and glcopytexsubimage2d Heavy weight resolve OpenGL ES 2.0 has FBOs so use them glreadpixels Heavy weight resolve Please don t use this (especially in the middle of a frame) Exceeding triangle or state buffer limits Heavy weight resolve (driver and hardware specific)

Resolve Friendly Code // Put all glteximage2d, gltexsubimage2d, glbufferdata and // glbuffersubdata commands here // FBO block (optional - only needed if you are using FBOs) for (int i = 0; i < numfbos; ++i) { glbindframebuffer( target[i], framebuffer[i] ); glclear( GL_COLOR_BUFFER_BIT GL_DEPTH_BUFFER_BIT GL_STENCIL_BUFFER_BIT ); Draw your scene for each FBO here } // If needed put a glclear here (color, depth, stencil) Draw your scene to your backbuffer here // If you absolutely need a glreadpixels do it here eglswapbuffers( dsp, backbuffer );...Repeat...

A frame in the life of TBR State gldrawelements() State Bin 1 gldrawelements() State gldrawelements() Sort Triangles Into Bins State gldrawelements() Bin 2 SwapBuffer

A frame in the life of TBR Triangles in Bin1 Vertex Shader Rasterization Fragment Shader Fast Local Memory Slow External Memory Swap Triggers a Resolve Lots of memory here

A frame in the life of TBR Triangles in Bin2 Vertex Shader Rasterization Fragment Shader Fast Local Memory Slow External Memory Swap Triggers a Resolve Lots of memory here

A frame in the life of TBR Slow External Memory LCD controller displays backbuffer to screen Lots of memory here

The Agenda Introduction to tiling based rendering Tiling is most common in mobile systems List of common tiling hardware features Resolves Explanation of the different types Optimizing code for resolves OpenGL ES 2.0 emulators Conclusion

OpenGL ES 2.0 Emulators Develop OpenGL ES 2.0 code before hardware is available Use the Visual Studio s build environment Graphics code should require no changes when porting to real hardware Help track down TBR performance bottlenecks

OpenGL ES 2.0 Emulators Two OpenGL ES 2.0 emulators are: AMD http://ati.amd.com/developer/tools.html Future Releases will have Performance throttling to simulate real hardware Detailed performance stats to help find bottlenecks Imagination Demo http://www.imgtec.com/powervr/insider/toolssdks/kh ronosopengles2xsgx

The Agenda Introduction to tiling based rendering Tiling is most common in mobile systems List of common tiling hardware features Resolves Explanation of the different types Optimizing code for resolves OpenGL ES 2.0 emulators Conclusion

TBR Strengths In general TBR excels at bandwidth limited operations Blending Overdraw Multisample AA

The Take Home Message Hopefully you now know what TBR is Avoid extra resolves Costs power and reduce battery life Costs performance Optimizing for TBR usually only takes a few small changes OpenGL ES 2.0 Emulators are a good way to start writing efficient code

Questions? Maurice.Ribble@amd.com devrel@amd.com