Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture

Similar documents
How To Build An Engine 4 Mobile Graphics On Anarm V8-A (A64)

Midgard GPU Architecture. October 2014

Optimizing AAA Games for Mobile Platforms

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

GPU Architecture. Michael Doggett ATI

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Making Dreams Come True: Global Illumination with Enlighten. Graham Hazel Senior Product Manager Sam Bugden Technical Artist

Low power GPUs a view from the industry. Edvard Sørgård

Radeon HD 2900 and Geometry Generation. Michael Doggett

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Computer Graphics Hardware An Overview

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Which ARM Cortex Core Is Right for Your Application: A, R or M?

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

L20: GPU Architecture and Models

big.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

The Future Of Animation Is Games

High Performance or Cycle Accuracy?

Introduction to GPGPU. Tiziano Diamanti

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

Advanced Rendering for Engineering & Styling

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Touchstone -A Fresh Approach to Multimedia for the PC

QCD as a Video Game?

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

OpenEXR Image Viewing Software

Hardware design for ray tracing

Game Development in Android Disgruntled Rats LLC. Sean Godinez Brian Morgan Michael Boldischar

OpenGL Performance Tuning

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

How To Teach Computer Graphics

Introduction to GPU Architecture

Vulkan on NVIDIA GPUs. Piers Daniell, Driver Software Engineer, OpenGL and Vulkan

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

Cross-Platform Game Development Best practices learned from Marmalade, Unreal, Unity, etc.

How To Develop For A Powergen 2.2 (Tegra) With Nsight) And Gbd (Gbd) On A Quadriplegic (Powergen) Powergen Powergen 3

Computer Graphics on Mobile Devices VL SS ECTS

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

GPGPU Computing. Yong Cao

Introduction to Game Programming. Steven Osman

NVIDIA GeForce GTX 580 GPU Datasheet

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

Advanced Visual Effects with Direct3D

Image Synthesis. Transparency. computer graphics & visualization

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Real-Time BC6H Compression on GPU. Krzysztof Narkowicz Lead Engine Programmer Flying Wild Hog

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Web Based 3D Visualization for COMSOL Multiphysics

Writing Applications for the GPU Using the RapidMind Development Platform

3D Computer Games History and Technology

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Architectures, Processors, and Devices

Applied Micro development platform. ZT Systems (ST based) HP Redstone platform. Mitac Dell Copper platform. ARM in Servers

BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games Mathias Schott, Evan Hart NVIDIA

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

Next Generation GPU Architecture Code-named Fermi

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

Intel Graphics Media Accelerator 900

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr Teruzzi Roberto matr IBM CELL. Politecnico di Milano Como Campus

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Trends in HTML5. Matt Spencer UI & Browser Marketing Manager

INTRODUCTION TO RENDERING TECHNIQUES

GPGPU: General-Purpose Computation on GPUs

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

4.1 Introduction 4.2 Explain the purpose of an operating system Describe characteristics of modern operating systems Control Hardware Access

Amazing renderings of 3D data... in minutes.

Dynamic Resolution Rendering

Introduction to GPU Programming Languages

System requirements for Autodesk Building Design Suite 2017

White Paper. Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor

QuickSpecs. NVIDIA Quadro K1200 4GB Graphics INTRODUCTION PERFORMANCE AND FEATURES. Overview

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

Introduction to GPU hardware and to CUDA

Application Performance Analysis of the Cortex-A9 MPCore

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Why You Need the EVGA e-geforce 6800 GS

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

SAPPHIRE HD GB GDDR5 PCIE.

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

HP SkyRoom Frequently Asked Questions

Thread level parallelism

Transcription:

Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture Ray Hwang, Segment Marketing Manager, ARM Jack Porter, Engine Development Lead, Epic Games Korea Hessed Choi, Senior Field Applications Engineer, ARM 1

Agenda Programming for ARM v8-a Technology ARM Mali GPU Architecture Unreal Engine 4 Case Study: Moon Temple Enlighten in Unreal Engine 4 2

Programming for ARMv8-A Technology Ray Hwang Segment Marketing Manager, ARM 3

ARM Architecture Evolution Virtualization ARMv4 ARM7TDMI ARM926EJ ARM1176 Cortex -A9 Cortex-A57 4 1995 2005 2015

ARMv8-A AArch32 Maintaining compatibility AArch32 maintains full-compatibility with ARMv7 while addressing emerging software trends AArch32: evolution of 32-bit Enhanced floating point support (IEE754-2008) Ideal for concurrent programming C11, C++ 11, Java5 More efficient, high-performance thread-safe software Cryptography support (AES, Sha-1, Sha-256) Applications and software ARMv7-A AArch32 A32+T32 ARMv7-A Compatible CRYPTO Scalar FP Advanced SIMD ARMv8-A AArch64 A64 5

ARMv8-A Architecture Designed for efficiency Design 64-bit architecture Increased number and size of general purpose registers Large Virtual Address Space Efficient 32-bit/64-bit architecture Double the number and size of NEON registers Cryptography support Why it Matters Efficient access to large datasets Gains in performance and code efficiency 1. Applications not limited to 4GB memory 2. Large memory mapped files handled efficiently 1. Common software architecture (phone, tablet, clamshell) 2. A single software model across the entire portfolio Enhanced capacity of SIMD multimedia engine 1. Over10x software encryption performance 2. New security models for consumer and enterprise 6

Multi-core ARM big.little Technology Taking advantage of parallelism Platform trending toward multi-cores Single thread performance improvements diminishing Thermally constrained use cases are now commonplace Production differentiation via different CPU combinations Modern OSs are supporting multi-core How to exploit parallelism. In the core API-level parallelization Multi-thread programming ARM NEON tech/simd OpenMP, Renderscript, OpenCL, etc. Never easy, but increasingly necessary 7

Mali GPU Architecture Hessed Choi Senior FAE, ARM 8

Mali GPU High-Level Architecture A breakdown of the Mali-T880 Distributes tasks to shader cores Efficient mapping of geometry to tiles Addresses translation and protection Configurable cache shared among all shader cores Maintains cache coherency between different processors 9

Shader Core Architecture Compute Thread Creator Rasterizer Triangle Setup Unit Tiler Data Structures Early Z Thread Execution Tri Pipe Thread Execution Tri Pipe Thread Issue Compute Data and Results Reg file Arith / LUT / Branch Reg file Arith / LUT / Branch Load / Store / Varying Texturing Z/Stencil Buffer Textures Thread Completion Late Z Blender Tile Buffers Tile Buffers Frame Buffer 10

Tri-pipe Architecture 11 Unified shader architecture Fragment and vertex shaders Compute shaders Very high throughput graphics Multiple parallel pipelines Three low-latency arithmetic pipes (in case of Mali-T880) 256 simultaneous threads Low-latency for computation

Deferred Shading Popular technique in PC and console games Very memory bandwidth intensive Traditionally not a good fit for mobile Diffuse (RGBA8) Depth (D32F) Normals (RGBA8) 12

The Tilebuffer Mali-T600 Series GPU Tile-based rendering 16x16 tile size Fast on-chip memory 16 bytes of per-pixel color data Raw bit access Tilebuffer pixel Depth Stencil 128-bit pixel data Sample Sample More recent GPU architectures allow more flexible tile sizes and open up more per-pixel color data Sample Sample 13

Exposing the Tilebuffer Shader Framebuffer Fetch Access previous fragment color, depth and stencil Programmable blending, soft particles, etc. Shader Pixel Local Storage (PLS) 14

Pixel Local Storage (PLS) Exposed as EXT_shader_pixel_local_storage Per-pixel scratch memory available to fragment shaders Automatically discarded once a tile is fully processed No impact on external memory bandwidth Shader declares an interface block of PLS memory Re-interpret PLS between different passes Can have separate input and output views Independent of framebuffer format 15

Pixel Local Storage Shader example pixel_localext FragDataLocal { layout(r32f) highp float_value; layout(r11f_g11f_b10f) mediump vec3 normal; layout(rgb10_a2) highp vec4 color; layout(rgba8ui) mediump uvec4 flags; } pls; See the extension spec for more information! https://www.khronos.org/registry/gles/extensions/ext/ext_shader_pixel_local_storage.txt http://malideveloper.arm.com 16

Pixel Local Storage - Pipelines Rendering pipeline changes slightly when PLS is enabled Writing to PLS bypasses blending Note Fragment order Fragment tests still apply PLS and color share the same memory location Memory Position data Varyings Textures Uniforms Tile execution Primitive Setup Rasterization Fragment shading Blending PLS/color Tilebuffer Framebuffer Writeback 17

Why Pixel Local Storage? An alternative approach is to use multiple render targets (MRT) with framebuffer fetch if the driver can prove that render targets are not used later, it can avoid the write-back PLS is more explicit than MRT Harder for the application to get it wrong Driver doesn t have to make guesses PLS is more flexible Re-interpret PLS data between fragment shader invocations Not limited to OpenGL ES 3.x framebuffer formats 18

Unreal Engine 4 Jack Porter Engine Development Lead, Epic Games Korea 19

PSNR (db) Compress, Compress, Compress! ASTC = Adaptive Scalable Texture Compression Texture compression standard developed by ARM, adopted by Khronos KHR_texture_compression_astc_ldr for OpenGL ES and Open GL Increased quality and fidelity at low bit-rates Expansive range of input formats offers complete flexibility Choice of base format, 2D and 3D plus addition of HDR formats 50 45 40 35 30 25 8 5.12 3.56 2 1.28 0.89 20 Compression Rate (bpp)

Input Color Formats Input bits/pixel Compression in the Pre-ASTC World HDR RGB+A BC6 HDR RGBA HDR XY+Z All Major Players 64 48 HDR RGB HDR X+Y RGB+A PVRTC PVRTC ETC, BC2 BC3, BC7 48 32 32 RBGA XY+Z ETC, BC1 BC7 32 24 RGB HDR L ETC, BC5 24 16 X+Y LA ETC, BC4 16 16 L 8 1 2 3 4 5 6 7 8 Compressed bits/pixel 21

Input Color Formats Input bits/pixel ASTC Choices All ASTC HDR RGB+A HDR RGBA HDR XY+Z HDR RGB HDR X+Y RGB+A RBGA XY+Z RGB HDR L X+Y LA L 64 48 48 32 32 32 24 24 16 16 16 8 1 2 3 4 5 6 7 8 Compressed bits/pixel 22

ASTC for Mobile Games ASTC is widely supported by all major hardware vendors It s free to use Finally a good texture format that can work everywhere! Avoids separate SKUs per hardware manufacturer: PVRTC, ATC, DXT, <supports-gl-texture android:name="gl_amd_compressed_atc_texture" /> Support for ASTC is also required by Google s Android Extension Pack GL_ANDROID_extension_pack_es31a 23

ASTC Support in Unreal Engine 4 24

Game Texture Comparison 2048x2048 RGB Normal Map, with mips 17 MB uncompressed Original: 17 MB ETC: 3 MB ASTC 6x6: 2.5 MB 25

Game Texture Comparison Same texture zoomed in for Truth Original: 17 MB ETC: 3 MB ASTC 6x6: 2.5 MB 26

Unreal Engine 4 Framebuffer Fetch Unreal Engine 4.9 makes use of Mali s support for efficiently reading the existing color and depth values while rendering for a number of rendering features GL_ARM_shader_framebuffer_fetch GL_ARM_shader_framebuffer_fetch_depth_stencil 27 Deferred Player Shadows Deferred Decals RGBE HDR color blending

Unreal Engine 4 Demo: Moon Temple Made specifically for ARM Unreal Engine 4 Goals: 64-bit Android ASTC PLS 28

Unreal Engine 4 Pixel Local Storage Read & write custom data for each pixel E.g. depth and HDR color Blend particles softly against the background 29

Unreal Engine 4 Pixel Local Storage PLS Disabled PLS Enabled 30

Moon Temple Demo 31

Enlighten in Unreal Engine 4 Ray Hwang Segment Marketing Manager 32

The Global Illumination Challenge Direct illumination only takes into account light coming directly from the source Global illumination computes the way this light is absorbed, reflected, scattered and refracted across various surfaces in a scene Direct illumination Indirect lighting Global illumination Traditional methods to achieve this effect: Addition of light sources Offline lightmap bake 33

The Enlighten Solution The industry s most advanced global illumination technology Enlighten pre-bakes the location of static geometry in scene Everything else is dynamic at runtime Move lights Change materials Move dynamic objects Enlighten computes indirect lighting in real time. Benefits include: Higher quality Faster workflow Differentiated gameplay High scalability 34

Higher Quality in Unreal Engine 4 35

Leading Titles use Enlighten Battlefield Hardline, EA Battlefield 4, EA Plants vs Zombies: Garden Warfare, EA Need for Speed: The Run, EA Dragon Age: Inquisition, EA Quantum Conundrum, Square Enix Lords of the Fallen, CI Games Republique, Camouflaj Secret Ponchos, Switchblade Monkeys PollenVR, Mindfield Games 36

Available for Any Engine or Gaming Platform Pre-integrated into Unreal Engine Licensable as a standalone SDK for proprietary engines The real-time GI solution for Unity 5 The runtime is highly optimised and runs on many platforms PC Windows, Mac OS X, Linux Console Xbox 360, Xbox One, PlayStation 3, PlayStation 4, PlayStation Vita, Wii U Mobile Android, ios, Windows Phone 37

To Find Out More Did you check with ARM booth, to find out more about ARM Mali, ASTC, Enlighten, etc.? If not, you still have chances Mali Developer Center (http://malideveloper.arm.com/) : demos, tools, SDKs, tutorial & guides, sample codes, etc. Contact our experts at malidevelopers@arm.com : you can get help when you encounter any problems Forum (http://community.arm.com/groups/arm-maligraphics) : ask questions, discuss, debate, search for Q&As, read blogs, and do many other things 38

Thank You The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners 39