Optimizing the Unreal Engine 4 Soul Demo for Galaxy Note 10.1

Jack Porter, Engine Development and Support Lead, Epic Games Korea

Introduction
- Epic Games
  - Founded 1991 by Tim Sweeney
  - HQ in Cary, North Carolina
  - Subsidiaries in Utah and Seattle, Korea, Japan, UK and Poland
- Jack Porter
  - Originally from Australia, came to Korea in 2003
  - Founding member of Epic Games Korea
  - 15 years of experience working on Unreal Engine

Content
- Unreal Engine
- Soul Demo
- How we made the demo
  - Unreal Engine 4 pipeline
  - Next-gen mobile rendering techniques in Unreal Engine 4

Unreal Engine 1-3 History
- Unreal Engine 1 (1998)
  - PC: Unreal, Unreal Tournament
- Unreal Engine 2 (2002)
  - PC online: Lineage 2
  - PS2/Xbox
- Unreal Engine 3
  - PC online: TERA, BLESS, Blade & Soul
  - PS3/Xbox 360: Gears of War series
  - iOS: Infinity Blade series
  - Android: Blade for Kakao

Unreal Engine 4
- Subscription model
  - Engine and full source code available for a $19/month subscription
- New features
  - Redesigned editor tool
  - DirectX 11 / OpenGL 4
  - Physically based lighting and shading
  - Deferred rendering
  - Blueprints
- Infiltrator demo
- CHALLENGE: how can we bring these features to mobile?

Soul
- Demonstration UE4 content
- Purpose
  - See what's possible in mobile rendering on today's high-end mobile devices
    - iOS/Android
    - Mali/ImgTec/Adreno
  - Push the limits of draw calls and driver features to drive innovation in the mobile industry (benchmark)
- Using a regular PC content pipeline
- Demonstrated at GDC 2014

Soul Demo Video
Game Developer Conference, March 25-29, 2013 / ARM Multimedia Seminar, June 27, 2014

Soul Statistics
- 490 MB APK size, mostly textures
- Around 400 draw calls per frame
- Around 400,000 triangles

UE4 Rendering for Mobile
- Forward rendering instead of deferred shading
  - Bandwidth cost for multiple render targets on ES 3.0 is still too high on current devices
- High-quality static lighting
  - Global illumination captured on PC
- Cubemaps for image-based lighting
- Physically based shading model

Physically Based Shading
- Metallic 0 to 1
- Metal with roughness 0 to 1
- Non-metal with roughness 0 to 1

Physically Based Shading
- Same material model as PC
  - See "Real Shading in Unreal Engine 4" (Brian Karis) [1]
- Mobile adjustments
  - Analytic approximation of the Environment BRDF (ALU instead of TEX)
  - Normalized Phong specular distribution (faster and still energy conserving)

[1] http://blog.selfshadow.com/publications/s2013-shading-course/
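The "still energy conserving" property of a normalized Phong lobe can be checked numerically. The sketch below uses the textbook normalization factor (n + 1) / (2*pi) for a lobe around the reflection vector; it is not the exact function UE4 ships, and the names `normalized_phong` and `hemisphere_integral` are made up for this illustration.

```python
import math

def normalized_phong(cos_angle, exponent):
    """Textbook Phong lobe around the reflection vector, scaled by
    (n + 1) / (2*pi) so that it integrates to 1 over the hemisphere."""
    return (exponent + 1.0) / (2.0 * math.pi) * max(cos_angle, 0.0) ** exponent

def hemisphere_integral(exponent, steps=4000):
    """Midpoint-rule integral of the lobe over the hemisphere centred on the
    reflection vector. The solid-angle element is 2*pi*sin(theta)*d_theta."""
    d_theta = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        lobe = normalized_phong(math.cos(theta), exponent)
        total += lobe * 2.0 * math.pi * math.sin(theta) * d_theta
    return total

if __name__ == "__main__":
    # Hypothetical exponents: a wide lobe and progressively tighter ones.
    for n in (1, 8, 64, 256):
        print(n, hemisphere_integral(n))
```

As the exponent grows the lobe narrows while its peak rises, so the reflected energy stays constant instead of fading as the highlight tightens.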

Material Pipeline
- Artist creates shader graph in Unreal Editor tool

Material Pipeline
- Network graph generates HLSL functions
- Combined with handwritten HLSL code for our physically based lighting model
- We use this directly on PC

    half3 GetMaterialBaseColor(FMaterialPixelParameters Parameters)
    {
        MaterialFloat3 Local5 = (1.00000000 - Material.VectorExpressions[0].rgb);
        MaterialFloat4 Local6 = ProcessMaterialColorTextureLookup(Texture2DSample(Material.Texture2D_1, Material.Texture2D_1Sampler, Parameters.TexCoords[0].xy));
        MaterialFloat3 Local7 = (Local6.rgb + MaterialFloat3(0.00000000, 0.00000000, 0.00000000));
        MaterialFloat Local8 = dot(Local6.rgb, MaterialFloat3(0.30000001, 0.58999997, 0.11000000));
        MaterialFloat Local9 = (Local8 + -0.40000001);
        MaterialFloat Local10 = (Local9 * -5.00000000);
        MaterialFloat Local11 = min(max(Local10, 0.00000000), 1.00000000);
        MaterialFloat Local12 = (Local8 + dot(MaterialFloat3(0.00000000, 0.00000000, 0.00000000), MaterialFloat3(0.30000001, 0.58999997, 0.11000000)));
        MaterialFloat Local13 = (dot(MaterialFloat3(0.00000000, 0.00000000, 0.00000000), MaterialFloat3(0.30000001, 0.58999997, 0.11000000)) / Local12);
        MaterialFloat Local14 = (Local13 - 0.50000000);
        MaterialFloat Local15 = (Local14 * 4.00000000);
        MaterialFloat Local16 = (Local15 + 0.50000000);
        MaterialFloat Local17 = min(max(Local16, 0.00000000), 1.00000000);
        MaterialFloat Local18 = (Local11 * Local17);
        MaterialFloat3 Local19 = lerp(Local6.rgb, Local7, MaterialFloat(Local18));
        MaterialFloat3 Local20 = (Local19 * Local5);
        return Local20;
    }

Material Pipeline
- For mobile, we cross-compile HLSL to GLSL
- Version and GLSL extensions enabled using the preprocessor

    #version 300 es
    out mediump vec4 out_fragcolor;
    uniform highp samplerCube ps4;
    uniform highp sampler2D ps0;
    uniform highp sampler2D ps1;
    uniform highp sampler2D ps2;
    varying highp vec4 var_texcoord0;
    varying highp vec4 var_texcoord1;
    varying highp vec4 var_texcoord2;
    varying vec4 var_texcoord7;
    void main()
    {
        vec4 t0;
        vec4 t1;
        t1.xyzw = vec4(gl_FragCoord);
        t0.xyzw = t1;
        vec4 t2;
        highp vec4 t3;
        t3.xy = var_texcoord4.zw;
        t3.zw = var_texcoord5.zw;
        vec3 t4;
        vec3 t5;
        t5.xyz = vec3(t3.xyz);
        t4.xyz = t5;
        highp vec4 t6;

Material Pipeline: Mobile Problems
- GLSL shaders are compiled by the device
- Shader compiler bugs on the device can cause failures and require workarounds
  - e.g. register allocation is very slow or fails for some shaders
- Extensions we want may not be available
  - GL_EXT_shader_texture_lod
  - GL_EXT_shader_framebuffer_fetch

Image Based Lighting
- Image-based lighting captured on PC and stored in a cubemap
  - PC uses FP16 cubemaps
  - Mobile uses Decode(RGBM)^2 cubemap encoding
- Mipmap choice based on material roughness
  - Same as PC, filtering same as PC
  - We use OpenGL ES 3.0 on T628, GL_EXT_shader_texture_lod where supported on other devices
- Mobile supports one infinite-distance cubemap per object
  - PC blends multiple cubemaps per surface and provides parallax correction (using texture arrays)
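The slide names the mobile encoding only as "Decode(RGBM)^2". One plausible reading, sketched below, is that the cubemap stores the square root of the HDR color in RGBM form and the shader squares the decoded value. The multiplier range `MAX_RANGE`, the function names and the linear roughness-to-mip mapping are all assumptions made for illustration, not details taken from the talk.

```python
import math

MAX_RANGE = 16.0  # assumed multiplier range; the slide does not state one

def encode_rgbm_sqrt(color):
    """Encode a linear HDR color as 8-bit (R, G, B, M), storing the square
    root of the color so that decoding ends with a squaring step."""
    root = [math.sqrt(max(c, 0.0)) for c in color]
    # Shared multiplier, rounded up to the next 8-bit step so RGB stays <= 1.
    m = min(max(max(root) / MAX_RANGE, 1.0 / 255.0), 1.0)
    m_q = math.ceil(m * 255.0)
    scale = (m_q / 255.0) * MAX_RANGE
    rgb_q = [round(min(c / scale, 1.0) * 255.0) for c in root]
    return (rgb_q[0], rgb_q[1], rgb_q[2], m_q)

def decode_rgbm_squared(rgbm):
    """Decode RGBM back to a linear color: (rgb * m * MAX_RANGE) squared."""
    r, g, b, m = rgbm
    scale = (m / 255.0) * MAX_RANGE
    return tuple(((c / 255.0) * scale) ** 2 for c in (r, g, b))

def mip_for_roughness(roughness, mip_count):
    """Assumed linear mapping: roughness 0 selects the sharpest mip (0),
    roughness 1 selects the blurriest mip (mip_count - 1)."""
    clamped = min(max(roughness, 0.0), 1.0)
    return clamped * (mip_count - 1)

if __name__ == "__main__":
    hdr = (4.0, 1.0, 0.25)  # hypothetical HDR texel
    packed = encode_rgbm_sqrt(hdr)
    print(packed, decode_rgbm_squared(packed))
    print(mip_for_roughness(0.5, 8))
```

Storing the square root spends more of the 8-bit code values on dark tones, which is where quantization banding is most visible.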

HDR Lighting
- On PC we use a 32-bit floating point framebuffer and a post-processing tonemapper
- On mobile we would like to use an FP16 framebuffer
  - Not available on Mali-T628
  - GL_EXT_color_buffer_half_float / GL_RGBA16F_EXT supported on a few devices
- On Mali we use the Mosaic Exposure technique to simulate a 12-bit/channel linear framebuffer on GL_RGBA8

Linear Blending at 32-bpp
- Used on mobile GPUs without FP16 and without sRGB support, to support linear blending
- Supports {0.0 to 2.0} dynamic range
- Forward shading and linear blending with exposure mosaic based on gl_FragCoord
- De-mosaic in tonemapping pass
- Figure labels: N W P S E; {0-2} dynamic range; {0-1/6} dynamic range
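The exposure mosaic idea can be sketched as follows. The two ranges come from the slide's figure labels. The checkerboard pattern, the reconstruction rule and all function names are assumptions made for illustration, since the talk does not describe the filter UE4 actually uses. Each pixel stores its value at one of two exposures, and the de-mosaic step rebuilds pixel P from itself and its N, W, S, E neighbours, preferring the precise low-range samples unless they are saturated.

```python
HI_RANGE = 2.0        # "{0-2} dynamic range" from the slide
LO_RANGE = 1.0 / 6.0  # "{0-1/6} dynamic range" from the slide

def pixel_range(x, y):
    """Assumed checkerboard: even pixels store the wide range, odd the narrow."""
    return HI_RANGE if (x + y) % 2 == 0 else LO_RANGE

def encode(value, x, y):
    """Quantize a linear value to 8 bits using this pixel's exposure."""
    return round(min(max(value, 0.0) / pixel_range(x, y), 1.0) * 255)

def decode(q, x, y):
    """Map an 8-bit sample back to a linear value."""
    return q / 255.0 * pixel_range(x, y)

def encode_image(value, size):
    """Helper for the example: a flat image of one linear value."""
    return [[encode(value, x, y) for x in range(size)] for y in range(size)]

def demosaic(image, x, y):
    """Reconstruct pixel P from itself and its N, W, S, E neighbours."""
    height, width = len(image), len(image[0])
    candidates = ((x, y - 1), (x - 1, y), (x, y + 1), (x + 1, y))
    neighbours = [(nx, ny) for nx, ny in candidates
                  if 0 <= nx < width and 0 <= ny < height]
    own_q = image[y][x]
    if not neighbours:
        return decode(own_q, x, y)
    average = sum(decode(image[ny][nx], nx, ny) for nx, ny in neighbours) / len(neighbours)
    if pixel_range(x, y) == LO_RANGE:
        # Narrow-range pixel: precise unless saturated, then use wide neighbours.
        return decode(own_q, x, y) if own_q < 255 else average
    # Wide-range pixel: borrow precision from unsaturated narrow neighbours.
    if all(image[ny][nx] < 255 for nx, ny in neighbours):
        return average
    return decode(own_q, x, y)

if __name__ == "__main__":
    for value in (0.1, 1.5):  # hypothetical dark and bright scene values
        img = encode_image(value, 4)
        print(value, demosaic(img, 1, 1), demosaic(img, 2, 1))
```

Dark values are recovered from the narrow-range samples with roughly twelve times finer steps than a plain 0 to 2 mapping into 8 bits would give, while bright values fall back to the wide-range samples.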

HDR Lighting: Directional Lightmaps
- Pair of compressed textures
  - HDR color with log luma encoding
  - World-space 1st-order spherical harmonic luma directionality
- Optimized for mobile and ETC/PVRTC compression
  - No separately compressed encoding for alpha
  - Therefore the mobile encoding for HDR color differs from PC
    - PC uses {RGB/Luma, LogLuma in Alpha}
    - Mobile uses {RGB/Luma * LogLuma, no Alpha}

Terrain Rendering

Optimizations for Mali-T628: Terrain
- Landscape uses continuous LOD (morphing in the vertex shader)
- To avoid duplicating data, vertices are reused for lower LOD levels
- Terrain is divided into chunks
- Figure labels: LOD 3 (16 verts) on three chunks, LOD 4 (4 verts) on one chunk
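Continuous LOD by vertex morphing can be sketched as below, written in Python rather than shader code so it can be run directly. This is a generic illustration, not UE4's Landscape shader: `morph_vertex`, `coarse_stride` and `alpha` are hypothetical names, and a real vertex shader would morph the height as well as the grid position.

```python
import math

def morph_vertex(x, y, coarse_stride, alpha):
    """Blend a grid vertex toward the nearest vertex kept by the next lower LOD.

    alpha = 0 leaves the vertex where it is (full detail);
    alpha = 1 collapses it onto the coarse grid (lower LOD)."""
    def snap(v):
        # Nearest multiple of coarse_stride.
        return math.floor(v / coarse_stride + 0.5) * coarse_stride
    return (x + (snap(x) - x) * alpha, y + (snap(y) - y) * alpha)

if __name__ == "__main__":
    # 4x4 chunk (LOD 3) morphing toward its four corners (LOD 4): stride 3.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, morph_vertex(1, 2, 3, alpha))
```

Because every vertex slides smoothly onto a position that also exists in the coarser mesh, switching the index buffer at alpha = 1 causes no visible pop.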

Vertex Buffer Layout
- Because the vertices are shared between LODs, at low LOD levels the indices are sparse.
- The Mali driver runs the vertex shader for ALL vertices between the minimum and maximum vertex index!

Index  X  Y
0      0  0
1      1  0
2      2  0
3      3  0
4      0  1
5      1  1
6      2  1
7      3  1
8      0  2
9      1  2
10     2  2
11     3  2
12     0  3
13     1  3
14     2  3
15     3  3

Index Buffer (LOD 3): 0 1 4, 1 5 4, 1 2 5, 2 6 5, 2 3 6, 3 7 6, ... 9 10 13, 10 14 13, 10 11 14, 11 15 14
Index Buffer (LOD 4): 0 3 12, 3 15 12

Vertex Buffer Layout
- A simple rearrangement* of the vertex order to place the shared vertices first increased the frame rate from 10 to 30 FPS!
- We don't see this behavior on other mobile GPUs.

Index  X  Y
0      0  0
1      3  0
2      0  3
3      3  3
4      1  0
5      2  0
6      0  1
7      1  1
8      2  1
9      3  1
10     0  2
11     1  2
12     2  2
13     3  2
14     1  3
15     2  3

Index Buffer (LOD 3): 0 4 6, 4 7 6, 4 5 7, 5 8 7, 5 1 8, 1 9 8, ... 11 12 14, 12 15 14, 12 13 15, 13 3 15
Index Buffer (LOD 4): 0 1 2, 1 3 2

*Actual layout is more complex to support more than these two LOD levels
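The effect of the reordering can be reproduced with a small script. It assumes, as the slides state, that the driver shades every vertex between the minimum and maximum index referenced by a draw call. The function names are made up for this sketch, and the grid is the 4x4 chunk from the tables above rather than the real, more complex layout.

```python
def row_major(n):
    """Original layout: vertices stored row by row."""
    return [(x, y) for y in range(n) for x in range(n)]

def shared_first(n, stride):
    """Rearranged layout: vertices reused by the lower LOD come first."""
    verts = row_major(n)
    shared = [v for v in verts if v[0] % stride == 0 and v[1] % stride == 0]
    rest = [v for v in verts if v not in shared]
    return shared + rest

def index_buffer(order, n, stride):
    """Two triangles per quad; stride 1 is LOD 3, stride 3 is LOD 4 here."""
    lookup = {v: i for i, v in enumerate(order)}
    indices = []
    for y in range(0, n - 1, stride):
        for x in range(0, n - 1, stride):
            a, b = lookup[(x, y)], lookup[(x + stride, y)]
            c, d = lookup[(x, y + stride)], lookup[(x + stride, y + stride)]
            indices += [a, b, c, b, d, c]
    return indices

def shaded_vertex_count(indices):
    """Vertices processed if the driver shades the whole min..max index range."""
    return max(indices) - min(indices) + 1

if __name__ == "__main__":
    for name, order in (("row-major", row_major(4)), ("shared-first", shared_first(4, 3))):
        lod4 = index_buffer(order, 4, 3)
        print(name, lod4, shaded_vertex_count(lod4))
```

With the row-major layout the LOD 4 draw references only 4 vertices but spans indices 0 to 15, so all 16 are shaded. With the shared vertices first, the span shrinks to the 4 vertices actually used.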

UE4 Mobile: Challenges
- Driver differences between devices and vendors
  - Shader compiler differences
    - Extensions and workarounds
    - Compile time and bugs
  - Different performance characteristics
    - Resource and framebuffer management
    - Draw call overhead
    - Thermal throttling behavior
- Version fragmentation
  - Many UE4 issues are solved by doing a firmware upgrade
  - Updates are a big problem in the Android ecosystem!

UE4 Mobile: Challenges
- Draw call overhead
  - Soul demo kept draw calls under 400, which seems to be the upper limit
  - This needs to get better!
- Memory management
  - Difficult to get an idea of the memory available to the app
  - Soul demo used texture streaming from flash into a fixed pool to ensure a more consistent memory footprint

Thank You!
- Please come and see how we authored the Soul content using the Unreal Engine 4 Editor
- Q & A