Performance Optimization and Debug Tools for mobile games with PlayCanvas



Similar documents
Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

Visualizing gem5 via ARM DS-5 Streamline. Dam Sunwoo ARM R&D December 2012

OpenGL Insights. Edited by. Patrick Cozzi and Christophe Riccio

GPU Profiling with AMD CodeXL

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

Optimizing AAA Games for Mobile Platforms

Trends in HTML5. Matt Spencer UI & Browser Marketing Manager

Web Based 3D Visualization for COMSOL Multiphysics

Low power GPUs a view from the industry. Edvard Sørgård

OpenGL Performance Tuning

Developer Tools. Tim Purcell NVIDIA

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015

Parallel Web Programming

Computer Graphics Hardware An Overview

How To Teach Computer Graphics

Computer Graphics on Mobile Devices VL SS ECTS

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Press Briefing. GDC, March Neil Trevett Vice President Mobile Ecosystem, NVIDIA President Khronos. Copyright Khronos Group Page 1

Accelerate your Mobile Apps and Games for Android on ARM. Matthew Du Puy Software Engineer, ARM

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

GPU Architecture. Michael Doggett ATI

Dynamic Resolution Rendering

How To Develop For A Powergen 2.2 (Tegra) With Nsight) And Gbd (Gbd) On A Quadriplegic (Powergen) Powergen Powergen 3

Advanced Rendering for Engineering & Styling

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

Issues of Hybrid Mobile Application Development with PhoneGap: a Case Study of Insurance Mobile Application

A Hybrid Visualization System for Molecular Models

Programming 3D Applications with HTML5 and WebGL

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Game Development in Android Disgruntled Rats LLC. Sean Godinez Brian Morgan Michael Boldischar

Getting Started with CodeXL

Vulkan on NVIDIA GPUs. Piers Daniell, Driver Software Engineer, OpenGL and Vulkan

Geo-Scale Data Visualization in a Web Browser. Patrick Cozzi pcozzi@agi.com

Introduction to Computer Graphics

Introduction to WebGL

Color correction in 3D environments Nicholas Blackhawk

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

L20: GPU Architecture and Models

Midgard GPU Architecture. October 2014

QCD as a Video Game?

Chapter 2 - Graphics Programming with JOGL

OctaVis: A Simple and Efficient Multi-View Rendering System

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Introduction to GPGPU. Tiziano Diamanti

Crosswalk: build world class hybrid mobile apps

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Overview. NVIDIA Quadro K5200 8GB Graphics J3G90AA

Front-End Performance Testing and Optimization

Reminders. Lab opens from today. Many students want to use the extra I/O pins on

Visualizing Data: Scalable Interactivity

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

AMD CodeXL 1.7 GA Release Notes

Cross-Platform Game Development Best practices learned from Marmalade, Unreal, Unity, etc.

Intel DPDK Boosts Server Appliance Performance White Paper

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Finding Performance and Power Issues on Android Systems. By Eric W Moore

big.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices

Android Virtualization from Sierraware. Simply Secure

Embedded Development Tools

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

Remote Graphical Visualization of Large Interactive Spatial Data

Advanced Graphics and Animations for ios Apps

Android Architecture. Alexandra Harrison & Jake Saxton

NVIDIA GeForce GTX 580 GPU Datasheet

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

An Introduction to Android

Lecture 1 Introduction to Android

Mobile Phones Operating Systems

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

<Insert Picture Here> Java, the language for the future

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

JavaFX Session Agenda

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

Intel XDK для разработки кросс-платформенных мобильных приложений

Android Development: a System Perspective. Javier Orensanz

GPGPU Computing. Yong Cao

Mobile Performance: for excellent User Experience

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Web Interface using HTML5 for Interaction between Mobile Device & Cloud- Services

Vector Web Mapping Past, Present and Future. Jing Wang MRF Geosystems Corporation

Stream Processing on GPUs Using Distributed Multimedia Middleware

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

HTML5 the new. standard for Interactive Web

Update logo and logo link on A Master. Update Date and Product on B Master

Mobile Performance Management Tools Prasanna Gawade, Infosys April 2014

Empowering Developers to Estimate App Energy Consumption. Radhika Mittal, UC Berkeley Aman Kansal & Ranveer Chandra, Microsoft Research

Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

Operating Systems. Design and Implementation. Andrew S. Tanenbaum Melanie Rieback Arno Bakker. Vrije Universiteit Amsterdam

An Introduction to Android. Huang Xuguang Database Lab. Inha University

Outline. Operating Systems Design and Implementation. Chap 1 - Overview. What is an OS? 28/10/2014. Introduction

Transcription:

Performance Optimization and Debug Tools for mobile games with PlayCanvas Jonathan Kirkham, Senior Software Engineer, ARM Will Eastcott, CEO, PlayCanvas 1

Introduction Jonathan Kirkham, ARM Worked with ARM technology and graphics at University Joined ARM in 2011 to work on 3D graphics Developing performance analysis tools and debuggers for the Mali GPUs 2

Agenda 1. Introduction to WebGL on mobile Rendering Pipeline 2. PlayCanvas experience WebGL Inspector 3. Performance analysis and debugging tools for WebGL Generic optimization tips 4. Q & A 3

Bring the Power of OpenGL ES to Mobile Browsers What is WebGL? Why WebGL? A cross-platform, royalty free web It brings plug-in free 3D to the web, standard Low-level 3D graphics API Based on OpenGL ES 2.0 A shader based API using GLSL (OpenGL Shading Language) Some concessions made to JavaScript (memory management) implemented right into the browser. Major browser vendors are members of the WebGL Working Group: 4 Apple (Safari browser) Mozilla (Firefox browser) Google (Chrome browser) Opera (Opera browser)

Introduction to WebGL How does it fit in a web browser? You use JavaScript to control it. Your JavaScript is embedded in HTML5 and uses its Canvas element to draw on. What do you need to start creating graphics? Obtain WebGLrenderingContext object for a given HTMLCanvasElement. It creates a drawing buffer into which the API calls are rendered. For example: var canvas = document.getelementbyid('canvas1'); var gl = canvas.getcontext('webgl'); canvas.width = newwidth; canvas.height = newheight; gl.viewport(0, 0, canvas.width, canvas.height); 5

WebGL Stack What is happening when a WebGL page is loaded User enters URL HTTP stack requests the HTML page Additional requests will be necessary to get JavaScript code and other resources JavaScript code will be pre-parsed while loading other assets and the DOM tree is built JavaScript code will contain calls to the WebGL API They will go back to WebKit, which calls OpenGL ES 2.0 library Shaders are compiled Textures, vertex buffers & uniforms must be loaded to the GPU Rendering can start WebKit OpenGL ES Library ARM Mali GPU Driver ARM Mali GPU Browser HTTP Stack ARM Cortex -A CPU JavaScript Engine Linux Kernel libc Hardware User Space Kernel Space 6 See Chromium Rendering Stack: http://www.chromium.org/developers/design-documents/ gpu-accelerated-compositing-in-chrome

Introducing Me 7

and PlayCanvas 8

Google Docs for Games: Realtime Collaboration 9

Game Development Goes Mobile on ARM 10

At Last: A Real Community for Game Dev 11

At Last: A Real Community for Game Dev 12

Open Sourced: https://github.com/playcanvas/engine 13

The Building of a WebGL Game: SWOOOP 14

What We Did Right We didn t use physics We didn t use realtime shadows We didn t use post effects We adopted an art style which only required low res texturing We kept the number of draw calls below 150 We added visual flare with cheap GPU based effects like particles and UV scrolling 15

What We Did Wrong We adopted an art style which generated a lot of vertex data Keeping draw calls low generated more vertex data We used realtime lighting on the environment Each gem was a separate draw call 16

Optimizing Your WebGL Code: Options Learn from the Open Source community In-browser Developer Tools GLSL Optimizer (https://github.com/aras-p/glsl-optimizer) WebGL Inspector (http://benvanik.github.io/webgl-inspector/) 17

WebGL Inspector: Function Tracing 18

WebGL Inspector: Render State 19

WebGL Inspector: Textures 20

WebGL Inspector: Vertex and Index Buffers 21

WebGL Inspector: Shader Code 22

Understanding the GPU is Key Easy to optimize your graphics pipeline on the CPU Harder to know how to optimize on the GPU Use ARM tools to get special insight into how your graphics data is being processed 23

Importance of Analysis & Debug Mobile Platforms Expectation of amazing console-like graphics and playing experience Screen resolution beyond HD Limited power budget Solution ARM Cortex CPUs and Mali GPUs are designed for low power whilst providing innovative features to keep up performance Software developers can be smart when developing apps Good tools can do the heavy lifting 24

Performance Analysis & Debug ARM DS-5 Streamline Performance Analyzer System-wide performance analysis Combined ARM Cortex Processors and Mali GPU visibility Optimize for performance & power across the system ARM Mali Graphics Debugger API Trace & Debug Tool Understand graphics and compute issues at the API level Debug and improve performance at frame level Support for OpenGL ES 1,1, 2.0, 3.0 and OpenCL 1.1 Offline Compilers Understand complexity of GLSL shaders and CL kernels Support for ARM Mali-4xx and Mali-T6xx GPU families 25

ARM DS-5 Streamline Performance Analyzer API Events CPU Activity GPU Activity Filmstrip S/W Counters H/W Counters Heatmap 26

The Basics Software based solution ICE/trace units not required Support for Linux kernel 2.6.32+ on target Eclipse plug-in or command line Lightweight sample profiling Time- or event*-based sampling Process to C/C++ source code profiler Low probe effect; <5% typically Multiple data sources CPU, GPU and Interconnect hardware counters Software counters and kernel tracepoints User defined counters and instrumented code Power/energy measurements User Space Applications & Middleware OpenGL ES gator Daemon ARM Mali GPU Drivers gator Driver TCP/IP Linux Kernel ARM Processor Target Device 27 * Event-based sampling is available on kernels 3.0 or later

ARM Mali Graphics Debugger Frame Outline Frame Statistics Framebuffer / Render Targets API Trace States Uniforms Vertex Attributes Buffers Shader View Dynamic Help Textures 28

Main Bottlenecks (1) The frame rate of a particular WebGL application could be limited by: CPU Vertex Processing Fragment Processing Bandwidth CPU Memory Vertices Textures Uniforms Fortunately we have tools to understand which one is the culprit Vertices Uniforms Triangles Varyings Textures Uniforms Varyings Pixels Vertex Processing Fragment Processing 29

Main Bottlenecks (2) CPU Too many draw calls Complex physics Vertex processing Too many vertices Too much computation per vertex CPU Memory Vertices Textures Uniforms Fragment processing Too many fragments, overdraw Too much computation per fragment Vertices Uniforms Triangles Varyings Textures Uniforms Varyings Pixels Bandwidth Big and uncompressed textures High resolution framebuffer Vertex Processing Fragment Processing 30

Frame Rendering Time Synchronous Rendering // THIS DOES NOT MEASURE GPU RENDERING var start = new Date().getTime(); gl.drawelements(gl.triangle, ); var time = new Date().getTime() - start; Deferred Rendering // THIS FORCES SYNCHRONOUS RENDERING // (BAD PRACTICE) var start = new Date().getTime(); gl.drawelements(gl.triangle, ); gl.finish(); // or gl.readpixels var time = new Date().getTime() - start; 31

Workflow Detailed analysis ARM DS-5 Streamline Performance Analyzer What kind of problem do we have? Validate ARM Mali Graphics Debugger What s causing the problem? How can I fix it? 32

33 Fragment Bound

ARM DS-5 Streamline: Fragment Bound Involves just 1 counter and the frequency of the GPU Job Slot 0 Active Fragment Percentage = (Job Slot 0 active / Frequency) *100 Fragment Percentage = 84% Overdraw = Fragment Threads Started * Number of Cores / Resolution * FPS Overdraw = 3.9 34

Fragment Bound Resolution too high or too many effects or cycles in the shader Every light and effect that you add will add to the number of cycles your shader will take If you decide to run your app at native resolution be careful Nexus 10 Native Resolution 2560 x 1600 = 4,096,000 pixels Quad Core GPU 533Mhz 520 Cycles per pixel Approx. Targeting 30 FPS 17 Cycles in your shader 35

Overdraw This is when you draw to each pixel on the screen more than once Drawing your objects front to back instead of back to front reduces overdraw Limiting the amount of transparency in the scene can help Overdraw 36

ARM Mali Graphics Debugger: Overdraw 2x 4x 1x 37

ARM Mali Graphics Debugger: Shader Map & Fragment Count Total Cycles Per Program 1% Program 54 28% Program 60 71% 38 Others

Shader Optimization Depending on the arithmetic workload, we could reduce the number of uniforms and varyings and calculate them on-the-fly Reduce their size Reduce their precision: all the varyings, uniforms and local variables are highp, is that really necessary? Use the ARM Mali Offline Shader Compiler! http://malideveloper.arm.com/develop-for-mali/ tools/analysis-debug/mali-gpu-offline-shadercompiler/ 39

General Tips Fragment Bound Render to a smaller framebuffer This will upscale the rendered frame to the size of the HTML canvas Move computation from the fragment to the vertex shader (use HW interpolation) Drawing your objects front to back instead of back to front reduces overdraw Reduce the amount of transparency in the scene 40

41 Vertex Bound

ARM DS-5 Streamline: Vertex Bound Involves just 1 counter and the frequency of the GPU Job Slot 1 Active Vertex Percentage = (Job Slot 1 active / Frequency) *100 Vertex Percentage = 13% 42

ARM Mali Graphics Debugger: Vertices Count Analyze the trace in Mali Graphics Debugger Find the draw calls with a high number of vertices Shader Statistics Find the vertex shaders with a high number of instructions 43

ARM Mali Graphics Debugger: Frame Capture 44

General Tips Vertex Bound Get your artist to remove unnecessary vertices LOD switching Only objects near the camera need to be in high detail Use culling Too many cycles in the vertex shader 45

46 Bandwidth Bound

ARM DS-5 Streamline: Bandwidth Counters Involves just 2 Streamline Counters External Bus Read Beats External Bus Write Beats Bandwidth = 967 MB/S Bandwidth in Bytes = (External Bus Read Beats + External Bus Write Beats) * Bus Width 47

ARM Mali Graphics Debugger: Textures Save memory and bandwidth with texture compression The current most popular format is ETC Texture Compression But ASTC (Adaptive Scalable Texture Compression) can deliver < 1 bit/pixel Sort by size and format Texture Weight by Dimension MB 20 15 10 5 0 Total ~40 MB 2048 x 2048 1024 x 2048 1024 x 1024 512 x 512 512 x 256 256 x 256 128 x 128 48 With texture compression, the total amount could be just ~6.4 MB (ASTC 5x5 blocks)

ARM Mali Graphics Debugger: Vertex Buffer Objects Using Vertex Buffer Objects (VBOs) can save you a lot of time in overhead Every frame in your application, all of your vertices and colour information will get sent to the GPU A lot of the time these won t change. So there is no need to keep sending them Would be a much better idea to cache the data in graphics memory 49

General Tips Bandwidth Bound Use texture compression Enable texture mipmapping Reduce the number of vertices and varyings Interleave vertices, normals, texture coordinates Reduce the size of the textures Use VBOs This will also cause a better cache utilization. 50

51 CPU Bound

CPU Bound Streamline Easy just look at the CPU Activity Remember to look at all the cores. Some of the area is greyed out due to Streamline s ability to present per process CPU activity 52

General Tips CPU Bound Sometimes a slow frame rate can actually be a CPU issue and not a GPU one In this case optimizing your graphics won t achieve anything Most mobile devices have more than one core these days Are you threading your application as much as possible? Mali GPU is a deferred architecture Reduce the amount of draw calls you make Try to combine your draw calls together 53

Summary Covered today: Introduction to WebGL PlayCanvas Profiling with ARM DS-5 Streamline Debugging with the ARM Mali Graphics Debugger ARM: www.malideveloper.arm.com www.ds.arm.com www.community.arm.com WebGL & PlayCanvas: http://www.khronos.org/webgl/ https://playcanvas.com https://github.com/playcanvas/engine http://swooop.playcanvas.com https://playcanvas.com/playcanvas/swooop 54

Thank You Any Questions? The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners 55