BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games Mathias Schott, Evan Hart NVIDIA



Similar documents
Optimizing AAA Games for Mobile Platforms

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

Introduction to GPGPU. Tiziano Diamanti

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

Performance Optimization and Debug Tools for mobile games with PlayCanvas

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

Vulkan on NVIDIA GPUs. Piers Daniell, Driver Software Engineer, OpenGL and Vulkan

OpenGL Performance Tuning

NVFX : A NEW SCENE AND MATERIAL EFFECT FRAMEWORK FOR OPENGL AND DIRECTX. TRISTAN LORACH Senior Devtech Engineer SIGGRAPH 2013

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

Dynamic Resolution Rendering

Radeon HD 2900 and Geometry Generation. Michael Doggett

How To Teach Computer Graphics

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Computer Graphics Hardware An Overview

OpenGL Insights. Edited by. Patrick Cozzi and Christophe Riccio

GPU Architecture. Michael Doggett ATI

iz3d Stereo Driver Description (DirectX Realization)

GPGPU Computing. Yong Cao

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Optimization for DirectX9 Graphics. Ashu Rege

Introduction to WebGL

Choosing a Computer for Running SLX, P3D, and P5

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

OpenGL Insights. Edited by. Patrick Cozzi and Christophe Riccio

A Hybrid Visualization System for Molecular Models

QCD as a Video Game?

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Lecture 1 Introduction to Android

Wiggins/Redstone: An On-line Program Specializer

Game Development in Android Disgruntled Rats LLC. Sean Godinez Brian Morgan Michael Boldischar

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Performance Testing in Virtualized Environments. Emily Apsey Product Engineer

L20: GPU Architecture and Models

Tutorial 9: Skeletal Animation

Developer Tools. Tim Purcell NVIDIA

Writing Applications for the GPU Using the RapidMind Development Platform

Experiences with 2-D and 3-D Mathematical Plots on the Java Platform

How To Develop For A Powergen 2.2 (Tegra) With Nsight) And Gbd (Gbd) On A Quadriplegic (Powergen) Powergen Powergen 3

NVIDIA GeForce GTX 580 GPU Datasheet

Rakudo Perl 6 on the JVM. Jonathan Worthington

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

How To Write A Cg Toolkit Program

GPU Profiling with AMD CodeXL

Press Briefing. GDC, March Neil Trevett Vice President Mobile Ecosystem, NVIDIA President Khronos. Copyright Khronos Group Page 1

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Comp 410/510. Computer Graphics Spring Introduction to Graphics Systems

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

XBOX Performances XNA. Part of the slide set taken from: Understanding XNA Framework Performance Shawn Hargreaves GameFest 2007

Advanced Graphics and Animations for ios Apps

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

Advanced Rendering for Engineering & Styling

LLVM for OpenGL and other stuff. Chris Lattner Apple Computer

Making Dreams Come True: Global Illumination with Enlighten. Graham Hazel Senior Product Manager Sam Bugden Technical Artist

Web Based 3D Visualization for COMSOL Multiphysics

Broken Shard. Alpha Report. Benjamin Schagerl, Dominik Dechamps, Eduard Reger, Markus Wesemann. TUM Computer Games Laboratory

Binary search tree with SIMD bandwidth optimization using SSE

Beyond 2D Monitor NVIDIA 3D Stereo

Several tips on how to choose a suitable computer

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

1. Computer System Structure and Components

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

AMD RenderMonkey IDE Version 1.71

ANDROID BASED MOBILE APPLICATION DEVELOPMENT and its SECURITY

Application. EDIUS and Intel s Sandy Bridge Technology

CIS570 Modern Programming Language Implementation. Office hours: TDB 605 Levine

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

4.1 Introduction 4.2 Explain the purpose of an operating system Describe characteristics of modern operating systems Control Hardware Access

Lesson 06: Basics of Software Development (W02D2

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Overview of CS 282 & Android

Jonathan Worthington Scarborough Linux User Group

Mark Bennett. Search and the Virtual Machine

Finger Paint: Cross-platform Augmented Reality

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

nvfx: A New Shader-Effect Framework for OpenGL, DirectX and Even Compute Tristan Lorach ( tlorach@nvidia.com )

Embedded Operating Systems in a Point of Sale Environment. White Paper

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Lecture Notes, CEng 477

Outline. srgb DX9, DX10, XBox 360. Tone Mapping. Motion Blur

Programmable Graphics Hardware

OpenGL & Delphi. Max Kleiner. 1/22

Introduction to GPU hardware and to CUDA

Computer Graphics on Mobile Devices VL SS ECTS

CS 378: Computer Game Technology

JavaFX Session Agenda

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

Full and Para Virtualization

Python, C++ and SWIG

Transcription:

BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games Mathias Schott, Evan Hart NVIDIA March 20, 2014

About Us Nick Penwarden Lead Graphics Programmer Epic Games @nickpwd on Twitter Mathias Schott Developer Technology Engineer NVIDIA Evan Hart Principal Engineer - NVIDIA

Unreal Engine 4 Latest version of the highly successful UnrealEngine Long, storied history Dozens of Licensees Hundreds of titles Built to scale Mobile phones and tablets Laptop computers Next-gen Console High-end PC

Why OpenGL? The only choice on some platforms: Mac OS X ios Android Linux HTML5/WebGL Access to new hardware features not tied to OS version. Microsoft has been tying new versions of Direct3D to new versions of Windows. OpenGL frees us of these restrictions: D3D 11.x features on Windows XP!

Porting UE4 to OpenGL UE4 is a cross platform game engine. We run on many platforms today (8 by my count!) with more to come. Extremely important that we can develop features once and have them work across all platforms!

(Very) High Level Look at UE4 Game Engine World PrimitiveComponents LightComponents Etc. Renderer Scene PrimitiveSceneInfo LightSceneInfo Etc. RHI Shaders Textures Vertex Buffers Etc. CROSS PLATFORM PLATFORM DEPENDENT

Render Hardware Interface Largely based on the D3D11 API Resource management Shaders, textures, vertex buffers, etc. Commands DrawIndexedPrimitive, Clear, SetTexture, etc.

OpenGL RHI Much of the API maps in a very straight-forward manner. RHICreateVertexBuffer -> glgenbuffers + glbufferdata RHIDrawIndexedPrimitive -> gldrawrangeelements What about versions? 4.x vs 3.x vs ES2 vs ES3 What about extensions? A lot of code can be shared between versions, even ES2 and GL4.

OpenGL RHI Use a class hierarchy with static functions. Each platform ultimately defines its own, typedefs it as FOpenGL. Optional functions defined here, e.g. FOpenGL::TexStorage2D() Depending on platform could be unimplemented, gltexstorage2dext, gltexstorage2d, Optional features have a corresponding SupportsXXX() function. E.g. FOpenGL::SupportsTexStorage(). If support depends on an extension, we parse the GL_EXTENSIONS string at boot time and return that. If support is always (or never) available for the platform we return a static true or false. This allows the compiler to remove unneeded branches and dead code. Allows us to share basically all of our OpenGL code across ES2, GL3, and GL4!

What about shaders? That other stuff was easy. Well, not exactly, Evan and Mathias will explain why later. We do not want to write our shaders twice! We have a large, existing HLSL shader code base. Most of our platforms support an HLSL-like shader language. We want to be able to compile and validate our shader code offline. Need reflection to create metadata used by the renderer at runtime. Which textures are bound to which indices? Which uniforms are used and thus need to be computed by the CPU?

What are our options? Can we just compile our HLSL shaders as GLSL with some magical macros? No. Maybe for really simple shaders but the languages really are different in subtle ways that will cause your shaders to break in OpenGL. So use FXC to compile the HLSL and translate the bytecode! Can work really well! But not for UE4: we support Mac OS X as a full development platform. Any good cross compilers out there? None that support shader model 5 syntax, compute, etc. At least none that we could find two years ago :)

HLSLCC Developed our own internal HLSL cross compiler. Inspired by glsl-optimizer: https://github.com/aras-p/glsl-optimizer Started with the Mesa GLSL parser and IR: http://www.mesa3d.org/ Modified the parser to parse shader model 5 HLSL instead of GLSL. Wrote a Mesa IR to GLSL converter, similar to that used in glsloptimizer. Mesa includes IR optimization: this is an optimizing compiler!

Benefits of Cross Compilation Write shaders once, they work everywhere. Contain platform specific workarounds to the compiler itself. Offline validation and reflection of shaders for OpenGL. Treat generated GLSL that doesn t work with a GL driver as a bug in the cross compiler. Simplifies the runtime, especially resource binding and uniforms. Makes mobile preview on PC easy!

GLSL Shader Pipeline MCPP* Shader Compiler Input Parse HLSL -> Abstract Syntax Tree Convert AST -> Mesa IR Convert IR for HLSL entry point to GLSL compatible main() Optimize IR Pack uniforms Optimize IR HLSLCC Convert IR -> GLSL + parameter map Shader Compiler Output GLSL Source Parameter Map HLSL HLSL Source HLSL Source HLSL Source Source #defines Compiler Flags * http://mcpp.sourceforge.net/

Resource Binding When the OpenGL RHI creates the GLSL programs, setting up bind points is easy: for (int32 i=0; i < NumSamplers; ++i) { Location = glgetuniformlocation(program, _psi ); gluniform1i(location,i); } Renderer binds textures to indices. RHISetShaderTexture(PixelShader,LightmapParameter,LightmapTexture); Same as every other platform. (from parameter map)

Uniform Packing All global uniforms are packed in to arrays. Simplifies binding uniforms when creating programs: We don t need the strings of the uniform s names, they are fixed! glgetuniformlocation( _pu0 ) Simplifies shadowing of uniforms when rendering: RHISetShaderParameter(Shader,Parameter,Value) Copy value to shadow buffer, just like the D3D11 global constant buffer. Simplifies setting of uniforms in the RHI: Track dirty ranges. For each dirty range: gluniform4fv( ) Easy to trade off between the quantity of gluniform calls and the amount of bytes uploaded per draw call.

Uniform Buffer Emulation (ES2) Straight-forward extension of uniform packing. Rendering code gets to treat ES2 like D3D11: it packages uniforms in to buffers based on the frequency at which uniforms change. The cross compiler packs used uniforms in to the uniform arrays. Also provides a copy list: where to get uniforms from and where they go in the uniform array.

UV Space Differences 0,0 +1,+1 Direct3D UV Space 1,1 1,1-1,-1 OpenGL UV Space Normalized Device Coordinates 0,0

From the Shader s Point of View 0,0 +1,+1 Direct3D UV Space 1,1 0,0-1,-1 OpenGL UV Space Normalized Device Coordinates 1,1

Rendering Upside Down Solutions? Could sample with (U, 1 - V) Could render and flip in NDC (X,-Y,Z) We choose to render upside down. Fewer choke points: lots of texture samples in shader code! You have to transform your post-projection vertex positions anyway. D3D expects Z in [0,1], OpenGL expects Z in [-1,1]. Can wrap all of this in to a single patched projection matrix for no runtime cost. You also have to reverse your polygon winding when setting rasterizer state! BUT: Remember not to flip when rendering to the backbuffer!

After Patching the Projection Matrix 0,0 +1,+1 Direct3D UV Space 1,1 0,0-1,-1 OpenGL UV Space Normalized Device Coordinates 1,1

Next Evan and Mathias will talk about extending the GL RHI to a full D3D11 feature set. Also, how to take all of this stuff and make it fast!

Achieving DX11 Parity Enable advanced effects Not a generic DX11 layer Only handle engine needs 60% shader / 40% engine Evolves over time with engine OpenGL API

From OpenGL 3.x to 4.x General Shader upgrades Tessellation Compute shaders Unordered Access Views Lots of extensions And API enchancements but Maintain Core API Compatibility ES2, GL3, GL4 etc

General Shader Upgrades Shared Memory Too many to discuss Most straight forward Some ripple in HLSLCC Type system IR Optimization Different UBO packing rules for Arrays RWTexture* RWBuffer* Memory Barriers Shared Memory Atomics UAV Atomic Conversion and packing functions Texture arrays Integer Functions Texture Queries New Texture Functions Early Depth Test

Compute Shader API Simple analogs for Dispatch Need to manage resources No separate binding points i.e. no CSetXXX Memory behavior is different DirectX 11 hazard free OpenGL exposes hazards Enforce full coherence Guarantees correctness Optimize later //State Setup gluseprogram( ComputeShader); SetupTexturesCompute(); SetupImagesCompute(); SetupUniforms(); //Ensure reads are coherent glmemorybarrier(gl_all_barrier_bits); gldispatchcompute( X, Y, Z); //Ensure writes are coherent glmemorybarrier(gl_all_barrier_bits);

Unordered Access Views Direct3D 11 world view Multiple shader object types RWBuffer, RWTexture2D, RWStructuredBuffer, etc Unified API for setup CSSetUnorderedAccessViews OpenGL world view GL_image_load_store (Closer to Direct3D 11) GL_shader_buffer_storage_object Solution Use GL_image_load_store as much as possible Indirection table for other types

Tessellation (WIP) Both have two shader stages Direct3D11: Hull Shader & Domain Shader OpenGL: Tessellation Control Shader &Tessellation Evaluation Shader Notable divergence: HLSL Hull vs. GLSL Tessellation Control Shader HLSL has two Hull Shader functions Hull shader (per control point) PatchConstantFunction (per patch) GLSL has a single TESSELLATION_CONTROL_SHADER Explicitly handle patch constant computations

Tessellation Solution Execute TCS in phases First hull shader Next patch constants layout( vertices = 3) out; void main() { //Execute Hull shader GLSL provides barrier Ensures values are ready Patch constant function is scalar Just execute on one thread Possibly vectorize later } barrier(); if (gl_invocationid == 0) { //Execute Patch constant }

Optimization Feature complete is only half the battle No one wants pretty OpenGL 4.3 at half the performance Initial test scene < ¼ of Direct3D 90 FPS versus ~13 FPS (IIRC)

HLSLCC Optimizatons Initially heavily GPU bound Internal engine tools were fantastic Just needed to implement proper OpenGL query support Shadows were the biggest offender Filtering massively slower Projection somewhat slower Filter offsets and projection matrices turned into temp arrays GPU compilers couldn t fold through the temp array Needed to aggressively fold / propagate Reduced performance deficit to < 2x

Software Bottlenecks Remaining bottlenecks were mostly CPU Both application and driver issues Driver is very solid, but Unreal Engine 4 is a big test Heaviest use of new features This effort made the driver better Applications issues were often unintuitive Most problems were synchronizations Overall, lots of onion peeling

Application & Driver Sync s Driver likes to perform most of its work on a separate thread Often improves software performance by 50% Control panel setting to turn it off for testing Can be a hindrance with excessive synchronization Most optimization effort went into resolving these issues General process Run code with sampling profiler Look at inclusive timing results Find large hotspot on function calling into the driver Time reflects wait for other thread

Top Offenders glmapbuffer (and similar) Needs to resolve and queued glbufferdata calls Replace with malloc + glbuffersubdata glgetteximage Used with reallocating mipmap chains Replace with glcopyimagesubdata glgenxxx Every time a new object is created Replace with a cache of names quaried in a large block glgeterror Just say no, except in debug

Where are we? Infiltrator performance is at parity (<5% difference) Primitive throughput test is faster Roughly 1 ms Rendering test is slower Roughly 1.5-2.0 ms

Where do we go? More GPU optimization Driver differences Loop unrolling improvements More software optimizations Reduce object references Already implemented for UBOs Bindless Textures More textures & Less overhead BufferStorage Allows persistent mappings Reduces memory copying

Supporting Android Tegra K1 opened an interesting door Mobile chip with full OpenGL 4.4 capability Common driver code means all the same features Can we run full Unreal Engine 4 content? Yes Need to make some optimizations / compromises No different than a low-end PC Work done in NVIDIA s branch On-going work with Epic to feedback changes

What were the challenges? High resolution, modest GPU Rely on scaling Turn some features back (motion blur, bloom, etc) Aggressively optimize where feasible Modest CPU Do reflections always need to be at full resolution? Often disable cpu-hungry rendering Shadows Optimize API at every turn Modest Memory Remove any duplication