Performance Gains Achieved Through Modern OpenGL in the Siemens DirectModel Rendering Engine

Similar documents
NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

OpenGL Performance Tuning

Performance Optimization and Debug Tools for mobile games with PlayCanvas

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

L20: GPU Architecture and Models

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games Mathias Schott, Evan Hart NVIDIA

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Optimizing AAA Games for Mobile Platforms

Vulkan on NVIDIA GPUs. Piers Daniell, Driver Software Engineer, OpenGL and Vulkan

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

Optimization for DirectX9 Graphics. Ashu Rege

Advanced Rendering for Engineering & Styling

Advanced Graphics and Animations for ios Apps

Android and OpenGL. Android Smartphone Programming. Matthias Keil. University of Freiburg

GPU Usage. Requirements

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

GPU Architecture. Michael Doggett ATI

The Future Of Animation Is Games

How To Teach Computer Graphics

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Programming 3D Applications with HTML5 and WebGL

OctaVis: A Simple and Efficient Multi-View Rendering System

Intel Graphics Media Accelerator 900

GPU Profiling with AMD CodeXL

Technical Brief. Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output. April 2008 TB _v01

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

Game Development in Android Disgruntled Rats LLC. Sean Godinez Brian Morgan Michael Boldischar

Web Based 3D Visualization for COMSOL Multiphysics

Developer Tools. Tim Purcell NVIDIA

Getting Started with CodeXL

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

Stream Processing on GPUs Using Distributed Multimedia Middleware

Realtime 3D Computer Graphics Virtual Reality

Computer Graphics on Mobile Devices VL SS ECTS

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Making Dreams Come True: Global Illumination with Enlighten. Graham Hazel Senior Product Manager Sam Bugden Technical Artist

Dynamic Resolution Rendering

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS

Beyond 2D Monitor NVIDIA 3D Stereo

Introduction to GPGPU. Tiziano Diamanti

HP Workstations graphics card options

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Radeon HD 2900 and Geometry Generation. Michael Doggett

Computer Graphics Hardware An Overview

HP Workstations graphics card options

Beginning Android 4. Games Development. Mario Zechner. Robert Green

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Introduction to Computer Graphics

3D Stereoscopic Game Development. How to Make Your Game Look

Geo-Scale Data Visualization in a Web Browser. Patrick Cozzi pcozzi@agi.com

JavaFX Session Agenda

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

BCC Multi Stripe Wipe

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Cross-Platform Game Development Best practices learned from Marmalade, Unreal, Unity, etc.

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

How To Develop For A Powergen 2.2 (Tegra) With Nsight) And Gbd (Gbd) On A Quadriplegic (Powergen) Powergen Powergen 3

OpenGL Insights. Edited by. Patrick Cozzi and Christophe Riccio

GPGPU Computing. Yong Cao

Windows Presentation Foundation: What, Why and When

Press Briefing. GDC, March Neil Trevett Vice President Mobile Ecosystem, NVIDIA President Khronos. Copyright Khronos Group Page 1

HP Z Workstations graphics card options

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

Low power GPUs a view from the industry. Edvard Sørgård

An Efficient Application Virtualization Mechanism using Separated Software Execution System

INTRODUCTION TO RENDERING TECHNIQUES

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

Introduction to GPU hardware and to CUDA

Windows Phone 7 Game Development using XNA

Introduction to GPU Programming Languages

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

Autodesk ImageModeler 2009

Using WPF for Computer Graphics

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect

What is GPUOpen? Currently, we have divided console & PC development Black box libraries go against the philosophy of game development Game

Computer Graphics. Anders Hast

VMware and NVIDIA: Bringing Workstations to the cloud

QCD as a Video Game?

Guided Performance Analysis with the NVIDIA Visual Profiler

How To Write A Cg Toolkit Program

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

HIGH PERFORMANCE VIDEO ENCODING WITH NVIDIA GPUS

Several tips on how to choose a suitable computer

How To Use An Amd Graphics Card In Greece And (Amd) With Greege (Greege) With An Amd Greeper 2.2.

Videocard Benchmarks Over 600,000 Video Cards Benchmarked

VMware Horizon View 3D Graphics

Adobe Creative Cloud for teams

NVIDIA GeForce GTX 580 GPU Datasheet

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

Home Software Hardware Benchmarks Services Store Support Forums About Us

Transcription:

Performance Gains Achieved Through Modern OpenGL in the Siemens DirectModel Rendering Engine Jeremy Bennett [Senior Software Engineer, Siemens PLM Software] Michael Carter [Senior Key Expert, Siemens PLM Software]

DirectModel: History Developed as joint venture between EAI and HP as large model visualization in 1997 Now the graphics engine underlying all Siemens Teamcenter Visualization products Originally implemented against OpenGL 1.0 and Starbase (who remembers this?) Now pushing the envelope into OpenGL 4.5 features

DirectModel: Support Platforms: Windows, Linux, Mac, ios, Android GPUs: Nvidia Quadro & Grid, AMD FireGL & FirePro, Intel HD 4500> Support variety of OpenGL levels OpenGL 1.1 OpenGL 1.5 OpenGL 2.1 OpenGL 3.1 OpenGL 4.3 OpenGL 4.5 Vertex Buffer Objects Shaders Uniform Buffer Objects Multi Draw Elements Indirect Direct State Access

Presentation State Architecture Current architecture and how it maps to GL Pipeline Optimizations No single magic bullet but rather a whole continuum Motivated by Real World Experiences GTC S3032: Advanced SceneGraph Rendering Pipeline GTC S4379: OpenGL Scene-Rendering Techniques GDC 14: Approaching Zero Driver Overhead

State Architecture: Motivation Design priorities are flexibility, high performance, and maintainability (slightly different from a game engine; must be able to gracefully cope with unexpected situations) Previous architecture based on managing discrete OpenGL state changes incrementally New State object represents comprehensive state for rendering a single object including the geometry Important for the middleware architecture to match the underlying underlying GAPI architecture

State Architecture: Block Diagram View & Proj Matrices Buffer Control, Blending, etc. Light types, Lighting Model Pgon Offset, Line Style, Tex Params Model Transformation Host State Frame Pass Light Shape Xform Light Parameters GPU State ( UBOs, FBOs, VBOs, TexObjs ) View/Proj matrices, ModelViewProj Matrices Transparency FBO Material Parameters Shadow Maps Texture Environment Textures VBO Bind Points Geom Index VBO Vertex VBO

State Architecture: Frame State View & Proj Matrices Buffer Control, Blending, etc. Light types, Lighting Model Pgon Offset, Line Style, Tex Params Model Transformation Host State Frame Pass Light Shape Xform GPU State ( UBOs, FBOs, VBOs, TexObjs ) View/Proj matrices, ModelViewProj Matrices Transparency FBO Light Parameters Material Parameters Shadow Maps Texture Environment Textures VBO Bind Points Geom Index VBO Vertex VBO

State Architecture: Pass State View & Proj Matrices Buffer Control, Blending, etc. Light types, Lighting Model Pgon Offset, Line Style, Tex Params Model Transformation Host State Frame Pass Light Shape Xform GPU State ( UBOs, FBOs, VBOs, TexObjs ) View/Proj matrices, ModelViewProj Matrices Transparency FBO Light Parameters Material Parameters Shadow Maps Texture Environment Textures VBO Bind Points Geom Index VBO Vertex VBO

State Architecture: Light State View & Proj Matrices Buffer Control, Blending, etc. Light types, Lighting Model Pgon Offset, Line Style, Tex Params Model Transformation Host State Frame Pass Light Shape Xform GPU State ( UBOs, FBOs, VBOs, TexObjs ) View/Proj matrices, ModelViewProj Matrices Transparency FBO Light Parameters Material Parameters Shadow Maps Texture Environment Textures VBO Bind Points Geom Index VBO Vertex VBO

State Architecture: Shape State View & Proj Matrices Buffer Control, Blending, etc. Light types, Lighting Model Pgon Offset, Line Style, Tex Params Model Transformation Host State Frame Pass Light Shape Xform GPU State ( UBOs, FBOs, VBOs, TexObjs ) View/Proj matrices, ModelViewProj Matrices Transparency FBO Light Parameters Material Parameters Shadow Maps Texture Environment Textures VBO Bind Points Geom Index VBO Vertex VBO

State Architecture: Xform State View & Proj Matrices Buffer Control, Blending, etc. Light types, Lighting Model Pgon Offset, Line Style, Tex Params Model Transformation Host State Frame Pass Light Shape Xform GPU State ( UBOs, FBOs, VBOs, TexObjs ) View/Proj matrices, ModelViewProj Matrices Transparency FBO Light Parameters Material Parameters Shadow Maps Texture Environment Textures VBO Bind Points Geom Index VBO Vertex VBO

State Architecture: Geom State View & Proj Matrices Buffer Control, Blending, etc. Light types, Lighting Model Pgon Offset, Line Style, Tex Params Model Transformation Host State Frame Pass Light Shape Xform GPU State ( UBOs, FBOs, VBOs, TexObjs ) View/Proj matrices, ModelViewProj Matrices Transparency FBO Light Parameters Material Parameters Shadow Maps Texture Environment Textures VBO Bind Points Geom Index VBO Vertex VBO

Optimization: Strategy Reduce CPU Overhead Minimize OpenGL Calls Minimize State Updates Increase GPU Performance Use faster APIs Prevent Stalls Areas of Exploration Index Display Lists VBOS Fixed Function Pipeline Shaders State Calls Uniforms Uniform Buffer Objects DrawRangeElements MultiDrawElementsIndirect CommandList Buffers Persistently Mapped Bindless

Render Optimization: Rendering Pipeline Generate Render List Use CPU or GPU Iterate over Render List Apply State Render Geometry Shape Light Xform Geom Shape Light Xform Geom apply(engine) apply(frame) while( item ) apply( Light ) apply( Shape ) apply( Xform ) render( Geom )

Optimization: Test Procedure Load model into test application Rotate model until stable state is reach Capture statistics for rotating the model 360 degree in 1 degree increments 16 Million Triangles 12,699 Occurrences

Optimization: Vertex Data Layout How are your vertices stored relative to how they are referenced? Quadro 4500 Collocation: Sorts along random axis in order to eliminate duplicated vertices Simple Fix: Sort in order of first reference Advanced Fix: Vertex Cache Optimization ( e.g. Tipsify, )

Optimization: Vertex Buffer Objects Upload vertex data to buffer on the GPU and render straight from the buffer Data on GPU does not have to match Data on CPU Similar performance as GL Display Lists Render Time FireGL 7350 (Relative to Index) Poor Performance on certain GPUs glmultidrawarrays Optimum Performance gldrawrangeelements - Triangles gldrawrangeelements - PrimRestart Performance 15x 2.6x IDX VBO VCO K2100M 65 fps 13 fps 25 fps

Optimization: Unified Vertex Buffer Objects Create VBOs of a fixed size and populate sections with data from multiple render items Significantly reduce the number of vertex bind calls Increase cache coherency of data on the GPU, especially during render Performance VBO 122 fps UVBO 155 fps 27%

Render Optimization: State Sorting Significant amount of GL calls can be attributed to applying the state updates Sorting the state and only applying if it changes allows for the number of state update to be reduced apply(engine) apply(frame) while( item ) { if ( bnewl ) apply( Light ) if ( bnews ) apply( Shape ) if ( bnewx ) apply( Xform ) Performance Unsorted 120.40 fps Sorted 161.43 fps 23% } bind(geom) render( Geom )

Optimization: Uniform Buffer Objects Still a significant amount of state to be set Shaders complicate matters as they require state passed in through uniforms Uniform buffer objects allows for large blocks of state to be uploaded to the GPU and then set using a single bind call Performance Uniforms UBO 16.49 fps 189.47 fps 11.5x

Optimization: Xform Batching GPU stalls due to data transfer can significantly impeded render performance GPU Transfers as a result of xform updates Increased concurrency as the result of batching

Optimization: MultiDrawElementsIndirect Allows for multiple draw calls to be combined into a single call Offloads traditionally CPU work to the GPU Biggest benefit will be seen by application that are CPU bound and render lots of small shapes

Optimization: MultiDrawElementsIndirect Verify your application is a good fit Use system timers to calculate system time Use glquery objects to measure GPU time Is your application CPU bound? Are there a significant number of draw calls?

Optimization: MultiDrawElementIndirect Define MDEI Buffers per State Pass xforms in through texture buffer Use the glbaseinstanceid to specify Matrix Use an additional vertex attribute with glvertexdivisor for better performance MDEI and Index Buffer created once and then bound per each state transition Xforms buffer initialized with other buffers, however the matrices are recalculated before binding Model*View Model*View*Projection

Optimization: MultiDrawElementIndirect Define MDEI Buffers per State Results in worse performance MDEI generation is expensive on both CPU and GPU Performance Orig 135.64 MDEI State 116.32 17% Draw calls are significantly reduced

Optimization: MultiDrawElementsIndirect MDEI Buffer Per Render List Significantly improves time to render on CPU 23% Performance Default 135.64 MDEI State 116.39 MDEI RL 167.44

Optimization: Summary Discussed Vertex Data Layout VBOs Unified VBOs UBOs Batching of Data Updates MDEI Future Bindless Culling CommandLists

Questions: Jeremy Bennett [Senior Software Engineer, Siemens PLM Software] jeremy.bennett@siemens.com Michael Carter [Senior Key Expert, Siemens PLM Software] michael.b.carter@siemens.com Please complete the Presenter Evaluation sent to you by email or through the GTC Mobile App. Your feedback is important!