Sparse Fluid Simulation in DirectX. Alex Dunn Dev. Tech. NVIDIA adunn@nvidia.com

Similar documents
The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

GPU Architecture. Michael Doggett ATI

Interactive Level-Set Deformation On the GPU

Interactive Level-Set Segmentation on the GPU

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

NVIDIA GeForce GTX 580 GPU Datasheet

Flame On: Real-Time Fire Simulation for Video Games. Simon Green, NVIDIA Christopher Horvath, Pixar

Real-time Visual Tracker by Stream Processing

Optimizing AAA Games for Mobile Platforms

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Advanced Rendering for Engineering & Styling

Cloud Gaming & Application Delivery with NVIDIA GRID Technologies. Franck DIARD, Ph.D. GRID Architect, NVIDIA

Dynamic Resolution Rendering

Introduction Computer stuff Pixels Line Drawing. Video Game World 2D 3D Puzzle Characters Camera Time steps

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

Mark Bennett. Search and the Virtual Machine

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

REMOTE HIGH FIDELITY VISUALIZATION. May 2015 Jeremy Main, Sr. Solution Architect GRID

A Prototype For Eye-Gaze Corrected

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Performance Testing in Virtualized Environments. Emily Apsey Product Engineer

Introduction to GPU Programming Languages

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

HistoPyramid stream compaction and expansion

Examples. Pac-Man, Frogger, Tempest, Joust,

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

Introduction to GPU Architecture

HIGH-PERFORMANCE GPU VIDEO ENCODING ABHIJIT PATAIT SR. MANAGER, NVIDIA

Optimization for DirectX9 Graphics. Ashu Rege

Tested product: Auslogics BoostSpeed

INTRODUCTION TO RENDERING TECHNIQUES

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Multi-GPU Load Balancing for Simulation and Rendering

Hardware in the Loop (HIL) Testing VU 2.0, , WS 2008/09

animation animation shape specification as a function of time

Data Sheet. Desktop ESPRIMO. General

Pedraforca: ARM + GPU prototype

3D Computer Games History and Technology

Game Development in Android Disgruntled Rats LLC. Sean Godinez Brian Morgan Michael Boldischar

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

Image Processing & Video Algorithms with CUDA

Texture Cache Approximation on GPUs

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

Application. EDIUS and Intel s Sandy Bridge Technology

Fundamentals of Computer Science (FCPS) CTY Course Syllabus

GEFORCE 3D VISION QUICK START GUIDE

Fast Implementations of AES on Various Platforms

Caching Mechanisms for Mobile and IOT Devices

1. INTRODUCTION Graphics 2

GPU Point List Generation through Histogram Pyramids

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Advanced Visual Effects with Direct3D

Touchstone -A Fresh Approach to Multimedia for the PC

Graphic Processing Units: a possible answer to High Performance Computing?

Table of Contents. P a g e 2

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

Choosing Between Electromechanical and Fluid Power Linear Actuators in Industrial Systems Design

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Several tips on how to choose a suitable computer

AN804: Gaming Performance Analysis 4GB vs 2GB Gareth Ogden, Corsair Memory Inc.

Outline. srgb DX9, DX10, XBox 360. Tone Mapping. Motion Blur

NVIDIA GeForce Experience

Logitech ConferenceCam CC3000e. Best Practices for use with Software Clients. UC for Real People

GPU Accelerated Monte Carlo Simulations and Time Series Analysis

L20: GPU Architecture and Models

Computers in Film Making

Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors

How To Run A Factory I/O On A Microsoft Gpu 2.5 (Sdk) On A Computer Or Microsoft Powerbook 2.3 (Powerpoint) On An Android Computer Or Macbook 2 (Powerstation) On

Brainlab Node TM Technical Specifications

Fluid Dynamics and the Navier-Stokes Equation

GPU Renderfarm with Integrated Asset Management & Production System (AMPS)

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Introduction to GPU hardware and to CUDA

Volume Visualization Tools for Geant4 Simulation

Speeding Up Cloud/Server Applications Using Flash Memory

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Practical Volume Rendering in mobile devices

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Data Sheet Graphic Cards for Fujitsu ESPRIMO PCs

Fast Software AES Encryption

Visualization and Feature Extraction, FLOW Spring School 2016 Prof. Dr. Tino Weinkauf. Flow Visualization. Image-Based Methods (integration-based)

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

NVIDIA VIDEO ENCODER 5.0

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Writing Applications for the GPU Using the RapidMind Development Platform

Getting Started Guide

Real-Time BC6H Compression on GPU. Krzysztof Narkowicz Lead Engine Programmer Flying Wild Hog

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

Transcription:

Sparse Fluid Simulation in DirectX Alex Dunn Dev. Tech. NVIDIA adunn@nvidia.com

Agenda We want more fluid in games Eulerian (grid based) fluid. Sparse Eulerian Fluid. Feature Level 11.3 Enhancements! (Not a talk on fluid dynamics)

Why Do We Need Fluid in Games? Replace particle kinematics! more realistic == better immersion Game mechanics? occlusion smoke grenades interaction Dispersion air ventilation systems poison, smoke Endless opportunities!

Eulerian Simulation #1 My (simple) DX11.0 eulerian fluid simulation: Inject 2x Velocity Advect Pressure 2x Pressure Vorticity Evolve 1x Vorticity

Eulerian Simulation #2 Inject Advect Pressure Vorticity Evolve Add fluid to simulation Move data at, XYZ (XYZ+Velocity) Calculate localized pressure Calculates localized rotational flow Tick Simulation

**(some imagination required)**

Too Many Volumes Spoil the Fluid isn t box shaped. clipping wastage Simulated separately. authoring GPU state volume-to-volume interaction Tricky to render.

Memory (Mb) Problem! N-order problem 64^3 = ~0.25m cells 128^3 = ~2m cells 256^3 = ~16m cells Applies to: computational complexity memory requirements Texture3D - 4x16F 8192 7168 6144 5120 4096 3072 2048 1024 0 0 256 512 768 1024 Dimensions (X = Y = Z) And that s just 1 texture

Bricks Split simulation space into groups of cells (each known as a brick). Simulate each brick independently.

Brick Map Need to track which bricks contain fluid Texture3D<uint> 1 voxel per brick 0 Unoccupied 1 Occupied Could also use packed binary grids [Gruen15], but this requires atomics

Tracking Bricks Initialise with emitter Expansion (unoccupied occupied) if { V x y z > D brick } expand in that axis Reduction (occupied unoccupied) inverse of Expansion handled automatically

Sparse Simulation Clear Tiles Inject Reset all tiles to 0 (unoccupied) in brick map. Advect Pressure Vorticity Evolve* Fill List Read value from brick map. Append brick coordinate to list if occupied. Texture3D<uint> g_brickmapro; AppendStructredBuffer<uint3> g_listrw; if(g_brickmapro[idx]!= 0) { g_listrw.append(idx); } *Includes expansion

Uncompressed Storage Allocate everything; forget about unoccupied cells Pros: simulation is coherent in memory. works in DX11.0. Cons: no reduction in memory usage.

Compressed Storage Indirection Table Similar to, List<Brick> Pros: good memory consumption. works in DX11.0. Physical Memory Cons: allocation strategies. indirect lookup. software translation filtering particularly costly

1 Brick = (4) 3 = 64

1 Brick = (1+4+1) 3 = 216 New problem; 6n 2 +12n + 8 problem. Can we do better?

Enter; Feature Level 11.3 Volume Tiled Resources (VTR)! Extends 2D functionality in FL11.2 Must check HW support: (DX11.3!= FL11.3) ID3D11Device3* pdevice3 = nullptr; pdevice->queryinterface(&pdevice3); D3D11_FEATURE_DATA_D3D11_OPTIONS2 support; pdevice3->checkfeaturesupport(d3d11_feature_d3d11_options2, &support, sizeof(support)); m_usetiledresources = support.tiledresourcestier == D3D11_TILED_RESOURCES_TIER_3;

Tiled Resources #1 Pros: only mapped memory is allocated in VRAM hardware translation logically a volume texture all samplers supported 1 Tile = 64KB (= 1 Brick) fast loads

Tiled Resources #2 1 Tile = 64KB (= 1 Brick) BPP Tile Dimensions 8 64x32x32 16 32x32x32 32 32x32x16 64 32x16x16 128 16x16x16 Gotcha: Tile mappings must be updated from CPU

Latency Resistant Simulation #1 Naïve Approach: clamp velocity to V max CPU Read-back: occupied bricks. 2 frames of latency! extrapolate probable tiles. N; Data Ready N+1; Data Ready N+2; Data Ready CPU: GPU: Frame N Frame N+1 Frame N+2 Frame N+3 Frame N Frame N+1 Frame N+2 N; Tiles Mapped

Latency Resistant Simulation #2 Tight Approach: CPU Read-back: occupied bricks. max{ V } within brick. 2 frames of latency! extrapolate probable tiles. N; Data Ready N+1; Data Ready N+2; Data Ready CPU: GPU: Frame N Frame N+1 Frame N+2 Frame N+3 Frame N Frame N+1 Frame N+2 N; Tiles Mapped

Latency Resistant Simulation #3 Yes CPU Readback Ready? No Readback Brick List Prediction Engine Emitter Bricks CPU GPU Sparse Eulerian Simulation UpdateTile Mappings

Demo

Sim. Time (ms) Performance #1 64.7 19.9 2.3 0.4 1.8 2.7 2.9 6.0 Full Grid Sparse Grid 128 256 384 512 1024 Grid Resolution NOTE: Numbers captured on a GeForce GTX980

Memory (MB) Performance #2 640 2,160 80 11 46 57 83 138 Full Grid Sparse Grid 128 256 384 512 1024 Grid Resolution NOTE: Numbers captured on a GeForce GTX980

Scaling Speed ratio (1 Brick) = Time{Sparse} Time{Full} ~75% across grid resolutions.

Summary Fluid simulation in games is justified. Fluid is not box shaped! One volume is better than many small. Un/Compressed storage a viable fallback. VTRs great for fluid simulation. Other latency resistant algorithms with tiled resouces?

Questions? Alex Dunn - adunn@nvidia.com Twitter: @AlexWDunn Thanks for attending.