Dynamic Volume Computation and Visualization on the GPU




Module 4: Beyond Static Scalar Fields
Dynamic Volume Computation and Visualization on the GPU
Visualization and Computer Graphics Group, University of California, Davis
Scientific Computing and Imaging Institute, University of Utah
IEEE Visualization 2003 Tutorial: Interactive Visualization of Volumetric Data on Consumer PC Hardware

Overview
- Motivation and applications
- Challenges
  - Memory layout
    - Computation
    - Rendering
  - Load balancing
    - Available resources
    - Usage considerations
- GPU feature requests
- Conclusions

Why?
- Combine computation and visualization
- Accelerate computation on GPU
- Interactive visualization
- Computational steering
- Visual debugging

Applications
- 3D image and surface processing
- Physical simulation
- Implicit surface modeling

Application: Surface Processing
- Tasdizen et al., IEEE Visualization 2002

Application: Modeling
- Museth et al., SIGGRAPH 2002

Application: Physical Simulation
- Premoze et al., Eurographics 2003

Application: Segmentation
- Interactive Deformation and Visualization of Level-Set Surfaces Using Graphics Hardware
  - Lefohn, Kniss, Hansen, Whitaker
  - Wednesday at 10:30am
- Fast Volume Segmentation and Simultaneous Visualization using Programmable Graphics Hardware
  - Sherbondy, Houston, Napel
  - Wednesday at 3:15pm

Dynamic Volume Challenges
- Memory layout
  - Full volume
  - Sparse volume
- Load balancing
  - CPU
  - AGP bus
  - GPU vertex processor
  - GPU rasterizer
  - GPU fragment processor

Sparse Volume: Computation
- Full volume
  - Computation performed on all voxels each computation step
  - Examples
    - Navier-Stokes
    - Volume light transport
- Sparse volume
  - Computation performed only on a subset of voxels
  - Examples
    - Surface embedding

Memory Layout: Computation
- How to update volume?
  - GPUs only output 2D
    - Copy-to-texture from frame buffer
    - Render-to-texture with pbuffer
    - Render-to-texture with Uber-buffer
  - Update one slice per render pass
  - Multiple render targets (MRT)
    - Write four RGBA outputs per pass
    - ATI Radeon 9600/9700/9800

Memory Layout: Computation
- Full volume
  - Stack of 2D slices
  - Render-to-3D texture
    - Render each slice separately
    - Uber-buffers
- Sparse volume
  - Active research topic
  - Two papers at Vis03
    - Lefohn et al. and Sherbondy et al.
  - Store full volume: Depth culling
  - Store only sparse domain

Full Volume: Computation
- Pbuffer layout
  - Assume RGBA
  - Changing pbuffers is slow, changing surfaces is fast
- Layout options
  - Pack data into RGBA
  - Multiple surfaces per pbuffer
  - Multiple slices per surface
- [Table: the three layout options compared on "Saves memory", "Reduces context switches", and "Minimize program complexity"; only four "No" cells survive and their positions are not recoverable]
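The pbuffer layout options above (pack data into RGBA, multiple slices per surface) amount to an address mapping from a 3D voxel to a 2D texel plus a color channel. The slides do not give the exact mapping, so the following is only a minimal CPU-side sketch under stated assumptions: four consecutive z-slices share one RGBA tile, and tiles are laid out row-major on a single 2D surface. The names `TexelAddress`, `voxel_to_texel`, `texel_to_voxel`, and `tiles_per_row` are hypothetical.

```python
from typing import NamedTuple, Tuple


class TexelAddress(NamedTuple):
    """Location of one scalar voxel inside a packed 2D RGBA surface."""
    surface_x: int  # texel column within the 2D surface
    surface_y: int  # texel row within the 2D surface
    channel: int    # 0..3, standing for R, G, B, A


def voxel_to_texel(x: int, y: int, z: int,
                   width: int, height: int,
                   tiles_per_row: int) -> TexelAddress:
    """Map voxel (x, y, z) to a packed texel.

    Assumption (not from the slides): slices 4k..4k+3 share tile k, one per
    RGBA channel, and tiles are placed row-major on the surface.
    """
    tile, channel = divmod(z, 4)
    tile_row, tile_col = divmod(tile, tiles_per_row)
    return TexelAddress(tile_col * width + x, tile_row * height + y, channel)


def texel_to_voxel(addr: TexelAddress,
                   width: int, height: int,
                   tiles_per_row: int) -> Tuple[int, int, int]:
    """Inverse of voxel_to_texel under the same layout assumption."""
    tile_col, x = divmod(addr.surface_x, width)
    tile_row, y = divmod(addr.surface_y, height)
    z = (tile_row * tiles_per_row + tile_col) * 4 + addr.channel
    return x, y, z


if __name__ == "__main__":
    addr = voxel_to_texel(3, 5, 9, width=64, height=64, tiles_per_row=2)
    print(addr)
    print(texel_to_voxel(addr, width=64, height=64, tiles_per_row=2))
```

Packing four slices per texel cuts the number of surfaces by a factor of four, at the cost of extra channel selection logic in any program that reads a single voxel.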

Full Volume: Computation
- Uber-buffers layout
  - No context switches
  - Obviates need for many memory tricks
  - Render-to-3D texture
- See OpenGL Extensions, SIGGRAPH 2003
  - Rob Mace and James Percy
  - http://www.ati.com/developer/techpapers.html

Full Volume: Volume Rendering
- Pbuffers
  - 2D texture-based volume rendering
    - Case A: Render slices
    - Case B: Reconstruct slice using lines from each slice
    - See Lefohn/Kniss et al. TVCG pre-print
- Uber-buffers
  - 3D texture volume rendering

Sparse Volume: Computation
- Store full volume
  - Culling limits computational domain
    - Depth, mesh, stencil
  - Use full-volume storage techniques
  - See Sherbondy et al. talk on Wednesday at 3:15pm
    - Application of depth culling and uber-buffers

Sparse Volume: Computation
- Store sparse domain
  - Pack active voxels into 2D texture
  - Perform 3D computation on dynamic 2D representation
  - Lefohn et al. on Wednesday at 10:30am
    - Multi-surface pbuffers
- Problem
  - 3D computation/visualization domain
  - 2D memory representation
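The "store full volume" option keeps every voxel in memory and relies on culling so that only the active subset is recomputed. As a rough CPU-side emulation (not the depth or stencil mechanism itself, which the slides leave to the Sherbondy et al. talk), the sketch below builds an activity mask for a narrow band around a level-set surface and updates only the voxels inside it. The band criterion and the names `narrow_band_mask` and `masked_update` are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]
Mask = List[List[List[bool]]]


def narrow_band_mask(phi: Volume, band: float) -> Mask:
    """Mark voxels whose level-set value lies within `band` of the surface.

    Assumption: activity is defined by |phi| <= band; a real application
    may use a different criterion.
    """
    return [[[abs(v) <= band for v in row] for row in sl] for sl in phi]


def masked_update(volume: Volume, active: Mask,
                  update: Callable[[float], float]) -> Tuple[Volume, int]:
    """Apply `update` to active voxels only and count the work performed.

    Inactive voxels are copied unchanged, standing in for fragments that
    the culling test would reject before the fragment program runs.
    """
    updated = 0
    out: Volume = []
    for sl, mask_sl in zip(volume, active):
        out_slice = []
        for row, mask_row in zip(sl, mask_sl):
            out_row = []
            for value, is_active in zip(row, mask_row):
                if is_active:
                    out_row.append(update(value))
                    updated += 1
                else:
                    out_row.append(value)
            out_slice.append(out_row)
        out.append(out_slice)
    return out, updated


if __name__ == "__main__":
    phi = [[[-3.0, -0.5], [0.25, 4.0]]]
    mask = narrow_band_mask(phi, band=1.0)
    new_phi, work = masked_update(phi, mask, lambda v: v + 1.0)
    print(new_phi, work)
```

The memory footprint is that of the full volume; only the amount of computation shrinks with the size of the active set.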

Sparse Volume: Virtual Memory
- Solution
  - Multi-dimensional virtual memory abstraction
    - 3D virtual memory space
    - 2D physical memory space

Sparse Volume: Rendering
- Store full volume / depth culling
  - Same rendering techniques as full volume
- Store sparse domain
  - Case A: Reconstruct slice with quadrilaterals
  - Case B: Reconstruct slice with lines
  - http://www.sci.utah.edu/~lefohn/work/rls/vislevelset/lefohn_tvcg03.pdf

Sparse Volume: Rendering Example
- Sparse isosurface representation
- Reconstruction of a slice from packed data

Memory Layout Summary
- Full volume
  - Pbuffers
    - Pack scalar data into RGBA
    - Multi-surface pbuffers to avoid context switching
  - Uber-buffers
    - Render-to-3D texture
    - No context switching
- Sparse volume
  - Store full volume
    - Depth culling for sparse computation
  - Store sparse volume
    - Multidimensional virtual memory
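The multidimensional virtual memory idea (a 3D virtual space backed by a 2D physical space) can be pictured as a page table. The slides do not specify page shape or allocation policy, so this is a minimal sketch assuming square 2D tiles cut from each slice and a first-come allocation order; the class `SparseVolume` and its methods are hypothetical names, not the authors' implementation.

```python
from typing import Dict, Optional, Tuple


class SparseVolume:
    """Page table from 3D virtual tiles to tiles packed in a 2D texture.

    Assumptions (not from the slides): a page is a tile_size x tile_size
    tile of one z-slice, and physical tiles are handed out in activation
    order, row-major across the physical texture.
    """

    def __init__(self, tile_size: int, tiles_per_row: int) -> None:
        self.tile_size = tile_size
        self.tiles_per_row = tiles_per_row
        self.page_table: Dict[Tuple[int, int, int], int] = {}

    def activate(self, tile_x: int, tile_y: int, z: int) -> int:
        """Give the virtual tile a physical page if it has none yet."""
        key = (tile_x, tile_y, z)
        if key not in self.page_table:
            self.page_table[key] = len(self.page_table)
        return self.page_table[key]

    def virtual_to_physical(self, x: int, y: int,
                            z: int) -> Optional[Tuple[int, int]]:
        """Translate a voxel address to a 2D texel, or None if inactive."""
        tile_x, offset_x = divmod(x, self.tile_size)
        tile_y, offset_y = divmod(y, self.tile_size)
        page = self.page_table.get((tile_x, tile_y, z))
        if page is None:
            return None
        page_row, page_col = divmod(page, self.tiles_per_row)
        return (page_col * self.tile_size + offset_x,
                page_row * self.tile_size + offset_y)


if __name__ == "__main__":
    vol = SparseVolume(tile_size=16, tiles_per_row=4)
    vol.activate(1, 2, 5)
    print(vol.virtual_to_physical(17, 35, 5))
    print(vol.virtual_to_physical(0, 0, 0))
```

Because the computation addresses voxels only through the translation, the physical packing can change as the active set grows or shrinks without touching the code that uses 3D coordinates.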

Load Balancing
- What are GPUs designed for?
- [Image: Half-Life 2, Valve Software]

Load Balancing
- Distribute work to under-utilized resources
- Lighten fragment processor load
- Avoid slow pathways
- Consider computational frequency
- [Diagram: CPU, AGP, Vertex Processor, Rasterizer, Fragment Processor, Frame/Pixel Buffer(s); data paths labeled "Texture Data" and "Vertex & Texture Coordinates"]

Load Balancing: General Concepts
- Basics
  - Resources operate in parallel
  - If single resource dominates, using the others is free
  - Find uses for under-utilized resources
- Additional information
  - Graphics Pipeline Performance
  - http://developer.nvidia.com/object/gdc_2003_presentations.html

Remember Why GPUs are Fast
- Raster graphics is embarrassingly parallel
  - Independent processing of elements
  - Pre-fetch and cache coherence
  - Little or no branching
- Fragment programming is limited for a reason
- Think about the fixed-function OpenGL pipeline
- Remember why GPUs are fast
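The claim that "if a single resource dominates, using the others is free" follows from an idealized pipeline model in which all stages run concurrently, so the time for a pass is set by the slowest stage. The sketch below encodes that model; the stage timings are invented numbers for illustration only, and real hardware will deviate from it (stalls, shared memory bandwidth, and so on).

```python
from typing import Dict


def pass_time(stage_ms: Dict[str, float]) -> float:
    """Idealized time for one pass: stages overlap, the slowest one wins."""
    return max(stage_ms.values())


def bottleneck(stage_ms: Dict[str, float]) -> str:
    """Name of the stage that determines the pass time."""
    return max(stage_ms, key=lambda name: stage_ms[name])


def move_work(stage_ms: Dict[str, float], src: str, dst: str,
              amount_ms: float) -> Dict[str, float]:
    """Return a new timing table with `amount_ms` of work shifted.

    Assumption: the work costs the same on both stages, which is rarely
    exactly true in practice.
    """
    moved = dict(stage_ms)
    moved[src] -= amount_ms
    moved[dst] += amount_ms
    return moved


if __name__ == "__main__":
    # Hypothetical per-pass timings in milliseconds.
    stages = {"cpu": 1.0, "agp": 0.5, "vertex": 0.5,
              "rasterizer": 1.0, "fragment": 6.0}
    print(bottleneck(stages), pass_time(stages))
    balanced = move_work(stages, "fragment", "vertex", 1.5)
    print(bottleneck(balanced), pass_time(balanced))
```

In this toy model the vertex stage quadruples its load yet the pass gets faster, because the vertex stage was idle for most of the pass anyway.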

Load Balancing: Fragment Processor
- Fragment processor most powerful
  - Often dominates computation and rendering time
  - Highest computational frequency
- Use dependent texture reads sparingly
  - Balance lookup tables with computation
- Can values be computed at lower frequency?
- Avoid conditionals...

Load Balancing: Fragment Conditionals
- All branches of conditionals are executed
  - Polluted caches
  - Useless instructions
  - Useless memory reads
- Might change on future hardware

Load Balancing: Substreams
- Resolve fragment conditionals into cases
- Draw each case as separate geometry batch
- Fragment program optimized for each case
- Example
  - Boundary conditions

Load Balancing: Vertex Processor
- Vertex processor often idle
- Lift fragment ops when result can be linearly interpolated
- Pass result to fragment processor as per-vertex interpolant
- Adds load to rasterizer
- Lessens AGP and fragment load
- References
  - Lefohn et al., IEEE Visualization 2003
  - Goodnight et al., Graphics Hardware 2003
  - James and Harris, Game Developers Conference 2003
- Example
  - Neighbor memory address computations
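The substream idea (resolve conditionals into cases and draw each case as its own batch) can be illustrated on the CPU with the boundary-condition example the slide mentions. The sketch below computes a 4-neighbor Laplacian with clamp-to-edge boundaries twice: once with per-cell conditionals, and once as two batches, each with its own specialized kernel. The choice of operator, the two-way split (a real implementation might separate edges and corners further), and all function names are assumptions for illustration.

```python
from typing import List

Grid = List[List[int]]  # indexed as grid[y][x]


def laplacian_branching(grid: Grid) -> Grid:
    """Reference version: every cell evaluates the boundary conditionals."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = grid[y][x - 1] if x > 0 else grid[y][x]
            right = grid[y][x + 1] if x < w - 1 else grid[y][x]
            up = grid[y - 1][x] if y > 0 else grid[y][x]
            down = grid[y + 1][x] if y < h - 1 else grid[y][x]
            out[y][x] = left + right + up + down - 4 * grid[y][x]
    return out


def interior_kernel(grid: Grid, x: int, y: int) -> int:
    """Specialized for interior cells: no conditionals at all."""
    return (grid[y][x - 1] + grid[y][x + 1] + grid[y - 1][x]
            + grid[y + 1][x] - 4 * grid[y][x])


def boundary_kernel(grid: Grid, x: int, y: int) -> int:
    """Specialized for boundary cells: clamps neighbor addresses."""
    h, w = len(grid), len(grid[0])
    xl, xr = max(x - 1, 0), min(x + 1, w - 1)
    yu, yd = max(y - 1, 0), min(y + 1, h - 1)
    return (grid[y][xl] + grid[y][xr] + grid[yu][x]
            + grid[yd][x] - 4 * grid[y][x])


def laplacian_substreams(grid: Grid) -> Grid:
    """Same result, computed as two batches with one kernel each."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    interior = [(x, y) for y in range(1, h - 1) for x in range(1, w - 1)]
    boundary = [(x, y) for y in range(h) for x in range(w)
                if x in (0, w - 1) or y in (0, h - 1)]
    for x, y in interior:   # batch 1: the bulk of the domain
        out[y][x] = interior_kernel(grid, x, y)
    for x, y in boundary:   # batch 2: a thin border
        out[y][x] = boundary_kernel(grid, x, y)
    return out


if __name__ == "__main__":
    g = [[x * x + 3 * y for x in range(5)] for y in range(4)]
    print(laplacian_substreams(g) == laplacian_branching(g))
```

The case decision is made once, when the batches are built, rather than once per cell per pass, which is the saving the slide attributes to substreams.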

Load Balancing: AGP
- Minimize CPU-to-GPU traffic
  - Use vertex buffer objects if GPU memory can be spared
  - Generate texture coordinates with vertex processor
- Minimize GPU-to-CPU traffic
  - Does CPU need all data from GPU?
  - Use GPU to produce concise message
    - Reduction using mipmapping or down-sampling shader
- Use GPU computation to save AGP bandwidth
  - Specific application: Bit-code image
    - TVCG pre-print
    - http://www.sci.utah.edu/~lefohn/work/rls/vislevelset/lefohn_tvcg03.html

Load Balancing: CPU
- CPU strengths
  - Pre-computing GPU-invariant computation
  - Updating complex data structures
  - Complex logic
- Use for operations not suited for or not supported by the GPU
- Communication must be minimized
- Using the CPU may give a faster solution than running entirely on the GPU

Load Balancing Summary
- Fragment stage is often the bottleneck
  - Statically resolve conditionals with substreams
- Compute texture addresses with vertex stage
  - Reduces AGP traffic and fragment instructions
- Minimize GPU-CPU communication
  - Use GPU computation to produce minimal message
- Leverage CPU if necessary
  - GPU-only solution not necessarily faster

GPU Feature Requests
- Uber-buffers
  - Render-to-3D texture
  - Fast changing of render targets (no context switch)
  - Interchangeable depth/stencil/color buffers
- More interpolants
  - Compute texture coordinates with vertex processor
- Global accumulation registers
  - min, max, sum, norm, etc.
- Integer data type
  - Bitwise operations for compression, message passing
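The "concise message" advice can be illustrated with a reduction by repeated down-sampling: each pass halves the image in both dimensions until a single value remains, which is all the CPU then needs to read back. The sketch below emulates this on the CPU. It assumes a square image with a power-of-two side, and the names `downsample` and `reduce_image` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # indexed as image[y][x]


def downsample(image: Image,
               op: Callable[[Sequence[float]], float]) -> Image:
    """One pass: each output texel combines a 2x2 block of the input."""
    half = len(image) // 2
    return [[op([image[2 * y][2 * x], image[2 * y][2 * x + 1],
                 image[2 * y + 1][2 * x], image[2 * y + 1][2 * x + 1]])
             for x in range(half)]
            for y in range(half)]


def reduce_image(image: Image,
                 op: Callable[[Sequence[float]], float] = max
                 ) -> Tuple[float, int]:
    """Reduce to one value; returns (result, number of passes).

    Assumption: the image is square with a power-of-two side length.
    `op` must be associative and commutative (max, min, sum).
    """
    side = len(image)
    if side == 0 or side & (side - 1) or any(len(r) != side for r in image):
        raise ValueError("expected a square image with power-of-two side")
    passes = 0
    while len(image) > 1:
        image = downsample(image, op)
        passes += 1
    return image[0][0], passes


if __name__ == "__main__":
    img = [[float(4 * y + x) for x in range(4)] for y in range(4)]
    print(reduce_image(img, max))
    print(reduce_image(img, sum))
```

For an N x N image this takes log2(N) passes, and the readback shrinks from N*N values to one.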

GPU Programming Tools Requests
- GPU profiling tools for OpenGL
- Vertex and fragment program debuggers
  - Fragment debugger: Tim Purcell and Pradeep Sen at Stanford

Dynamic Volume Summary
- Marriage of computation and visualization
  - Putting a face on computation
  - Computational steering
  - Important visualization application
- Memory layout
  - Full and sparse volume formats
  - Rendering considerations
- Load balancing
  - Maximize resource utilization

Future Directions
- User interface design for interactive computation
  - Batch processing
  - Intuitive control
- Separate computation from memory layout
  - N-D multidimensional virtual memory manager

Thank you
Questions?