Hardware design for ray tracing



Similar documents
SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

Realtime Ray Tracing and its use for Interactive Global Illumination

SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

A Cross-Platform Framework for Interactive Ray Tracing

The Future Of Animation Is Games

Computer Graphics Global Illumination (2): Monte-Carlo Ray Tracing and Photon Mapping. Lecture 15 Taku Komura

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

L20: GPU Architecture and Models

GUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

GPU Point List Generation through Histogram Pyramids

Computer Graphics Hardware An Overview

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

INTRODUCTION TO RENDERING TECHNIQUES

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Writing Applications for the GPU Using the RapidMind Development Platform

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

Texture Cache Approximation on GPUs

Data Centric Systems (DCS)

Radeon HD 2900 and Geometry Generation. Michael Doggett

CUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS

Development and Evaluation of Point Cloud Compression for the Point Cloud Library

Introduction to GPU Programming Languages

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Lecture Notes, CEng 477

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

HPC with Multicore and GPUs

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

GPU Architecture. Michael Doggett ATI

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach

In the early 1990s, ubiquitous

Advanced Rendering for Engineering & Styling

Router Architectures

Reduced Precision Hardware for Ray Tracing. Sean Keely University of Texas, Austin

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

CHAPTER 1 INTRODUCTION

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

A unified representation for interactive 3D modeling

Introduction to Computer Graphics

Choosing a Computer for Running SLX, P3D, and P5

Introduction to GPGPU. Tiziano Diamanti

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Binary search tree with SIMD bandwidth optimization using SSE

QCD as a Video Game?

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

GPGPU Computing. Yong Cao

Multi-GPU Load Balancing for Simulation and Rendering

Next Generation Operating Systems

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

How To Teach Computer Graphics

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Employing Complex GPU Data Structures for the Interactive Visualization of Adaptive Mesh Refinement Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Dynamic Resolution Rendering

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Brook for GPUs: Stream Computing on Graphics Hardware

Petascale Visualization: Approaches and Initial Results

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

Computer Applications in Textile Engineering. Computer Applications in Textile Engineering

Interactive Level-Set Deformation On the GPU

Introduction to GPU hardware and to CUDA

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

HistoPyramid stream compaction and expansion

Parallel Firewalls on General-Purpose Graphics Processing Units

Introduction to Game Programming. Steven Osman

OpenEXR Image Viewing Software

Interactive Visualization of Magnetic Fields

MAQAO Performance Analysis and Optimization Tool

GPU Parallel Computing Architecture and CUDA Programming Model

Parallel Simplification of Large Meshes on PC Clusters

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr Teruzzi Roberto matr IBM CELL. Politecnico di Milano Como Campus

A VOXELIZATION BASED MESH GENERATION ALGORITHM FOR NUMERICAL MODELS USED IN FOUNDRY ENGINEERING

The Scientific Data Mining Process

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

Amazing renderings of 3D data... in minutes.

Dhiren Bhatia Carnegie Mellon University

Transcription:

Hardware design for ray tracing Jae-sung Yoon Introduction Realtime ray tracing performance has recently been achieved even on single CPU. [Wald et al. 2001, 2002, 2004] However, higher resolutions, complex scenes, and advanced rendering effects still require more performance. Ray tracing is highly parallel algorithm because the contribution of each ray to the final image can be computed independently from the other rays. To exploit this massive parallelism, multi-core, multi-threaded hardware support is essential and efficient. For increased performance, ray tracing require spatial index structures for quickly finding the respective subset of rays. One drawback of spatial indices is that dynamic changes in the scene require re-computation of the index. But, in recent hardware architectures[woop 2005],[Schmittler 2004],[Purcell 2002], only static scenes or software-supported dynamic scenes are used. Therefore, my goal is to propose an overall ray tracing hardware architecture including update of acceleration structure. Related Work [Schmittler 2004], [Purcell 2002], [Woop 2005] architectures are basically based on the property of ray-tracing s parallelism. For high performance, they use multi-core and multi-threading approaches. SaarCOR[Schmittler 2004] used dedicated hardware for ray-tracing. And RPU[Woop 2005] upgrades this to have more programmable hardware. These two works used the dynamic scene management scheme in [Wald 2003]. This scheme is based on the fact that large parts of a scene often remain static over long periods of time. So, they separate the scene into independent three classes of objects. One is the static objects. Second is the object undergoing affine transformation. Third is the object with unstructured motion. Static objects and affine transforming objects can totally removes the reconstruction cost for hierarchically animated objects. So, this method can reduce the cost of structure reconstruction. But, in the SaarCOR and RPU architecture provides

no special support for building spatial index structures. This task has to be performed by the host CPU. Figure 1 shows the SaarCOR hardware architecture. The core ray tracing algorithm is contained in the dynamic ray tracing pipeline (DynRTP). RGS generates an initial ray R and transformation T. T is applied to the R in the transformation unit, and transformed ray is sent to the traversal unit. Then the traversal unit starts traversing the ray through the KD-tree until a leaf node is found. The ray with the list of objects is then forwarded to the mailbox unit. Mailbox unit can ommit objects that have already been isited by the same ray. Then the ray is sent to the transformation unit which maps the ray into object space. The ray is sent to the traversal unit to start the bottom-level traversal. From this method, recursive traversal can be executed. Finally the results are handed back to the RGS for shading. The RPU is the next version of the SaarCOR. The major difference of this architecture is that it used programmable units for some operations. For example, the intersection test can be performed in shader as shown in figure 2. Ray transformation and intersection tests in ray tracing can be performed nicely by shader instructions. Figure 3 shows the RPU architecture. SPU is the Shader Processing Unit, the TPU is the Traversal Processing Units, and the MPU is Mailboxed List Processing Unit. SPU is based on current GPUs, and used for above example, global lighting or vertex shading. TPU is dedicated hardware because traversal of a ray through a K-D tree typically requires 50 to 100 steps with scalar floating point operations. Using a fully

programmable vector unit for these operations wastes precious cycles. The TPU share a dedicated MPU. After traversal the SPU can perform intersection test. During the transformation, 4-D vector operation is needed. So, Each SPU operates on 4-component vectors as its basic data type. Also, to exploit the data parallelism, SPU and TPU is designed to be multi-threaded. Multi-threading allows to increase hardware utilization by filling instruction slots that would otherwise not be used due to instruction dependencies or memory latency.

[Purcell 2002] is based on the GPU hardware, but it is not well known current GPU but the stream processor. Stream Processor is more programmable and more general purpose hardware than GPU. It is proposed in [Khailnay et al. 2000]. Stream processor has two new concepts which are the kernel and stream. Basically, stream is the input or output data of operation units, and kernel is program to process streams. Stream ray tracer can be split into four kernels: eye ray generation, traversal, ray-triangle intersection, and shading as shown in the figure 4. The eye ray generator kernel produces a stream of viewing rays. The traversal kernel reads the stream of rays produced by the eye ray generator. The traversal kernel steps rays through the grid until the ray encounters a voxel containing triangles. The ray and voxel address are output and passed to the intersection kernel. The intersection kernel is responsible for testing ray intersections with all the triangles contained in the voxel. The intersector has two types of output. If ray-triangle intersection (hit) occurs in that voxel, the ray and the triangle that is hit is output for shading. If no hit occurs, the ray is passed back to the traversal kernel and the search for voxels containing triangles continues. The shading kernel computes a color. If a ray terminates at this hit, then the color is written to the accumulated image. Additionally, the shading kernel may generate shadow or secondary rays; in this case, these new rays are passed back to the traversal stage. But, the [Purcell 2002] did not consider the cost of building data structure, so it is not be efficient for dynamic scenes.

Overview There are two main issues in real-time ray tracing. One is the recursive traversal of acceleration structures. And the other is re-computation of acceleration structures for dynamic scene. The hardware-driven recursive traversal method is proposed in the recent works[woop 2005],[Schmittler 2004],[Purcell 2002]. But the hardware support for building spatial index structures is not proposed yet. In the conventional graphics pipeline, ray-tracing is located after vertex shading. Without hardware support for structure reconstruction, CPU should receive the vertex data after vertex shading, and make data structure, and then give this to ray-tracing hardware. It is very inefficient Therefore, my goal is to propose an overall ray tracing hardware architecture including update of acceleration structure.