HIGH PERFORMANCE FOURIER VOLUME RENDERING ON GRAPHICS PROCESSING UNITS (GPUS)


By Marwan Mohamed Ahmed Abdellah
Systems & Biomedical Engineering Department, Faculty of Engineering, Cairo University

A thesis submitted to the Faculty of Engineering, Cairo University, in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in SYSTEMS & BIOMEDICAL ENGINEERING

Under the supervision of
Assoc. Prof. Ayman El-Dieb
Assoc. Prof. Amr Shaarawi
Systems & Biomedical Engineering Department, Faculty of Engineering, Cairo University

Approved by the Examining Committee
Prof. Dr. Yasser Kadah, Member
Prof. Dr. Mohamed El-Adawy, Member
Assoc. Prof. Dr. Ayman El-Dieb, Main Advisor
Assoc. Prof. Dr. Amr Shaarawi, Thesis Advisor

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2012

Engineer: Marwan Mohamed Ahmed Abdellah
Date of Birth: 5/7/1987
Nationality: Egyptian
E-mail: abdellah.marwan@gmail.com
Phone: +2 0100 27 51 829
Address: No. 4, Muhammad Kasem St., Maadi, Cairo, Egypt.
Registration Date: 1/10/2009
Awarding Date: / /
Degree: Master of Science (M.Sc.)
Department: Systems & Biomedical Engineering Department
Supervisors: Assoc. Prof. Dr. Ayman M. Eldieb; Assoc. Prof. Dr. Amr A. Shaarawi
Examiners: Prof. Dr. Yasser Mostafa Kadah; Prof. Dr. Mohamed Ibrahim Eladawy; Assoc. Prof. Dr. Ayman M. Eldieb; Assoc. Prof. Dr. Amr A. Shaarawi (Faculty of Engineering, Helwan University)
Title of Thesis: High Performance Fourier Volume Rendering on Graphics Processing Units (GPUs)
Key Words: Fourier Volume Rendering, Medical Image Reconstruction, Projection-Slice Theory, GPU Computing, CUDA.
Summary: The past years have seen tremendous advances in volume visualization techniques, which have been used broadly in medical imaging. In particular, volume rendering techniques have received considerable attention in this area. Although spatial-domain volume rendering has achieved wide acceptance among scientists and physicians, this category of rendering techniques is constrained by its O(N^3) time complexity, which limits its usability in several respects. Fourier Volume Rendering (FVR) is an alternative technique that operates on the frequency spectrum of the volume and, relying on the projection-slice theorem, lowers the time complexity to O(N^2 log N). This technique generates attenuation-only renderings, or projections, of volumetric data that resemble X-ray radiographs, and it has been used extensively in digital radiography. In this work, a high-performance, purely GPU-accelerated implementation of the Fourier volume rendering pipeline is proposed; by mapping the entire pipeline to the GPU, it achieves a 30x speedup over a naive hybrid implementation.
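The summary contrasts O(N^3) spatial-domain rendering with O(N^2 log N) Fourier volume rendering. A rough, constants-free operation count makes that gap concrete. The cost model below is an illustrative assumption, not a measurement from this thesis: spatial rendering is taken as about N samples along each of N^2 rays, FVR per projection keeps only the dominant N^2 log2 N term of the 2D inverse FFT, and the one-time 3D FFT of the volume (the preprocessing stage, O(N^3 log N)) is charged once. All function names are hypothetical.

```python
import math


def spatial_cost(n: int) -> float:
    """Rough spatial-domain cost: about N samples along each of N^2 rays."""
    return float(n ** 3)


def fvr_cost_per_projection(n: int) -> float:
    """Rough FVR cost per projection: O(N^2) slice extraction plus an
    O(N^2 log N) 2D inverse FFT; only the dominant N^2 log2 N term is kept."""
    return n ** 2 * math.log2(n)


def ops_ratio(n: int) -> float:
    """Spatial cost divided by FVR cost per projection; simplifies to N / log2 N."""
    return spatial_cost(n) / fvr_cost_per_projection(n)


def break_even_projections(n: int) -> int:
    """Smallest number of projections P with
    P * N^3 >= N^3 log2 N + P * N^2 log2 N,
    i.e. the point where the one-time 3D FFT has been amortized."""
    one_time = n ** 3 * math.log2(n)  # preprocessing: 3D FFT of the volume
    return math.ceil(one_time / (spatial_cost(n) - fvr_cost_per_projection(n)))


if __name__ == "__main__":
    for n in (64, 128, 256, 512):
        print(n, round(ops_ratio(n), 1), break_even_projections(n))
```

For N = 256 the ratio is 256 / 8 = 32, and under this crude model about nine projections are enough to amortize the preprocessing 3D FFT. Real timings also depend on memory traffic and interpolation, which this sketch ignores.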

DECLARATION
I, Marwan Abd Ellah, hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.
Marwan Abd Ellah
Date:

ACKNOWLEDGEMENTS
This work would not have been possible without the invaluable support, advice, and encouragement of my dear supervisors. I am honored to express my special thanks and deepest gratitude to Dr. Ayman El-Dieb and Dr. Amr Shaarawi for their guidance and insightful feedback throughout this project. I would also like to thank my professor, Dr. Yasser Kadah, for his outstanding Medical Image Reconstruction course, which laid the foundations of image reconstruction in general and of Fourier volume rendering in particular, and which encouraged me to investigate further, eventually leading to this work. Finally, I would like to thank Dr. Stefano Cozzini for accepting me into his advanced school in High Performance Computing, held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Italy. It was a valuable and unforgettable experience.

ABSTRACT
The past several years have seen tremendous advances in volume visualization techniques, which have been used broadly in medical imaging. In particular, volume rendering has received considerable attention in this area. Although spatial-domain volume rendering has achieved wide acceptance among scientists and physicians, this category of rendering techniques suffers from diverse constraints due to its O(N^3) time complexity, which limits its usability in several respects. Fourier Volume Rendering (FVR) is an alternative technique that operates on the frequency spectrum of the volume and, relying on the projection-slice theorem, has a lower time complexity of O(N^2 log N). This technique generates attenuation-only renderings, or projections, of volumetric data that resemble X-ray radiographs, and it has been used extensively in digital radiography. In this work, a high-performance, purely GPU-accelerated implementation of the Fourier volume rendering pipeline is proposed; by mapping the entire pipeline to the GPU, it achieves a 30x speedup over a naive hybrid implementation.
Keywords: Fourier Volume Rendering, Medical Image Reconstruction, Projection-Slice Theory, GPU Computing, CUDA.
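The projection-slice theorem that FVR relies on can be checked numerically in two dimensions, the setting of Figure 2.10: the 1D DFT of the projection p(x) = sum over y of f(x, y) equals the k_y = 0 row (the central slice) of the 2D DFT of f. The sketch below is a minimal illustration in pure Python; it uses a naive O(N^2) DFT instead of an FFT to stay dependency-free, and the test image and function names are hypothetical rather than taken from the thesis.

```python
import cmath


def dft(x):
    """Naive 1D DFT (O(N^2)); a stand-in for an FFT so the sketch needs no libraries."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft2(img):
    """2D DFT of img[y][x] via separability: transform every row, then every column.

    Returns F[k_y][k_x].
    """
    h, w = len(img), len(img[0])
    rows = [dft(r) for r in img]                                       # rows[y][k_x]
    cols = [dft([rows[y][kx] for y in range(h)]) for kx in range(w)]   # cols[k_x][k_y]
    return [[cols[kx][ky] for kx in range(w)] for ky in range(h)]


def project_along_y(img):
    """Parallel projection onto the x axis: p(x) = sum over y of f(x, y)."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


if __name__ == "__main__":
    # Hypothetical 8 x 8 test image.
    f = [[(3 * x + 5 * y + x * y) % 7 for x in range(8)] for y in range(8)]
    central_slice = dft2(f)[0]               # the k_y = 0 row of the 2D spectrum
    p_hat = dft(project_along_y(f))          # 1D spectrum of the projection
    err = max(abs(a - b) for a, b in zip(central_slice, p_hat))
    print(f"max |slice - DFT(projection)| = {err:.2e}")
```

FVR uses the same relation in three dimensions and in reverse: a 2D slice through the precomputed 3D spectrum, passed through a 2D inverse FFT, yields an X-ray-like projection. Slices that are not axis-aligned fall between grid samples, which is why an interpolation scheme (Figures 5.12 and 5.13) is needed.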

PREFACE
In this work, an in-depth investigation has been carried out to achieve a high-performance implementation of the Fourier volume rendering pipeline on the GPU. It considers, in particular, CUDA-enabled GPUs as high-performance computing architectures that can accelerate data-parallel algorithms, a category that suits our problem well.

Chapter 1, Introduction, presents the volume visualization techniques that are widely used in the medical arena. It concentrates mainly on volume rendering as a scientific tool for exploring the internal structures of volumetric objects, and then focuses on frequency-domain volume rendering as an alternative to spatial-domain algorithms that reduces the rendering time complexity to O(N^2 log N). Afterwards, it summarizes the previous work in this area and our contribution.

Chapter 2, Theory Behind Frequency Domain Volume Rendering, provides a gentle introduction to the theory relevant to frequency-domain volume rendering. Sampling theory, the Fourier transform, the Hartley transform, and the projection-slice theorem are briefly discussed to set the stage for the chapters that follow.

High performance computing, as we understand it, deals with the implementation of algorithms and the hardware they run on; as a research tool, however, it demands at least a basic understanding of several disciplines, concepts, and methodologies, ranging from algorithms and computer programming to software and hardware architectures. Chapter 3, High Performance Computing on Graphics Processing Units, explains how the evolution of GPUs has turned them into high-performance platforms through their massively parallel architecture, with special treatment of the CUDA architecture. Although we tried to keep this chapter comprehensive yet concise, the temptation to cover everything is overwhelming, and the reader is assumed to have some familiarity with programming and high-level computer architecture.

Chapter 4, Algorithm & Implementation, presents and demystifies the Fourier volume rendering algorithm, summarizing the FVR pipeline. It starts with a general description that is independent of any specific architecture and then moves towards the strategy adopted to maximize the performance of the GPU-accelerated implementation. It is the author's conviction that a good understanding of the implementation aspects of this algorithm will help the reader appreciate the significance of the achieved results.

Chapter 5, Results, discusses the reconstruction and performance benchmarking results of both the naive implementation and our proposed implementation, which executes entirely on the GPU.

Chapter 6, Conclusion & Future Work, concludes what has been presented in this thesis, followed by future work that might be undertaken either by us or by other researchers working in the same area.
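The contents and list of tables of this thesis mention a 3D wrapping-around operation (Figures 4.5 and 4.6, Tables 5.1 to 5.5) and the replacement of multidimensional arrays with 1D arrays (Section 4.4.3). Assuming that "wrapping-around" denotes the usual FFT shift, which swaps diagonally opposite octants so that the zero-frequency sample moves to the center of the volume, a minimal CPU sketch on a flattened, row-major 1D array could look as follows. The names `flat_index` and `wrap_around_3d` are hypothetical and this is not the thesis's code.

```python
def flat_index(x: int, y: int, z: int, n: int) -> int:
    """Row-major offset of voxel (x, y, z) in an n*n*n volume stored as one 1D list."""
    return (z * n + y) * n + x


def wrap_around_3d(vol, n: int):
    """Swap diagonally opposite octants (an FFT shift for even n) on a flat 1D volume.

    Every voxel moves by n/2 along each axis, modulo n. Works for real or complex
    elements, since values are only copied.
    """
    assert n % 2 == 0, "this sketch assumes an even edge length"
    assert len(vol) == n * n * n
    half = n // 2
    out = [0] * (n * n * n)
    for z in range(n):
        for y in range(n):
            for x in range(n):
                src = flat_index(x, y, z, n)
                dst = flat_index((x + half) % n, (y + half) % n, (z + half) % n, n)
                out[dst] = vol[src]
    return out


if __name__ == "__main__":
    n = 4
    vol = list(range(n ** 3))
    shifted = wrap_around_3d(vol, n)
    print(shifted[flat_index(n // 2, n // 2, n // 2, n)])  # the former corner voxel
```

For an even edge length, applying the shift twice restores the original volume. Because each output voxel depends on exactly one input voxel, the loop body maps naturally onto one GPU thread per voxel, and a flat 1D layout avoids the pointer indirection of nested multidimensional arrays.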

ACRONYMS
1D: One-Dimensional
2D: Two-Dimensional
3D: Three-Dimensional
ALU: Arithmetic Logic Unit
APIs: Application Programming Interfaces
BO: Buffer Object
Cg: C for Graphics
CPU: Central Processing Unit
CT: Computed Tomography
CUDA: Compute Unified Device Architecture
CUFFT: CUDA FFT
DFT: Discrete Fourier Transform
DHT: Discrete Hartley Transform
DRR: Digitally Reconstructed Radiograph
ECC: Error-Correcting Code
FBO: Frame Buffer Object
FFT: Fast Fourier Transform
FFTW: Fastest Fourier Transform in the West
FHT: Fast Hartley Transform
FVR: Fourier Volume Rendering
GLSL: OpenGL Shading Language
GPGPU: General-Purpose computing on Graphics Processing Units
GPU: Graphics Processing Unit
HPC: High Performance Computing
MRI: Magnetic Resonance Imaging
OpenCL: Open Computing Language
OpenGL: Open Graphics Library
PBO: Pixel Buffer Object
SIMT: Single Instruction Multiple Thread
SM: Streaming Multiprocessor
SP: Streaming Processor
TP: Thread Processor
VBO: Vertex Buffer Object

CONTENTS
1 INTRODUCTION
  1.1 Medical Visualization
  1.2 Volume Rendering
  1.3 Frequency Domain Volume Rendering
  1.4 Previous Work
  1.5 Contribution & Thesis Objectives
2 THEORY BEHIND FREQUENCY DOMAIN VOLUME RENDERING
  2.1 Notation
  2.2 Special Functions
    2.2.1 Dirac Delta
    2.2.2 Shah Function
    2.2.3 Sinc Function
    2.2.4 Rect Function
  2.3 Sampling Theory
    2.3.1 Nyquist-Shannon Sampling Theorem
    2.3.2 Aliasing
    2.3.3 Windowing
  2.4 Fourier Transform
    2.4.1 Transform Pair
    2.4.2 Properties of Fourier Transform
    2.4.3 Multi-Dimensional Fourier Transform
      2.4.3.1 2D Fourier Transform
      2.4.3.2 3D Fourier Transform
    2.4.4 Separability Theorem
    2.4.5 Convolution Theorem
    2.4.6 Discrete Fourier Transform
    2.4.7 Fast Fourier Transform
  2.5 Hartley Transform
    2.5.1 Definition
    2.5.2 Discrete Hartley Transform
    2.5.3 Pros & Cons
  2.6 Projection-Slice Theory
    2.6.1 Definition
    2.6.2 Proof
3 HIGH PERFORMANCE COMPUTING ON GRAPHICS PROCESSING UNITS (GPUS)
  3.1 High Performance Computing
  3.2 The Era of GPU Computing
    3.2.1 GPGPU & GPU Computing
    3.2.2 GPU Architecture Evolution
  3.3 CPU & GPU In Comparison
  3.4 Heterogeneous Computing Model
  3.5 Compute Unified Device Architecture
    3.5.1 Understanding CUDA Architecture
    3.5.2 CUDA Programming Model
    3.5.3 Threading Hierarchy
    3.5.4 Memory Model
      3.5.4.1 Global Memory
      3.5.4.2 Shared Memory
      3.5.4.3 Register Memory
      3.5.4.4 Local Memory
      3.5.4.5 Constant Memory
      3.5.4.6 Texture Memory
    3.5.5 Execution Model
    3.5.6 CUDA Software Programming Environment
    3.5.7 CUDA Computing Architecture
    3.5.8 Limitations of CUDA
  3.6 GPU Contexts
  3.7 FFT on GPU
4 ALGORITHM & IMPLEMENTATION
  4.1 Objective & Flow
  4.2 Algorithm
  4.3 Implementation Strategy
  4.4 The Naive Hybrid Approach
    4.4.1 Analyzing the Naive Algorithm
    4.4.2 Naive Algorithm Bottlenecks
    4.4.3 Suppressing Multidimensional Arrays
  4.5 Algorithm Mapping to the GPU
    4.5.1 CUDA Kernels
    4.5.2 FVR Pipeline on GPU
    4.5.3 Mapping Analysis
5 RESULTS
  5.1 Volume Reconstruction Results
  5.2 Benchmarking Results
    5.2.1 Eliminating Multi-Dimensional Arrays
    5.2.2 Mapping Computational Context to GPU
6 CONCLUSION & FUTURE WORK
  6.1 Conclusion
  6.2 Future Work
BIBLIOGRAPHY

LIST OF FIGURES
1.1 Computer-generated Rendering for a Skull Dataset, reference: Wikipedia
1.2 Surface Rendering of a Head Dataset, reference: Wikipedia
1.3 The Process of Volume Rendering a Tooth, reference: GPU Gems
1.4 High Definition Volume Rendering for a Skull with Volume Ray-Casting, reference: Wikipedia
1.5 Mouse Skull (CT) Rendering using the Shear Warp Algorithm, reference: Wikipedia
1.6 Example of Rendering CT Data (Visible Male Dataset) using the Fourier Volume Rendering Algorithm
1.7 A Projection of the Foot Dataset Reconstructed using the Fourier Volume Rendering Algorithm
2.1 Continuous Dirac Delta Function δ(t)
2.2 Kronecker Delta Function δ[n]
2.3 The Shah Function Ш(t)
2.4 The Sinc Function sinc(t)
2.5 The Rect or Box Function Π(t)
2.6 Sampling Process: Time Domain on the Left, Frequency Domain on the Right
2.7 Aliasing
2.8 Hamming Window & its Frequency Response
2.9 Projecting a 3D Volume to a 2D X-ray-like Image
2.10 Graphical illustration of the projection-slice theory in two dimensions. f(x, y) and F(k_x, k_y) are a two-dimensional Fourier transform pair, p(x) is the projection of f(x, y) on the x axis, and s(k_x) is the projection slice of p(x) in the frequency domain.
3.1 High Performance Computing Interdisciplinarity
3.2 Memory Bandwidth Improvements for CPU & GPU [72]
3.3 Single & Double Precision Floating-Point Operations Per Second for CPU & GPU [72]
3.4 The GeForce 7800 Architecture with 3 Kinds of Programmable Engines (Courtesy of NVIDIA)
3.5 The G80 GPU with Unified Shader Architecture
3.6 CPU & GPU Computing Architectures in Comparison, GPU Devotes More Transistors to Data Processing
3.7 Heterogeneous Computing Model with CPU & GPU
3.8 Problem Decomposition for Serial Parts to be Executed on the CPU & Parallel Parts to be Executed on the GPU
3.9 Block Diagram for CUDA Streaming Multiprocessor (SM)
3.10 Three-Dimensional Blocks of Two-Dimensional Grids
3.11 CUDA Memory Model
3.12 Executing a Kernel Grid on Two Different GPUs
3.13 Executing Two Different Kernel Grids on the GPU (Courtesy of NVIDIA)
3.14 Thread Index Calculations with 1D Grid & 1D Blocks
3.15 NVIDIA Compilation Process
3.16 CUDA Framework Architecture
3.17 GT200 GPU Architecture
3.18 Fermi Architecture Block Diagram
3.19 Fermi SM Architecture
3.20 CUDA Interoperability with OpenGL
4.1 Block Diagram for the FVR Algorithm
4.2 FVR Pipeline
4.3 FVR Pipeline is Divided into Preprocessing Stage & Rendering Loop. The Rendering Loop is Executed 3 Times to Generate 3 Different Projections for the Same Volume
4.4 Naive Hybrid Implementation for the FVR Pipeline
4.5 3D Wrapping-Around with 3D Arrays for Real Data
4.6 3D Wrapping-Around with 3D Arrays for Complex Data
4.7 Repacking the Complex Spectrum from FFTW Array into 1D Array Compatible with OpenGL 3D Texture
4.8 A Block Diagram Illustrating the Execution of the OpenGL Off-Screen Context
4.9 2D Wrapping-Around Involving 2D Arrays
4.10 Rendering the Projection Image
4.11 Eliminating 2D Arrays from the FVR Pipeline
4.12 Eliminating 3D Arrays from the FVR Pipeline
4.13 FVR Pipeline on GPU
4.14 Linking OpenGL Off-Screen Rendering Context with CUDA Context
4.15 Linking OpenGL Off-Screen Rendering Context with CUDA Context
4.16 Linking OpenGL CUDA Context with OpenGL On-Screen Context
5.1 Sagittal View for Visible Male Dataset (256 x 256 x 256)
5.2 The Central Part of the Visible Male Dataset (128 x 128 x 128)
5.3 Axial View for Visible Male Dataset (256 x 256 x 256)
5.4 Sagittal View for the Skull Dataset (256 x 256 x 256)
5.5 Coronal View for the Skull Dataset (256 x 256 x 256)
5.6 Foot Dataset (256 x 256 x 256)
5.7 Engine Dataset (256 x 256 x 256)
5.8 Bonsai Tree Dataset (256 x 256 x 256)
5.9 Teapot Dataset (128 x 128 x 128)
5.10 Hydrogen Atom Dataset (128 x 128 x 128)
5.11 Nieg Dataset (64 x 64 x 64)
5.12 Tri-Linear Interpolation Scheme
5.13 Nearest-Neighbor Interpolation
5.14 Orthogonal Projection for the Visible Male Dataset
5.15 Oblique Projections for the Visible Male Dataset without & with High-Order Reconstruction Filter in A & B Respectively

LIST OF TABLES
3.1 CUDA Memory Types supported by its Memory Model for NVIDIA Quadro FX 4800
3.2 A Table Summarizing the Features of the Three Main CUDA GPU Architectures
5.1 Benchmarking for the 3D Wrapping-Around Operation on CPU with 3D & 1D Arrays
5.2 Benchmarking for the 3D Wrapping-Around Operation on CPU with 3D & 1D Arrays including the Time Consumed During the Replacement of Arrays
5.3 2D Wrapping-Around of Real Data on CPU & GPU
5.4 3D Wrapping-Around Operation of Real Data on CPU & GPU
5.5 3D Wrapping-Around of Complex Data on CPU & GPU
5.6 2D FFT with FFTW & CUFFT Libraries
5.7 3D FFT with FFTW & CUFFT Libraries
5.8 Comparing Performance for a volume of 256
