HIGH PERFORMANCE FOURIER VOLUME RENDERING ON GRAPHICS PROCESSING UNITS (GPUS)

By
Marwan Mohamed Ahmed Abdellah
Systems & Biomedical Engineering Department
Faculty of Engineering, Cairo University

A thesis submitted to the Faculty of Engineering, Cairo University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in SYSTEMS & BIOMEDICAL ENGINEERING

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2012

Under the supervision of
Assoc. Prof. Ayman El-Dieb
Assoc. Prof. Amr Shaarawi
Systems & Biomedical Engineering Department
Faculty of Engineering, Cairo University

Approved by the Examining Committee
Prof. Dr. Yasser Kadah, Member
Prof. Dr. Mohamed El-Adawy, Member
Assoc. Prof. Dr. Ayman El-Dieb, Main Advisor
Assoc. Prof. Dr. Amr Shaarawi, Thesis Advisor

Engineer : Marwan Mohamed Ahmed Abdellah
Date of Birth : 5 / 7 / 1987
Nationality : Egyptian
[email protected]
Phone :
Address : No. 4, Muhammad Kasem St., Maadi, Cairo, Egypt.
Registration Date : 1 / 10 / 2009
Awarding Date : / /
Degree : Master of Science (M.Sc.)
Department : Systems & Biomedical Engineering Department
Supervisors : Assoc. Prof. Dr. Ayman M. Eldieb
Assoc. Prof. Dr. Amr A. Shaarawi
Examiners : Prof. Dr. Yasser Mostafa Kadah
Prof. Dr. Mohamed Ibrahim Eladawy
Assoc. Prof. Dr. Ayman M. Eldieb
Assoc. Prof. Dr. Amr A. Shaarawi
(Faculty of Engineering Helwan University)
Title of Thesis : High Performance Fourier Volume Rendering on Graphics Processing Units (GPUs)
Key Words : Fourier Volume Rendering, Medical Image Reconstruction, Projection-slice Theory, GPU Computing, CUDA.

Summary : The past years have seen tremendous advances in volume visualization techniques, which are now used broadly in medical imaging. Volume rendering techniques in particular have received considerable attention in this area. Although spatial-domain volume rendering has achieved wide acceptance among scientists and physicians, this category of rendering techniques is constrained by its O(N^3) time complexity, which limits its usability in several respects. Fourier Volume Rendering (FVR) is an alternative technique that operates on the frequency spectrum of the volume and, relying on the projection-slice theory, has a lower time complexity of O(N^2 log N). The technique generates attenuation-only renderings, or projections, of volumetric data that look like X-ray radiographs, and it has been used extensively in digital radiography. In this work, a high-performance, pure GPU-accelerated implementation of the Fourier volume rendering pipeline is proposed. By mapping the entire pipeline onto the GPU, it achieves a 30X speedup over a naive implementation.

DECLARATION

I, Marwan Abd Ellah, hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Marwan Abd Ellah
Date

ACKNOWLEDGEMENTS

This work would not have been possible without the invaluable support, advice, and encouragement of my dear supervisors. I am honored to offer my special thanks and deepest gratitude to Dr. Ayman El Deib & Dr. Amr Sharawi for their guidance and insightful feedback throughout this project. I would also like to thank my professor Dr. Yasser Kadah for his outstanding Medical Image Reconstruction course, which laid the foundations of image reconstruction in general and Fourier volume rendering in particular, and which motivated me to investigate further and arrive at this work. I would likewise like to thank Dr. Stefano Cozzini for accepting me into his advanced school in High Performance Computing, held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Italy. It was a valuable and unforgettable experience.

ABSTRACT

The past several years have seen tremendous advances in volume visualization techniques, which are now used broadly in medical imaging. Volume rendering in particular has received considerable attention in this area. Although spatial-domain volume rendering has achieved wide acceptance among scientists and physicians, this category of rendering techniques is constrained by its O(N^3) time complexity, which limits its usability in several respects. Fourier Volume Rendering (FVR) is an alternative technique that operates on the frequency spectrum of the volume and, relying on the projection-slice theory, has a lower time complexity of O(N^2 log N). The technique generates attenuation-only renderings, or projections, of volumetric data that look like X-ray radiographs, and it has been used extensively in digital radiography. In this work, a high-performance, pure GPU-accelerated implementation of the Fourier volume rendering pipeline is proposed. By mapping the entire pipeline onto the GPU, it achieves a 30X speedup over a hybrid implementation.

Keywords: Fourier Volume Rendering, Medical Image Reconstruction, Projection-Slice Theory, GPU Computing, CUDA.
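The projection-slice relation that the abstract relies on can be checked numerically in a reduced setting. The sketch below is an illustration only and is not taken from the thesis: it works on a 2D image and a 1D projection rather than a 3D volume and a 2D projection, it uses a naive O(n^2) DFT instead of an FFT, and every name in it (`dft`, `dft2`, `project_along_y`, `projection_slice_error`, `toy_image`) is hypothetical. It compares the k_y = 0 line of the 2D spectrum with the 1D spectrum of the projection along y.

```python
import cmath


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(samples)
    return [
        sum(samples[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def dft2(image):
    """2D DFT of image[x][y], returned as spectrum[kx][ky]."""
    n_x, n_y = len(image), len(image[0])
    along_y = [dft(row) for row in image]  # along_y[x][ky]
    spectrum = [[0j] * n_y for _ in range(n_x)]
    for ky in range(n_y):
        column = dft([along_y[x][ky] for x in range(n_x)])
        for kx in range(n_x):
            spectrum[kx][ky] = column[kx]
    return spectrum


def project_along_y(image):
    """Attenuation-only projection: sum the samples along y for each x."""
    return [sum(row) for row in image]


def projection_slice_error(image):
    """Largest difference between the ky = 0 slice and the projection spectrum."""
    central_slice = [row[0] for row in dft2(image)]
    projection_spectrum = dft(project_along_y(image))
    return max(abs(a - b) for a, b in zip(central_slice, projection_spectrum))


# Hypothetical 4 x 4 test image; any rectangular image should behave the same way.
toy_image = [[1, 2, 0, 3], [4, 0, 1, 2], [0, 5, 2, 1], [3, 1, 4, 0]]
print(projection_slice_error(toy_image))  # expected to be at rounding-error level
```

In the 3D case the same idea applies with a plane through the origin of the volume's spectrum in place of the k_y = 0 line. Once the spectrum is available, each new projection needs only a 2D slice extraction and a 2D inverse transform, which is where the O(N^2 log N) cost quoted above comes from.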

PREFACE

In this work, an in-depth investigation has been carried out to achieve a high-performance implementation of the Fourier volume rendering pipeline on the GPU. It considers in particular CUDA-enabled GPUs as a high performance computing architecture that can accelerate data-parallel algorithms, a class that suits our problem well.

Chapter 1, Introduction, presents the volume visualization techniques that are widely used in the medical arena. It concentrates mainly on volume rendering as a scientific tool for exploring the internal structures of volumetric objects. It then focuses on frequency domain volume rendering as an alternative to spatial domain algorithms, one that reduces the rendering time complexity to O(N^2 log N). Afterwards, we summarize the previous work in this area and our contribution.

Chapter 2, Theory Behind Frequency Domain Volume Rendering, aims to provide a gentle introduction to the theories relevant to frequency domain volume rendering. Sampling theory, the Fourier transform, the Hartley transform, and projection-slice theory are briefly discussed to set the stage for the chapters that follow.

High Performance Computing, as we understand it, deals with the implementation of an algorithm and the hardware it runs on. As a research tool, however, it demands at least a basic understanding of several disciplines, concepts, and methodologies, ranging from algorithms and computer programming to software and hardware architectures. In Chapter 3, High Performance Computing on Graphics Processing Units, we explain how the evolution of GPUs has turned them into high-performance platforms that rely on a massively parallel architecture. The CUDA architecture receives special treatment. We tried to keep this chapter both comprehensive and concise, although the temptation to cover everything is overwhelming, and the reader is assumed to have some familiarity with programming and high-level computer architecture.

In Chapter 4, Algorithm & Implementation, the Fourier volume rendering algorithm is presented and demystified for the reader. This chapter is intended as a summary of the Fourier volume rendering pipeline. It starts with a general description at a level independent of any specific architecture and then moves towards the strategy adopted to improve the performance of the GPU-accelerated implementation. It is the author's conviction that a good understanding of the implementation aspects of this algorithm will reflect the significance of the achieved results.

In Chapter 5, Results, we discuss the reconstruction and performance benchmarking results of both the naive implementation and our proposed one, which is executed entirely on the GPU.

In Chapter 6, Conclusion & Future Work, we wrap up and conclude what has been presented in this thesis, followed by some future work that might be undertaken either by us or by other researchers working in the same area.

ACRONYMS

1D One-Dimensional
2D Two-Dimensional
3D Three-Dimensional
ALU Arithmetic Logic Unit
APIs Application Programming Interfaces
BO Buffer Object
Cg C for Graphics
CPU Central Processing Unit
CT Computed Tomography
CUDA Compute Unified Device Architecture
CUFFT CUDA FFT
DFT Discrete Fourier Transform
DHT Discrete Hartley Transform
DRR Digitally Reconstructed Radiograph
ECC Error-Correcting Code
FBO Frame Buffer Object
FFT Fast Fourier Transform
FFTW Fastest Fourier Transform in the West
FHT Fast Hartley Transform
FVR Fourier Volume Rendering
GLSL OpenGL Shading Language
GPGPU General-Purpose computing on Graphics Processing Units
GPU Graphics Processing Unit
HPC High Performance Computing
MRI Magnetic Resonance Imaging
OpenCL Open Computing Language
OpenGL Open Graphics Library
PBO Pixel Buffer Object
SIMT Single Instruction Multiple Thread
SM Streaming Multiprocessor
SP Streaming Processor
TP Thread Processor
VBO Vertex Buffer Object

CONTENTS

1 INTRODUCTION
  Medical Visualization
  Volume Rendering
  Frequency Domain Volume Rendering
  Previous Work
  Contribution & Thesis Objectives

THEORY BEHIND FREQUENCY DOMAIN VOLUME RENDERING
  Notation
  Special Functions
  Delta Dirac
  Shah Function
  Sinc Function
  Rect Function
  Sampling Theory
  Nyquist-Shannon Sampling Theorem
  Aliasing
  Windowing
  2.4 Fourier Transform
  Transform Pair
  Properties of Fourier Transform
  Multi-Dimensional Fourier Transform
  D Fourier Transform
  D Fourier Transform
  Separability Theorem
  Convolution Theorem
  Discrete Fourier Transform
  Fast Fourier Transform
  Hartley Transform
  Definition
  Discrete Hartley Transform
  Pros & Cons
  Projection-Slice Theory
  Definition
  Proof

HIGH PERFORMANCE COMPUTING ON GRAPHICS PROCESSING UNITS (GPUS)
  High Performance Computing
  The Era of GPU Computing
  GPGPU & GPU Computing
  GPU Architecture Evolution
  CPU & GPU In Comparison
  Heterogeneous Computing Model
  Compute Unified Device Architecture
  Understanding CUDA Architecture
  CUDA Programming Model
  Threading Hierarchy
  Memory Model
  Global Memory
  Shared Memory
  Register Memory
  Local Memory
  Constant Memory
  Texture Memory
  3.5.5 Execution Model
  CUDA Software Programming Environment
  CUDA Computing Architecture
  Limitations of CUDA
  GPU Contexts
  FFT on GPU

ALGORITHM & IMPLEMENTATION
  Objective & Flow
  Algorithm
  Implementation Strategy
  The Naive Hybrid Approach
  Analyzing the Naive Algorithm
  Naive Algorithm Bottlenecks
  Suppressing Multidimensional Arrays
  Algorithm Mapping to the GPU
  CUDA Kernels
  FVR Pipeline on GPU
  Mapping Analysis

RESULTS
  Volume Reconstruction Results
  Benchmarking Results
  Eliminating Multi-Dimensional Arrays
  Mapping Computational Context to GPU

CONCLUSION & FUTURE WORK
  Conclusion
  Future Work

BIBLIOGRAPHY

LIST OF FIGURES

1.1 Computer-generated Rendering for a Skull Dataset, reference: Wikipedia
Surface Rendering of a Head Dataset, reference: Wikipedia
The Process of Volume Rendering a Tooth, reference: GPU Gems
High Definition Volume Rendering for a Skull with Volume Ray-Casting, reference: Wikipedia
Mouse Skull (CT) Rendering using the Shear Warp Algorithm, reference: Wikipedia
Example of Rendering CT Data (Visible Male Dataset) using the Fourier Volume Rendering Algorithm
A Projection of the Foot Dataset Reconstructed using the Fourier Volume Rendering Algorithm
Continuous Dirac Delta Function δ(t)
Kronecker Delta Function δ[n]
The Shah Function Ш(t)
The Sinc Function sinc(t)
The Rect or Box Function Π(t)
2.6 Sampling Process - Time Domain is on Left, and Frequency Domain is on Right
Aliasing
Hamming Window & its Frequency Response
Projecting 3D Volume to a 2D X-ray like Image
Graphical illustration of the projection-slice theory in two dimensions. f(x, y) and F(k_x, k_y) are two-dimensional Fourier transform pairs, p(x) is the projection of f(x, y) on the x axis, and s(k_x) is the projection slice of p(x) in the frequency domain
High Performance Computing Interdisciplinarity
Memory Bandwidth Improvements for CPU & GPU [72]
Single & Double Precision Floating-Point Operations Per Second for CPU & GPU [72]
The GeForce 7800 architecture with 3 kinds of Programmable Engines (Courtesy of NVIDIA)
The G80 GPU with Unified Shader Architecture
CPU & GPU Computing Architectures in Comparison, GPU Devotes More Transistors to Data Processing
Heterogeneous Computing Model with CPU & GPU
Problem Decomposition for Serial Parts to be executed on CPU & Parallel Parts to be executed on the GPU
Block Diagram for CUDA Stream Multiprocessor (SM)
Three-Dimensional Blocks of Two-Dimensional Grids
CUDA Memory Model
Executing a Kernel Grid on two different GPUs
Executing Two Different Kernel Grids on the GPU, (Courtesy of NVIDIA)
Thread Index Calculations with 1D Grid & 1D Blocks
NVIDIA Compilation Process
CUDA Framework Architecture
GT200 GPU Architecture
Fermi Architecture Block Diagram
Fermi SM Architecture
CUDA Interoperability with OpenGL
Block Diagram for the FVR Algorithm
4.2 FVR Pipeline
FVR Pipeline is Divided into Preprocessing Stage & Rendering Loop. The Rendering Loop is Executed 3 Times to Generate 3 Different Projections for the Same Volume
Naive Hybrid Implementation for the FVR Pipeline
D Wrapping-Around with 3D Arrays for Real Data
D Wrapping-Around with 3D Arrays for Complex Data
Repacking the Complex Spectrum from FFTW Array into 1D Array Compatible with OpenGL 3D Texture
A Block Diagram Illustrating the Execution of the OpenGL Off-Screen Context
D Wrapping-Around Involving 2D Arrays
Rendering the Projection Image
Eliminating 2D Arrays from the FVR Pipeline
Eliminating 3D Arrays from the FVR Pipeline
FVR Pipeline on GPU
Linking OpenGL Off-Screen Rendering Context with CUDA Context
Linking OpenGL Off-Screen Rendering Context with CUDA Context
Linking OpenGL CUDA Context with OpenGL On-Screen Context
Sagittal View for Visible Male Dataset (256 x 256 x 256)
The Central Part of the Visible Male Dataset (128 x 128 x 128)
Axial View for Visible Male Dataset (256 x 256 x 256)
Sagittal View for the Skull Dataset (256 x 256 x 256)
Coronal View for the Skull Dataset (256 x 256 x 256)
Foot Dataset (256 x 256 x 256)
Engine Dataset (256 x 256 x 256)
Bonsai Tree Dataset (256 x 256 x 256)
Teapot Dataset (128 x 128 x 128)
Hydrogen Atom Dataset (128 x 128 x 128)
Nieg Dataset (64 x 64 x 64)
Tri-Linear Interpolation Scheme
Nearest-Neighbor Interpolation
Orthogonal Projection for the Visible Male Dataset
5.15 Oblique Projections for the Visible Male Dataset without & with high order reconstruction filter in A & B respectively

LIST OF TABLES

3.1 CUDA Memory Types supported by its Memory Model for NVIDIA Quadro FX
A Table Summarizing the Features of the Three Main CUDA GPU Architectures
Benchmarking for the 3D Wrapping-Around Operation on CPU with 3D & 1D Arrays
Benchmarking for the 3D Wrapping-Around Operation on CPU with 3D & 1D Arrays including the Time Consumed During the Replacement of Arrays
D Wrapping-Around of Real Data on CPU & GPU
D Wrapping-Around Operation of Real Data on CPU & GPU
D Wrapping-Around of Complex Data on CPU & GPU
D FFT with FFTW & CUFFT Libraries
D FFT with FFTW & CUFFT Libraries
Comparing Performance for a volume of



More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Technology, Computer Engineering by Amol

More information

Data Visualization. Principles and Practice. Second Edition. Alexandru Telea

Data Visualization. Principles and Practice. Second Edition. Alexandru Telea Data Visualization Principles and Practice Second Edition Alexandru Telea First edition published in 2007 by A K Peters, Ltd. Cover image: The cover shows the combination of scientific visualization and

More information

The Fastest, Most Efficient HPC Architecture Ever Built

The Fastest, Most Efficient HPC Architecture Ever Built Whitepaper NVIDIA s Next Generation TM CUDA Compute Architecture: TM Kepler GK110 The Fastest, Most Efficient HPC Architecture Ever Built V1.0 Table of Contents Kepler GK110 The Next Generation GPU Computing

More information

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),

More information

Hardware design for ray tracing

Hardware design for ray tracing Hardware design for ray tracing Jae-sung Yoon Introduction Realtime ray tracing performance has recently been achieved even on single CPU. [Wald et al. 2001, 2002, 2004] However, higher resolutions, complex

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

Introduction to GPU Architecture

Introduction to GPU Architecture Introduction to GPU Architecture Ofer Rosenberg, PMTS SW, OpenCL Dev. Team AMD Based on From Shader Code to a Teraflop: How GPU Shader Cores Work, By Kayvon Fatahalian, Stanford University Content 1. Three

More information

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

More information

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT COMP27112 Computer Graphics and Image Processing 2: Introducing image synthesis [email protected] 1 Introduction In these notes we ll cover: Some orientation how did we get here? Graphics system

More information

Le langage OCaml et la programmation des GPU

Le langage OCaml et la programmation des GPU Le langage OCaml et la programmation des GPU GPU programming with OCaml Mathias Bourgoin - Emmanuel Chailloux - Jean-Luc Lamotte Le projet OpenGPU : un an plus tard Ecole Polytechnique - 8 juin 2011 Outline

More information

ultra fast SOM using CUDA

ultra fast SOM using CUDA ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A

More information

GPU for Scientific Computing. -Ali Saleh

GPU for Scientific Computing. -Ali Saleh 1 GPU for Scientific Computing -Ali Saleh Contents Introduction What is GPU GPU for Scientific Computing K-Means Clustering K-nearest Neighbours When to use GPU and when not Commercial Programming GPU

More information

Guided Performance Analysis with the NVIDIA Visual Profiler

Guided Performance Analysis with the NVIDIA Visual Profiler Guided Performance Analysis with the NVIDIA Visual Profiler Identifying Performance Opportunities NVIDIA Nsight Eclipse Edition (nsight) NVIDIA Visual Profiler (nvvp) nvprof command-line profiler Guided

More information

Speeding Up RSA Encryption Using GPU Parallelization

Speeding Up RSA Encryption Using GPU Parallelization 2014 Fifth International Conference on Intelligent Systems, Modelling and Simulation Speeding Up RSA Encryption Using GPU Parallelization Chu-Hsing Lin, Jung-Chun Liu, and Cheng-Chieh Li Department of

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

A Pattern-Based Approach to. Automated Application Performance Analysis

A Pattern-Based Approach to. Automated Application Performance Analysis A Pattern-Based Approach to Automated Application Performance Analysis Nikhil Bhatia, Shirley Moore, Felix Wolf, and Jack Dongarra Innovative Computing Laboratory University of Tennessee (bhatia, shirley,

More information

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

OpenCL Programming for the CUDA Architecture. Version 2.3

OpenCL Programming for the CUDA Architecture. Version 2.3 OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different

More information

Performance Optimization and Debug Tools for mobile games with PlayCanvas

Performance Optimization and Debug Tools for mobile games with PlayCanvas Performance Optimization and Debug Tools for mobile games with PlayCanvas Jonathan Kirkham, Senior Software Engineer, ARM Will Eastcott, CEO, PlayCanvas 1 Introduction Jonathan Kirkham, ARM Worked with

More information

Master of Science Graphics, Multimedia and Virtual Reality. Courses description

Master of Science Graphics, Multimedia and Virtual Reality. Courses description Master of Science Graphics, Multimedia and Virtual Reality Courses description Advanced graphics programming techniques The course presents methods for complex 3D scenes rendering using GPU advanced programming

More information

Radeon HD 2900 and Geometry Generation. Michael Doggett

Radeon HD 2900 and Geometry Generation. Michael Doggett Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command

More information

COMPUTER SCIENCE. FACULTY: Jennifer Bowen, Chair Denise Byrnes, Associate Chair Sofia Visa

COMPUTER SCIENCE. FACULTY: Jennifer Bowen, Chair Denise Byrnes, Associate Chair Sofia Visa FACULTY: Jennifer Bowen, Chair Denise Byrnes, Associate Chair Sofia Visa COMPUTER SCIENCE Computer Science is the study of computer programs, abstract models of computers, and applications of computing.

More information

Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter

Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter Daniel Weingaertner Informatics Department Federal University of Paraná - Brazil Hochschule Regensburg 02.05.2011 Daniel

More information

CS 325 Computer Graphics

CS 325 Computer Graphics CS 325 Computer Graphics 01 / 25 / 2016 Instructor: Michael Eckmann Today s Topics Review the syllabus Review course policies Color CIE system chromaticity diagram color gamut, complementary colors, dominant

More information

Developer Tools. Tim Purcell NVIDIA

Developer Tools. Tim Purcell NVIDIA Developer Tools Tim Purcell NVIDIA Programming Soap Box Successful programming systems require at least three tools High level language compiler Cg, HLSL, GLSL, RTSL, Brook Debugger Profiler Debugging

More information

Parallel Computing for Data Science

Parallel Computing for Data Science Parallel Computing for Data Science With Examples in R, C++ and CUDA Norman Matloff University of California, Davis USA (g) CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint

More information

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management Enhancing Cloud-based Servers by GPU/CPU Virtualiz Management Tin-Yu Wu 1, Wei-Tsong Lee 2, Chien-Yu Duan 2 Department of Computer Science and Inform Engineering, Nal Ilan University, Taiwan, ROC 1 Department

More information

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1 Silverlight for Windows Embedded Graphics and Rendering Pipeline 1 Silverlight for Windows Embedded Graphics and Rendering Pipeline Windows Embedded Compact 7 Technical Article Writers: David Franklin,

More information

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE Tuyou Peng 1, Jun Peng 2 1 Electronics and information Technology Department Jiangmen Polytechnic, Jiangmen, Guangdong, China, [email protected]

More information

NVIDIA Tesla K20-K20X GPU Accelerators Benchmarks Application Performance Technical Brief

NVIDIA Tesla K20-K20X GPU Accelerators Benchmarks Application Performance Technical Brief NVIDIA Tesla K20-K20X GPU Accelerators Benchmarks Application Performance Technical Brief NVIDIA changed the high performance computing (HPC) landscape by introducing its Fermibased GPUs that delivered

More information

Parallel Firewalls on General-Purpose Graphics Processing Units

Parallel Firewalls on General-Purpose Graphics Processing Units Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008 Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer

More information

Interactive Level-Set Segmentation on the GPU

Interactive Level-Set Segmentation on the GPU Interactive Level-Set Segmentation on the GPU Problem Statement Goal Interactive system for deformable surface manipulation Level-sets Challenges Deformation is slow Deformation is hard to control Solution

More information

OpenGL Performance Tuning

OpenGL Performance Tuning OpenGL Performance Tuning Evan Hart ATI Pipeline slides courtesy John Spitzer - NVIDIA Overview What to look for in tuning How it relates to the graphics pipeline Modern areas of interest Vertex Buffer

More information

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015 INF5063: Programming heterogeneous multi-core processors because the OS-course is just to easy! Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks October 20 th 2015 Håkon Kvale

More information

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality Hardware Announcement ZG09-0170, dated March 31, 2009 NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality Table of contents 1 At a glance 3

More information

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect SIGGRAPH 2013 Shaping the Future of Visual Computing NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect NVIDIA

More information

Volume Rendering on Mobile Devices. Mika Pesonen

Volume Rendering on Mobile Devices. Mika Pesonen Volume Rendering on Mobile Devices Mika Pesonen University of Tampere School of Information Sciences Computer Science M.Sc. Thesis Supervisor: Martti Juhola June 2015 i University of Tampere School of

More information

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA Part 1: Hardware design and programming model Amin Safi Faculty of Mathematics, TU dortmund January 22, 2016 Table of Contents Set

More information