Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD, Virginia Tech Carilion Research Institute, andek@vtc.vt.edu




Outline Motivation - why do we need GPUs? Past - how was GPU programming done 10 years ago? Present - how are GPUs used in medical imaging today? Future - what challenges do we face?

What is medical imaging? Creating images of the interior of the human body, for research and clinical purposes. The three most common modalities are computed tomography (CT), magnetic resonance imaging (MRI) and ultrasound (US)

Why do we need GPUs in medical imaging? The medical data explosion Demanding algorithms for image reconstruction and data analysis Visualization & interactivity

The medical data explosion Medical image data have evolved from 2D to 4D Temporal and spatial resolutions continue to improve The number of subjects being scanned is increasing

The medical data explosion From 2D to 4D data

The medical data explosion From 2D to 4D data: a 512 x 512 image is 1 MB, a 512 x 512 x 512 volume is 512 MB, a 128 x 128 x 64 x 100 dataset is 420 MB, and a 512 x 512 x 512 x 20 dataset is 10.7 GB (at 4 bytes per voxel)

What about the computational complexity for 2D, 3D and 4D data?

The medical data explosion From 2D to 4D data Filtering is one of the most important operations in (medical) image processing Can be performed as convolution in the spatial domain or as multiplication in the frequency domain, F{s * f} = F{s} · F{f}, where s = signal (image) and f = filter
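
This equivalence is easy to check numerically. Below is a minimal sketch in 1D (a stand-in for an image row; the 2D/3D case works the same way), comparing direct circular convolution against one multiplication between FFTs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
s = rng.standard_normal(n)   # signal (stand-in for an image row)
f = rng.standard_normal(n)   # filter, zero-padded to the signal length

# Spatial domain: direct circular convolution, O(n^2).
direct = np.array([sum(s[k] * f[(i - k) % n] for k in range(n))
                   for i in range(n)])

# Frequency domain: one multiplication between FFTs, O(n log n).
via_fft = np.real(np.fft.ifft(np.fft.fft(s) * np.fft.fft(f)))

assert np.allclose(direct, via_fft)
```

The frequency-domain route is what makes large non-separable filters affordable in practice.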

Filtering Edge detection Apply two filters to detect edges along x and y
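
As an illustrative sketch, the two filters could be the standard 3 x 3 Sobel kernels (an assumption; the slide does not name the filters). The helper below is a direct, dependency-free 2D convolution:

```python
import numpy as np

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # responds to vertical edges
sobel_y = sobel_x.T                             # responds to horizontal edges

def convolve2d_valid(image, kernel):
    """Direct 'valid' 2D convolution (convolution flips the kernel)."""
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

img = np.zeros((8, 8))
img[:, 4:] = 1.0                       # toy image with one vertical edge
gx = convolve2d_valid(img, sobel_x)    # strong response at the edge
gy = convolve2d_valid(img, sobel_y)    # ~0 here: no horizontal edge
edges = np.hypot(gx, gy)               # gradient magnitude
```

Every output pixel is computed independently, which is exactly why this maps so well to a GPU.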

The medical data explosion From 2D to 4D data Convolving a 512 x 512 image with an 11 x 11 filter requires ~32 million multiply-add operations

The medical data explosion From 2D to 4D data Convolving a 512 x 512 x 512 volume with an 11 x 11 x 11 filter requires ~179 billion multiply-add operations

The medical data explosion From 2D to 4D data Convolving a 512 x 512 x 512 x 20 dataset with an 11 x 11 x 11 x 11 filter requires ~39 trillion multiply-add operations

The medical data explosion From 2D to 4D data Data size from 1 MB to 10.7 GB, an increase by a factor of ~10,000 Computational complexity from 32 million operations to 39 trillion operations, an increase by a factor of ~1 million
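
These figures can be reproduced with a few lines of arithmetic, assuming 4 bytes per voxel (which matches the 1 MB figure for a 512 x 512 image):

```python
# Data sizes, assuming 4 bytes per voxel.
bytes_2d = 512 * 512 * 4                 # 1 MB
bytes_4d = 512 * 512 * 512 * 20 * 4      # ~10.7 GB
size_factor = bytes_4d / bytes_2d        # ~10,000 (exactly 10,240)

# Multiply-add counts for the convolutions on the previous slides.
ops_2d = 512**2 * 11**2                  # ~32 million
ops_4d = 512**3 * 20 * 11**4             # ~39 trillion
ops_factor = ops_4d / ops_2d             # ~1.2 million
```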

The medical data explosion Higher temporal and spatial resolution The temporal and spatial resolution of all medical imaging modalities continue to improve Better hardware, compare with digital cameras More complex sampling patterns

Magnetic resonance imaging (MRI) No ionizing radiation Can measure different properties (fMRI, DTI, SWI) Good for soft tissue Can generate 2D, 3D, 4D data Expensive Significantly slower than CT

Computed tomography (CT) Extremely quick High spatial resolution Good for hard tissue Can generate 2D, 3D, 4D data Expensive Ionizing radiation

Ultrasound Cheap Mobile Very high temporal resolution (20-30 Hz) Can generate 2D, 3D, 4D data Lower spatial resolution Noisy images

How to get a higher spatial resolution MRI: Stronger magnetic fields or longer scan times (expensive and difficult) CT: More radiation (not so good for the subject)

The medical data explosion Higher temporal and spatial resolution More complex sampling techniques to further improve spatial and temporal resolution Compressed sensing, sample data in a smarter way Parallel imaging, sample more data at the same time More complex image reconstruction algorithms

The medical data explosion More subjects 1980, 5 million CT scans in the US 2007, 65 million CT scans in the US Brenner DJ, Should we be concerned about the rapid increase in CT usage?, Reviews on Environmental Health, 25, 63-68, 2010

The medical data explosion More subjects Functional magnetic resonance imaging (fMRI) can be used to study brain activity A small fMRI study involves some 20 subjects fMRI data collection is expensive The Human Connectome Project will share fMRI and DTI data from 1200 subjects http://www.humanconnectome.org/

Demanding algorithms Image reconstruction, to convert the collected data to an image or volume Image registration, to align two images or volumes Image segmentation, to extract a specific part of an image or volume Image denoising, to suppress noise and improve the image quality

Demanding algorithms The Human Connectome Project will collect and share fMRI data from 1200 subjects 12 GB of data per subject Apply a permutation test with 10,000 permutations to each dataset (statistical analysis) Equivalent to analyzing 144,000,000 GB of data
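
As a toy illustration of why this is so expensive, here is a sign-flip permutation test in miniature (the effect size, subject count and random seed are made up; only the structure of the computation matches the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_normal(20) + 1.0        # 20 subjects with a true effect
observed = data.mean()                      # the test statistic

# Null distribution: recompute the statistic under 10,000 random sign flips.
null = np.array([(data * rng.choice([-1, 1], size=data.size)).mean()
                 for _ in range(10_000)])
p_value = np.mean(null >= observed)         # fraction of null values >= observed
```

Each permutation is independent of the others, so the 10,000 re-analyses parallelize trivially across GPU threads.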

Visualization & Interactivity Hard to look at 3D/4D data as 2D images 512 x 512 x 512 x 20 dataset = 10 240 images ~3 hours if you look at every image for 1 second Use volume rendering techniques instead Interactive algorithms, combined with visualization

Past

Why GPUs? GPUs are very popular for image processing Computer graphics renders all pixels in the same way; image processing applies the same operation to all pixels GPUs have hardware support for (linear) interpolation

Eklund et al., Medical image processing on the GPU - Past, present and future, Medical Image Analysis, 2013

How was GPU programming done 10 years ago? Do image processing through computer graphics languages: OpenGL (Open Graphics Language); DirectX HLSL (High Level Shading Language); Cg (C for graphics); GLSL (OpenGL Shading Language)

How was GPU programming done 10 years ago? Only a few experts knew how to use these programming languages for image processing Hard to optimize the performance Very hard to debug the code

Present

How is GPU programming done today? C programming of GPUs: CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) Possible to debug GPU code as regular C code Possible to improve performance by using tools like the Nvidia Visual Profiler

How are GPUs used in medical imaging today? Image reconstruction Image registration Image segmentation Image denoising

Image reconstruction - MRI MRI data is sampled in the frequency domain Most common reconstruction: apply an inverse fast Fourier transform (FFT), e.g. cuFFT (CUDA) or clFFT (OpenCL) More advanced sampling patterns result in more complex image reconstruction algorithms
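
A minimal sketch of the fully sampled Cartesian case (the toy phantom is an assumption; on the GPU, the transform call is what cuFFT or clFFT would accelerate):

```python
import numpy as np

# Toy "phantom": a bright square in a 64 x 64 image.
image_true = np.zeros((64, 64))
image_true[24:40, 24:40] = 1.0

kspace = np.fft.fft2(image_true)             # simulated fully sampled k-space
image_rec = np.real(np.fft.ifft2(kspace))    # reconstruction = inverse 2D FFT
```

With full Cartesian sampling the inverse FFT recovers the image exactly; the more advanced sampling patterns mentioned above are what turn reconstruction into an expensive iterative problem.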

Non-Cartesian sampling Non-Cartesian sampling (e.g. spiral instead of Cartesian) is sometimes better, but the FFT requires Cartesian sampling

fMRI Functional magnetic resonance imaging (fMRI) Collect volumes of the brain while the subject is performing a task Used to study brain activity Standard fMRI dataset: 64 x 64 x 30 x 400 (voxels are 4.0 x 4.0 x 4.0 mm, sampling rate 0.5 Hz) High resolution dataset: 128 x 128 x 60 x 1200 (voxels are 2.0 x 2.0 x 2.0 mm, sampling rate 1.5 Hz)

fMRI = pattern matching in time A voxel time series (200 time points) with high correlation with the paradigm indicates brain activity; low correlation indicates no brain activity
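
A sketch of this pattern matching with synthetic data (the on/off block design, noise level and seed are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_t = 200                                              # time points
paradigm = np.tile(np.r_[np.ones(10), np.zeros(10)], n_t // 20)  # on/off blocks

active = 2.0 * paradigm + rng.standard_normal(n_t)     # voxel following the task
inactive = rng.standard_normal(n_t)                    # pure-noise voxel

r_active = np.corrcoef(active, paradigm)[0, 1]         # high -> "activity"
r_inactive = np.corrcoef(inactive, paradigm)[0, 1]     # near zero
```

In a real analysis this correlation (or a regression model) is computed for every voxel independently, which is again an ideal GPU workload.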

High-resolution fMRI for mapping fingers 1 mm isotropic functional MRI at 3 T. Bilateral finger-tapping blocked study; red is the index finger and blue is the pinky (fifth finger). Contrast map formed by subtracting (red: index - pinky) and (blue: pinky - index). Challenge: fMRI has many time points and slices; this dataset had 16 slices and 200 time points = 3,200 images to be reconstructed. On the CPU this is not feasible, with a total reconstruction time approaching 1 month. Using IMPATIENT reconstruction on the GPU gives a total reconstruction time of 40 hours instead of 1 month. http://impact.crhc.illinois.edu/mri.aspx University of Illinois at Urbana-Champaign, Brad Sutton, mrfil.bioen.illinois.edu

DTI Diffusion tensor imaging (DTI) Measure diffusion of water in different directions Combine the measurements to a diffusion tensor (a 3 x 3 matrix in each voxel) Often used to study brain connectivity

High-resolution diffusion tensor imaging of neural pathways (color-coded FA map). Diffusion-weighted image with field distortion (no correction) vs. diffusion-weighted image with field-corrected reconstruction. 1 mm isotropic DTI at 3 T. DTI allows for a non-invasive characterization of neural integrity; this technique corrects for field inhomogeneity and performs SENSE reconstructions on high-resolution data. Challenge: multiple slabs and multiple directions in a single dataset; DTI requires many reconstructions for one dataset. Using IMPATIENT reconstruction on the GPU reduced reconstruction time from 18 hours to 5 minutes. http://impact.crhc.illinois.edu/mri.aspx University of Illinois at Urbana-Champaign, Brad Sutton, mrfil.bioen.illinois.edu

Fiber tracking for DTI DTI can be used for tracking of fibers in the brain Place a seed somewhere in the brain Follow the main orientation of the diffusion tensor in each voxel, gives the path of each fiber The main orientation is given by the eigenvector corresponding to the largest eigenvalue
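
One tracking step can be sketched as follows (the diffusion tensor, seed position and step length are made-up example values):

```python
import numpy as np

# Made-up diffusion tensor with dominant diffusion along x.
D = np.array([[1.5, 0.0, 0.0],
              [0.0, 0.3, 0.0],
              [0.0, 0.0, 0.3]])

evals, evecs = np.linalg.eigh(D)          # eigenvalues in ascending order
main_dir = evecs[:, np.argmax(evals)]     # principal direction (up to sign)

position = np.array([10.0, 10.0, 10.0])   # current seed position (voxels)
step = 0.5                                # step length, an illustrative value
position = position + step * main_dir     # one fiber-tracking step
```

Repeating this step, re-reading the tensor at each new position, traces out a fiber; many seeds can be followed in parallel.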

Mittmann et al., Performing Real-Time Interactive Fiber Tracking, Journal of Digital Imaging, 24, 339-351, 2011

Image registration Image registration is needed whenever you want to align two images or volumes Compare a subject before and after surgery Combine different medical imaging modalities Make a group analysis of fMRI data (transform all subjects to a brain template)

Image registration - Example

Image registration - Algorithm 1. Calculate similarity measure between images 2. Calculate a new set of transformation parameters (using some optimization algorithm) 3. Apply transformation using interpolation 4. Go to 1
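
The loop above can be sketched in a deliberately tiny setting: 1D signals, a single integer-shift parameter, sum of squared differences as the similarity measure, and exhaustive search in place of a real optimizer:

```python
import numpy as np

reference = np.zeros(50)
reference[20:30] = 1.0                     # a simple 1D "structure"
moving = np.roll(reference, 7)             # the same signal, misaligned

def ssd(a, b):
    """Step 1: similarity measure (sum of squared differences)."""
    return np.sum((a - b) ** 2)

# Step 2: "optimize" the single parameter by exhaustive search.
best_shift = min(range(-10, 11),
                 key=lambda t: ssd(reference, np.roll(moving, t)))

# Step 3: apply the transformation (np.roll plays the role of interpolation).
aligned = np.roll(moving, best_shift)
```

Real registration replaces the shift with rotations, translations or a deformation field, and the exhaustive search with a proper optimizer, but the structure of the loop is the same.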

Image registration Non-linear Linear image registration, optimize a few parameters like rotations and translations Non-linear image registration, use 100,000-1,000,000 parameters (three parameters per voxel) Non-linear registration often gives a better result, at the cost of a longer processing time

Image registration - Algorithm 1. Calculate similarity measure between images (GPU) 2. Calculate a new set of transformation parameters (linear registration: CPU; non-linear registration: GPU) 3. Apply transformation using interpolation (GPU) 4. Go to 1

Image registration Non-linear registration of 200 MRI volumes to a brain template (182 x 218 x 182): FSL 116 hours, AFNI 110 hours, AFNI OpenMP 31 hours, BROCCOLI Intel CPU 3.5 hours, BROCCOLI Nvidia GPU 15 minutes, BROCCOLI AMD GPU 20 minutes Eklund et al., BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs, Frontiers in Neuroinformatics, 8:24, 2014

Image segmentation Image segmentation is needed whenever you want to study a specific part of a dataset What is the size of hippocampus? How big is the brain tumour? Has it grown since a previous timepoint?

Image segmentation - Example

Image segmentation Most image segmentation algorithms can run in parallel Easy to visualize data in GPU memory Easy to create interactive interfaces for image segmentation
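
The parallel-friendliness is easy to see in the simplest possible segmentation, a per-voxel intensity threshold (the toy volume and threshold below are made up; real pipelines use e.g. level sets or graph cuts, but the per-voxel independence is the point):

```python
import numpy as np

volume = np.zeros((16, 16, 16))
volume[4:10, 4:10, 4:10] = 100.0   # a bright toy "structure" to segment

threshold = 50.0                   # illustrative intensity threshold
mask = volume > threshold          # one independent test per voxel (GPU-friendly)
n_voxels = int(mask.sum())         # size of the segmented structure
```

Because the mask lives in the same memory the GPU renders from, updating the segmentation and redrawing it interactively is cheap.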

Roberts et al., A work-efficient GPU algorithm for level-set segmentation, High performance graphics, 123-132, 2010

Image denoising Image denoising is used to suppress noise and to improve the image quality Makes it easier for a medical doctor to make a diagnosis Often used before image registration or image segmentation, to improve the result
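
The simplest instance of such filtering is local averaging; real adaptive filters steer the smoothing by local image structure, but the per-voxel independence is the same (the toy image, noise level and seed below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
clean = np.outer(np.ones(32), np.linspace(0.0, 1.0, 32))   # smooth toy image
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

# 3 x 3 box filter via shifted sums; every output pixel is independent.
k = 3
pad = np.pad(noisy, 1, mode="edge")
denoised = sum(pad[i:i + 32, j:j + 32]
               for i in range(k) for j in range(k)) / k**2

err_noisy = np.mean((noisy - clean) ** 2)        # mean squared error before
err_denoised = np.mean((denoised - clean) ** 2)  # and after denoising
```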

Adaptive filtering of a 4D CT dataset A CT dataset of the size 512 x 512 x 512 x 20 Adaptive filtering: apply 11 non-separable filters of the size 11 x 11 x 11 x 11 CPU: 2.5 days, GPU: 26 minutes Eklund et al., True 4D image denoising on the GPU, International Journal of Biomedical Imaging, 2011

A beating heart

Future

What challenges do we face? Large datasets, how can we analyze them on a GPU? Code optimization, we do not want to re-optimize code for each GPU generation Easier programming & usage, such that GPUs can solve more problems

Large datasets CPUs normally have access to more memory Easier to increase the amount of memory A GPU has to process a small part of the dataset at a time, which gives a smaller speedup The amount of GPU memory needs to increase faster than the temporal and spatial resolution of medical imaging data

Code optimization Frustrating to have to re-optimize code for each GPU generation Use libraries as often as possible, and wait for libraries to be re-optimized Future compilers will hopefully be better at optimizing code

Easier programming & usage Many people do not have time to learn GPU programming Accelerate existing C/C++ code using the PGI accelerator model, the HMPP workbench compiler or C++ AMP Important that more programming interfaces are created

Easier programming & usage Many Adobe products have GPU support (e.g. for image processing and video editing) The Parallel Computing Toolbox for Matlab supports GPU computing Important that more software packages provide GPU support

Thank you for your attention Questions?