1 The Power of Medical Imaging on GPUs Dr. Anne C. Elster Assoc. Prof., HPC-Lab, Dept. Computer and Info. Science Norwegian University of Science & Technology Trondheim, Norway and Visiting Scientist, ECE, University of Texas at Austin, USA
2 Thank yous to: My Collaborators, Post Docs and graduate students! Drs. Frank Lindseth (SINTEF Med Tech) & Prof. Bjørn Angelsen, NTNU Med School, Dept. of Circulation & Medical Imaging
3 Thank yous to: My Post Docs and graduate students! 06/07: Spring 2007; 07/08: @ SC 07; 08/09: Spring 2009; 09/10: Spring 2010; 10/11: @ SC 10; 11/12: Spring 2012. http://research.idi.ntnu.no/hpc-lab A.C. Elster: The Power of Medical Imaging on GPU
4 NTNU Gløshaugen (formerly Norwegian Institute of Technology) U of Texas at Austin
5 Trondheim, Norway on the world map
6 Outline
- Motivation and brief intro to HPC-Lab at NTNU
- 3D Ultrasound Reconstruction
- 3D Surface Extraction
- GPU-Based Airway Segmentation and Centerline Extraction for Image Guided Bronchoscopy
- Current related projects at HPC-Lab at NTNU
- Summary
7 MOTIVATION The Power of Medical Imaging: Use ultrasound, MRI, PET, etc. imaging for: medical diagnostics (avoid exploratory surgery); image-guided surgery; ++
8 The Power of Medical Imaging: Use ultrasound, MRI, PET scan imaging to: diagnose (avoid exploratory surgery); guide surgery (image-guided surgery); ++ By harnessing the compute power of GPUs!
9 Motivation GPU Computing: Many advances in processor designs are driven by the billion-dollar gaming market! Modern GPUs (Graphics Processing Units) offer lots of FLOPS per watt ... and lots of parallelism! NVIDIA Tesla 2050/2070 (Fermi): 448 CUDA cores! Kepler: GTX 690 and Tesla K10 cards have 3072 (2x1536) cores!
10 Heterogeneous supercomputing
China's Tianhe-1A -- No. 1 supercomputer (SC 10), NUDT/NSCC/Tianjin
- NUDT, 6-core Intel X5670 2.93 GHz + NVIDIA Tesla M2050 GPU
- Custom interconnect, 186,368 cores
- Rmax @ 2.57 PFlop/s
China's Nebulae -- No. 2 (ISC 10) / No. 3 (SC 10), at the National Supercomputing Centre in Shenzhen, China
- Dawning TC3600 blades w/ Intel X5650 2.67 GHz + (4640) NVIDIA Tesla C2050 GPUs
- Theoretical peak performance of 2.98 PFlop/s
- Linpack performance of 1.271 PFlop/s
11 NTNU GPU Activities
- Elster's HPC-Lab has graduated 25+ Master's students (diplom) in GPU computing (2007-2012)
- Currently supervising 8+ PhD students & 9 Master's students
- NTNU designated NVIDIA CUDA Teaching Center (summer 2011)
- PhD seminar course (Spring 2013: 7 students)
- Master's level course (Fall 2012: 14 students)
- Senior Parallel Computing class (Fall 2010: 43 taking exam; Fall 2012: 57 students)
- NVIDIA CUDA Research Center (2012)
12 HPC-Lab History (last 8 yrs): Fall 2006
- First 2 student projects with GPU programming (Cg)
- Christian Larsen (MS Fall Project, December 2006): Utilizing GPUs on Cluster Computers (joint with Schlumberger)
- Erik Axel Nielsen asks for FX 4800 card for project with GE Healthcare
- Elster, as head of the Computational Science & Visualization program, helped NTNU acquire a new IBM supercomputer (Njord, 7+ TFLOPS, proprietary switch)
13 HPC-Lab History (contin.): 2007
- Erik Axel Nielsen (Masters thesis, June 2007): "Real-time Wavelet Filtering on the GPU" -- joint project with GE Healthcare. The 40 times GPU speedup of the algorithm led to our implementation being adopted the same fall in their high-end cardiovascular ultrasound scanner.
- Christian Larsen (Masters thesis, June 2007), Tore Fevang, Schlumberger (co-advisor): "Framework for Polygonal Structures Computations on Clusters" (incl. GPU parallelization)
- Idar Borlaug (Masters thesis, June 2007): "Seismic Processing Using Parallel 3D FMM"
- Thibault Collet (Masters thesis, summer 2007): "Massively Online Games with Food Chains"
- Knut Imar Hagen (Masters thesis, June 2007): "Fault-tolerance for MPI Codes on Computation Clusters" (joint project with Statoil)
- Nils Magnus Larsgård (Masters thesis, summer 2007): "Framework for Converting MPI Codes to Hybrid OpenMP/MPI Codes"
14 HPC-Lab History (contin.): 2008
- Quad-core supercomputer at UiTø (Stallo), ca. 70 TF
- HPC-LAB at IDI/NTNU opens in Oct. with several NVIDIA donations
- Several quad-core machines (1-2 donated by Schlumberger)
15 HPC-Lab History (contin.): 2008
- HPC-LAB at IDI/NTNU opens in Oct. with several NVIDIA donations
- Several quad-core machines (1-2 donated by Schlumberger)
- Rune Hovland (Masters project, Dec 2008): "Latency and Bandwidth Impact on GPU Systems" (ParCo 2009 w/ Elster)
- Daniele Giuseppe Spampinato (Masters project, December 2008): "Linear Optimizations with CUDA" (IPDPS MTAAP 2009 w/ Elster)
- Atle Rudshaug (Masters thesis, June 2008): "Optimizing & Parallelizing a Large Commercial Code for Modeling Oil-well Networks" -- joint project with Yggdrasil
- Andreas Bach (Masters thesis, September 2008): "Profiling and Optimizing a Seismic Application on Modern Architectures" -- joint project with Statoil
16 HPC-Lab History (contin.): 2009
- NVIDIA Tesla S1070 (4 GPUs, 960 cores @ 1.44 GHz, 4 TF)
- Two NVIDIA Quadro FX 5800 cards (Jan 09), NVIDIA Ion (Jun 09)
- Two AMD/ATI Radeon 5870 (1600 cores @ 850 MHz, 2.72 TF) (one donated by AMD)
- Note: memory vs. processor clocks, e.g. NVIDIA S1070(-500): 792 MHz vs. 1.44 GHz
19 Selected Master theses and Master reports supervised by Dr. Elster in 2009
1) Robin Eidissen (Masters thesis, January 2009): "Utilizing GPUs for Real-Time Visualization of Snow" (demoed @ SC 08-SC 10)
   Eirik Aksnes and Henrik Hesland (MS Project, Jan 2009): "GPU Techniques for Porous Rock Visualization"
2) Rune Erlend Jensen (Masters thesis, May 2009, currently PhD student at HPC-Lab): "Techniques and Tools for Optimizing Codes on Modern Architectures: A Low-Level Approach" (NR MS Thesis Award!)
3) Rune Johan Hovland (Masters thesis, June 2009), Dr. Magnus Lie Hetland (co-advisor): "Throughput Computing on Future GPUs"
4) Henrik Hesland (Masters thesis, June 2009), Thorvald Natvig (co-advisor): "GPU-Enabled Interactive Pore Detection for 3D Rock Visualization"
5) Eirik Ola Aksnes (Masters thesis, July 2009), Ståle Fjeldstand & Atle Rudshaug, Numerical Rocks (co-advisors): "Simulation of Fluid Flow Through Porous Rocks on Modern GPUs" (ParCo 2009)
6) Daniel Haugen (Masters thesis, July 2009), Tore Fevang, Schlumberger (co-advisor): "Seismic Data Compression and GPU Memory Latency"
7) Åsmund Herikstad (Masters thesis, July 2009), Svein-Erik Måsøy, MedTek, NTNU (co-advisor): "Parallel Techniques for Estimation and Correction of Aberration in Medical Ultrasound Imaging"
8) Owe Johansen (Masters thesis, July 2009), John Hybertsen & Jon André Haugen, Statoil (co-advisors): "Seismic Shot Processing on GPU"
9) Daniele Giuseppe Spampinato (Masters thesis, July 2009; currently PhD student @ ETH): "Modeling Communication on Multi-GPU Systems" (ParCo 2009)
20 HPC-Lab History (contin.): 2010
- NVIDIA Fermi-based cards (470, C2050, C2070 (fall))
- More on OpenCL
- Ahmed A. Aqwari (Masters thesis, June 2010): Effects of Compression on Data Intensive Algorithms
- Aleksander Gjermundsen (Masters thesis, July 2010): Audio Processing on GPU
- Andreas Hysing (Masters thesis, Aug 2010): Parallel Inversion code (w/ Statoil)
- Øystein Krog (Masters thesis, June 2010): GPU-based Real-Time Snow Avalanche Simulations
- Holger Ludvigsen (Masters thesis, June 2010), Dr. Frank Lindseth (co-advisor): Real-Time GPU-Based 3D Ultrasound Reconstruction and Visualization
- Thorvald Natvig (PhD Dec 2010): Automatic Run-Time Communication and I/O
21 HPC-Lab Masters Theses Spring 2011
- Fredrik Fossum, MTech 2011: Real-Time Rigid Body Interactions (on GPU)
- Yngve S. Lindal, MTech 2011, MSProj @ CERN w/ Sverre Jarp (CTO, CERN), co-advisor: Optimizing a High-Energy Physics (HEP) Toolkit on Heterogeneous Architectures
- Bent Ove Stinessen, MTech 2011, Dr. Alf Birger Rustad (Statoil Research, co-advisor): Profiling, Optimization and Parallelization of Seismic Inversion Code
- Jarle Steinsland, MTech 2011: Auto-tunable GPU BLAS
- Thor Kristian Valderhaug, MTech 2011: The Lattice Boltzmann Simulation on Multi-GPU Systems
- Erik Smistad, Integrated MTech/PhD, main advisor: Frank Lindseth (Elster is co-advisor): Medical imaging on GPUs
- Hallgeir Lien (Master fall proj), co-supervised with Dr. Jo Skjermo, Vegvesenet: Road Generation Using A* Algorithm (continued this fall by another student)
22 Master students that finished Summer 2012:
- Kjetil Babington, MTech 2012: Terrain Rendering Techniques for the HPC-Lab Snow Simulator
- Thomas Falch, MTech (Elster main advisor; Dag Breiby (Physics) co-advisor): 3D Visualization of X-ray Diffraction Data
- Geir Josten Lien, MScience: Auto-tunable GPU BLAS
- Jan Rovde: Real-Time Granular Flow Simulations Using the PCISPH Method on GPGPU Devices using CUDA
- Frederik MJ Vestre: Enhancing and Porting the HPC-Lab Snow Simulator to OpenCL on Mobile Platforms
- Johannes Kvam, MTech Cybernetics (Elster is co-advisor; main advisor: Prof. Angelsen): Medical Image Processing w/ GPUs
Supervised ca. 50 master students, of which ca. 25 on GPU topics
23 Anne C. Elster, Lab Director
PhD Students:
- Rune E. Jensen
- Erik Smistad (Elster co-advisor; Lindseth main advisor)
- Johannes Kvam (Elster is co-advisor; main advisor: Prof. Angelsen)
- Thomas Falch
- Mehdi Bozorgi (Elster co-advisor; Lindseth main advisor)
- Ivar Ursin Nikolaisen (co-advisor: Alf B. Rustad, Statoil)
- Lane Holloway (Univ. of Texas at Austin, USA); Elster de facto co-advisor/committee member; Don Fussell, UT Comp. Sci. (main advisor)
- Samira Pakdel
- + NN & NN
Post Doc?
Master Students: Lars Melhus, Henrik Knutsen, Lars Espen Nordhus, Stian Pedersen, Magnus Mikalsen, Andreas Skomedal, Andreas Nordahl + Lars Martin Petersen & Elisabeth Solheim
Recent PhDs: Jan Christian Meyer (PhD 2012), Thorvald Natvig (PhD, 2010)
Selected Affiliates/Visitors: Drs. Frank Lindseth (SINTEF Med Tech) & Prof. Bjørn Angelsen, NTNU Med School, Dept. of Circulation & Medical Imaging; Ruben Spaans; Grant Strong; Miguel Amor (roles as captioned on the slide, in order: PhD applicant; PhD stud., Canada; PhD stud., Spain)
GTC S3061, March 2013
25 3D Ultrasound Reconstruction, w/ Dr. Frank Lindseth (SINTEF MedTek and NTNU) and MS students Holger Ludvigsen (CUDA) and Thor K. Valderhaug (OpenCL on AMD)
26 Ultrasound 3D Reconstruction
Challenges:
- Calculate 64 million voxels from ca. 400 b-scans
- Used during surgery, so real-time reconstruction is very important
- Keep costs down
27 Ultrasound 3D Reconstruction
Solution: GPU acceleration!
VNN Algorithm:
- Fill plane points
- Transform plane points
- Fill plane equation
- For each voxel:
  - Find closest plane
  - Project into plane
  - Find 2D coord of projection on plane
  - Fill voxel
Achieved reconstruction time: 1.29 sec on GPU vs. 29.61 sec on CPU (about 23x faster)!
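The per-voxel steps above can be sketched in a few lines. This is a minimal CPU illustration, assuming VNN is read as a voxel-nearest-neighbor scheme and that each b-scan has already been transformed into world coordinates; the `BScan` class, its fields, and the function names are hypothetical and not taken from the original CUDA/OpenCL implementations. Since every voxel is computed independently, the loop body maps naturally onto one GPU thread per voxel.

```python
from dataclasses import dataclass


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class BScan:
    """Hypothetical container for one b-scan placed in world space."""
    origin: tuple    # world position of pixel (row 0, col 0)
    u: tuple         # unit vector along image columns
    v: tuple         # unit vector along image rows (orthogonal to u)
    spacing: float   # pixel size in world units
    pixels: list     # pixels[row][col]

    @property
    def normal(self):
        # Plane equation: dot(p - origin, normal) == 0
        return cross(self.u, self.v)


def vnn_voxel(p, scans, empty=0):
    """Value of the voxel centered at world point p."""
    # Find closest plane: smallest absolute point-to-plane distance.
    best = min(scans, key=lambda s: abs(dot(sub(p, s.origin), s.normal)))
    rel = sub(p, best.origin)
    # Orthogonal projection onto the plane, expressed directly as 2D
    # pixel coordinates (components of rel along u and v).
    col = round(dot(rel, best.u) / best.spacing)
    row = round(dot(rel, best.v) / best.spacing)
    # Fill voxel if the projection lands inside the image.
    if 0 <= row < len(best.pixels) and 0 <= col < len(best.pixels[0]):
        return best.pixels[row][col]
    return empty


def reconstruct(shape, voxel_size, scans):
    """Fill an nx * ny * nz volume, indexed as volume[x][y][z]."""
    nx, ny, nz = shape
    return [[[vnn_voxel((x * voxel_size, y * voxel_size, z * voxel_size), scans)
              for z in range(nz)] for y in range(ny)] for x in range(nx)]
```

A production version would also reject voxels whose closest plane is farther away than some cutoff; that threshold is left out here to keep the sketch short.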
28 Real-time Ultrasound 3D reconstruction multiple views
30 3D Surface Extraction, w/ Dr. Frank Lindseth (SINTEF MedTek and NTNU) and PhD student Erik Smistad
31 3D Surface Extraction on GPUs
- Use the Marching Cubes algorithm for extracting a 3D surface from a set of sampled scalars
- Algorithm used extensively for visualizing and analyzing medical data (X-ray, MR) and the result of 3D segmentation
- Completely data parallel
- Challenge: how to store the result of each cube in parallel on the GPU
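To illustrate why the algorithm is completely data parallel, here is a minimal sketch of the per-cube classification step of Marching Cubes. Each cube only reads its own 8 corner samples, so all cubes can be classified independently. The corner ordering and the "below the iso-value sets the bit" convention are assumptions for illustration; the triangle lookup table that a full implementation indexes with this value is omitted.

```python
def cube_index(corner_values, iso_value):
    """Pack the inside/outside state of the 8 cube corners into an 8-bit index."""
    if len(corner_values) != 8:
        raise ValueError("a cube has exactly 8 corners")
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso_value:  # assumed convention: below iso-value sets the bit
            index |= 1 << bit
    return index


def has_surface(index):
    """Cubes with all corners on the same side produce no triangles."""
    return index not in (0, 255)
```

The number of triangles per cube varies with the index, which is exactly what makes parallel output storage the hard part.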
32 3D Surface Extraction -- Histogram data
Challenge: How to store the result of each cube in parallel on the GPU?
- In a serial implementation this is simple: just use a stack and add the vertex data to the stack
GPU Solution: Histogram Pyramids [1], a data structure that:
- Filters out cubes that have no triangles (stream reduction)
- Returns the total sum of triangles
- Provides each cube with an index for memory storage
- Can be used efficiently by means of textures, yielding large speed-ups
[1] G. Ziegler et al.: On-the-fly Point Clouds through Histogram Pyramids; Vision, Modeling, and Visualization 2006
33 3D Surface Extraction -- Histogram Pyramids: Construction & Traversal [Figure panels: HP Construction; HP Traversal]
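Construction and traversal of a histogram pyramid can be sketched as follows. This is a simplified 1D version with a 2-to-1 reduction per level, chosen for brevity; GPU implementations typically store the levels in 2D or 3D textures and reduce more elements per level. The function names are hypothetical. Construction sums neighboring counts bottom-up, so the top element is the total number of triangles. Traversal walks top-down to map an output slot `k` back to the cube that produces it, which lets every output element be written by an independent thread.

```python
def build_pyramid(counts):
    """Return levels[0] = padded per-cube triangle counts, ..., levels[-1] = [total]."""
    n = 1
    while n < len(counts):
        n *= 2  # pad the base to a power of two with empty cubes
    levels = [list(counts) + [0] * (n - len(counts))]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def traverse(levels, k):
    """Map output slot k to (cube index, triangle number within that cube)."""
    total = levels[-1][0]
    if not 0 <= k < total:
        raise IndexError("output slot out of range")
    idx = 0
    for level in reversed(levels[:-1]):
        idx *= 2                # move to the left child on the level below
        left = level[idx]
        if k >= left:           # slot lies in the right child's range
            k -= left
            idx += 1
    return idx, k
```

For example, with per-cube counts `[0, 2, 0, 1, 3]` the pyramid top holds 6, and slots 0 to 5 map to cubes 1, 1, 3, 4, 4, 4; the empty cubes 0 and 2 are never visited, which is the stream reduction.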
34 3D Surface Extraction -- Results: HPMC (Dyken et al.) vs. our OpenCL implementation

Size    Exec. time   FPS (avg)   Memory
512^3   3324 ms      0.3         490 MB
256^3   5 ms         223         122 MB
128^3   3 ms         394         44 MB
64^3    2 ms         519         22 MB

Size    Exec. time   FPS (avg)   Memory
512^3   34 ms        0.3         121 MB
256^3   10 ms        105         40 MB
128^3   4 ms         233         26 MB
64^3    3 ms         319         22 MB

Our test system: Intel i5 750, 4 GB RAM; ATI Radeon 5870 (1 GB RAM); AMD Catalyst 11.2 graphics driver; APP SDK 2.3 w/ OpenCL 1.1
Note: OpenCL-OpenGL synch measured to be 2-20 ms, i.e. 70-90% for the smallest datasets
37 GPU-Based Airway Tree Segmentation and Centerline Extraction. Erik Smistad, PhD Student. http://commons.wikimedia.org/w/index.php?title=file%3aright_bronchial_tree.ogg
39 GPU Accelerated Segmentation and Centerline Extraction of Tubular Structures from Medical Images

Dataset     GPU Runtime   CPU Runtime
Patient 1   46 secs       12 mins 52 secs
Patient 2   49 secs       14 mins 43 secs
Patient 3   49 secs       10 mins 44 secs
Patient 4   45 secs       14 mins 4 secs
Patient 5   33 secs       10 mins 5 secs
Patient 6   60 secs       17 mins 25 secs

NVIDIA Tesla C2070 GPU vs. one Intel i7 720 CPU with 4 cores.
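The speedups implied by the runtime table can be worked out directly. The short script below only restates the table values in seconds and divides CPU time by GPU time; the variable and function names are illustrative. The resulting speedups range from roughly 13x (Patient 3) to roughly 19x (Patient 4).

```python
# (GPU seconds, CPU seconds) per dataset, copied from the runtime table.
RUNTIMES_S = {
    "Patient 1": (46, 12 * 60 + 52),
    "Patient 2": (49, 14 * 60 + 43),
    "Patient 3": (49, 10 * 60 + 44),
    "Patient 4": (45, 14 * 60 + 4),
    "Patient 5": (33, 10 * 60 + 5),
    "Patient 6": (60, 17 * 60 + 25),
}


def speedups(runtimes):
    """CPU time divided by GPU time for each dataset."""
    return {name: cpu / gpu for name, (gpu, cpu) in runtimes.items()}


if __name__ == "__main__":
    for name, s in speedups(RUNTIMES_S).items():
        print(f"{name}: {s:.1f}x")
```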
41 Surf tech and drug delivery. Bjørn Angelsen, Johannes Kvam ++ Advanced ultrasound signal processing techniques. Complex calculations requiring real-time capabilities: introduction of multiple GPGPUs in scanners for computational horsepower
42 Seismic Filtering -- motivation for compression: Seismic filtering has two steps: 1) transfer data, 2) actual filtering. In our previous work on seismic filtering, transfer time was originally 2% of overall time. After off-loading the filtering to the GPU, transfer time is now 90% of overall time!
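A back-of-the-envelope calculation shows what the shift from 2% to 90% implies. The sketch below assumes, purely for illustration, that the absolute transfer time is unchanged by the off-loading and that overall time is just transfer plus filtering; neither assumption is stated on the slide, and the function name is hypothetical. Under those assumptions the filtering step itself became about 441x faster, while the overall run became only about 45x faster, which is why reducing transfer volume (compression) becomes the next target.

```python
def offload_effect(transfer_frac_before, transfer_frac_after):
    """Return (filtering speedup, overall speedup).

    Times are normalized so that the original overall time is 1.0.
    Assumption: the absolute transfer time does not change.
    """
    transfer = transfer_frac_before
    filter_before = 1.0 - transfer
    total_after = transfer / transfer_frac_after
    filter_after = total_after - transfer
    return filter_before / filter_after, 1.0 / total_after


if __name__ == "__main__":
    filt, overall = offload_effect(0.02, 0.90)
    print(f"filtering: {filt:.0f}x, overall: {overall:.0f}x")
```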
43 Motivation
- Locality & I/O challenge for data-intensive algorithms
- Look at techniques for reducing Mem. bandwidth: hardware (HDD, SSD); compression (JPEG, MPEG, MP3...)
- Explore GPU compression capabilities
- Seismic filtering process
- Transform coding works well for signal data *
* [H.S.Malvar 1992], [L.C.Duval 2000], [C.Larsen 2006], [D.Haugen 2009]
44 Seismic Data: 3D; a collection of floats; SGY format; traces; statistical variance; constructed datasets for testing
45 Results GPU acceleration
46 Results I/O Speedup
47 Visual Results
48 Results - compression
- When optimizing for I/O, need an efficient compression rate AND a fast compression algorithm
- Compression can give up to: 6.2x I/O speedup on HDD (70 MB/s); 3.9x I/O speedup on SSD (140 MB/s)
- Achieved through: transform coding; CPU & GPU co-op; asynch I/O
- Predictive model accurate within 5%
- Seismic compression library
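The trade-off between compression rate and codec speed can be illustrated with a very simple model. This is not the predictive model mentioned on the slide; it is a sketch assuming that reading and decompression overlap perfectly (as asynchronous I/O aims for), so that the slower of the two stages sets the pace. The function name and all parameter values in the usage example are hypothetical.

```python
def io_speedup(disk_mb_s, compression_ratio, codec_mb_s):
    """Speedup of reading compressed data vs. raw data, per MB of raw data.

    disk_mb_s:         sustained disk read bandwidth
    compression_ratio: raw size divided by compressed size
    codec_mb_s:        decompression throughput, in raw MB produced per second
    """
    raw_time = 1.0 / disk_mb_s
    read_time = 1.0 / (compression_ratio * disk_mb_s)  # fewer bytes cross the disk interface
    decode_time = 1.0 / codec_mb_s
    # With overlapped (asynchronous) I/O the pipeline runs at the slower stage.
    return raw_time / max(read_time, decode_time)


if __name__ == "__main__":
    # Hypothetical numbers: a slow disk is limited by the compression ratio,
    # a faster disk by the codec throughput.
    print(io_speedup(disk_mb_s=70, compression_ratio=4, codec_mb_s=1000))
    print(io_speedup(disk_mb_s=140, compression_ratio=4, codec_mb_s=420))
```

In this model the speedup is the smaller of the compression ratio and the codec-to-disk bandwidth ratio, which is consistent with a faster disk seeing a smaller gain from the same codec.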
49 3D Physics Viz. Thomas Falch, PhD student. MTech thesis (Elster main advisor; Dag Breiby (Physics) co-advisor): 3D Visualization of X-ray Diffraction Data
50 Heterogeneous Framework for Medical Image Processing and Visualization
52 Heterogeneous Framework for Medical Image Processing and Visualization
Challenges:
- Portability, of both code and performance
- Scheduling/distributing work to devices
- Reducing memory transfer overhead
- Programmability: make it easy to use for non-experts; allow experts to do hand tuning/optimization
53 TACC/Univ. of Texas at Austin's Stampede http://www.tacc.utexas.edu/stampede
54 Current Related EU Activity: EU COST Action IC0805, "Open European Network for High Performance Computing on Complex Environments" (2009-2013), www.complexhpc.org