Dr. Anne C. Elster. Assoc. Prof., HPC-Lab, Dept. Computer and Info. Science Norwegian University of Science & Technology Trondheim, Norway.

1 1 The Power of Medical Imaging on GPUs Dr. Anne C. Elster Assoc. Prof., HPC-Lab, Dept. Computer and Info. Science Norwegian University of Science & Technology Trondheim, Norway and Visiting Scientist, ECE, University of Texas at Austin, USA

2 2 Thank yous to: My Collaborators, Post Docs and graduate students! Drs. Frank Lindseth (SINTEF Med Tech) & Prof. Bjørn Angelsen, NTNU Med School, Dept. of Circulation & Medical Imaging

3 3 Thank yous to: My Post Docs and graduate students! Group photos: SC 07; 06/07: Spring 2007; /09: Spring 2009; 09/10: Spring; SC 10; 11/12: Spring 2012. A.C. Elster: The Power of Medical Imaging on GPU

4 4 NTNU Gløshaugen (formerly Norwegian Institute of Technology) U of Texas at Austin

5 5 Trondheim, Norway on the world map

6 6 Outline Motivation and brief intro to HPC-Lab at NTNU 3D Ultrasound Reconstruction 3D Surface Extraction GPU-Based Airway Segmentation and Centerline Extraction for Image Guided Bronchoscopy Current related projects at HPC-Lab at NTNU Summary

7 7 MOTIVATION The Power of Medical Imaging : Use Ultrasound, MRI, PET etc Imaging for: medical diagnostics (avoid exploratory surgery) image-guided surgery ++

8 8 The Power of Medical Imaging: Use Ultrasound, MRI, PET scan etc. imaging to: diagnose (avoid exploratory surgery); guide surgery ++ By harnessing the compute power of GPUs!

9 9 Motivation GPU Computing: Many advances in processor designs are driven by the billion-dollar gaming market! Modern GPUs (Graphics Processing Units) offer lots of FLOPS per watt ... and lots of parallelism! NVIDIA Tesla 2050/2070 (Fermi): 448 CUDA cores. Kepler: GTX 690 and Tesla K10 cards have 3072 (2x1536) cores!

10 10 Heterogeneous supercomputing. China's Tianhe-1A: No. 1 Supercomputer (SC 10), NUDT/NSCC/Tianjin. NUDT 6-core Intel X GHz + NVIDIA Tesla M2050 GPU, custom interconnect, 183,368 Cores, Pflop/s. China's Nebulae: No. 2 (ISC 10) / No. 3 (SC 10), at National Supercomputing Centre in Shenzhen, China - Dawning TC3600 Blades w/ Intel X GHz + (4640) NVIDIA Tesla C2050 GPUs - Theoretical peak performance at 2.98 PFlop/s - Linpack performance of PFlop/s

11 11 NTNU GPU Activities: Elster's HPC-Lab has graduated 25+ Master students (diplom) in GPU computing ( ). Currently supervising 8+ PhD students & 9 master students. NTNU designated NVIDIA CUDA Teaching Center (summer 2011): PhD seminar course (Spring 2013: 7 students); Master's level course (Fall 2012: 14 students); Senior Parallel Computing class (Fall 2010: 43 taking exam; Fall 2012: 57 students). NVIDIA CUDA Research Center (2012)

12 12 HPC-Lab History (last 8 yrs): Fall 2006: First 2 student projects with GPU programming (Cg). Christian Larsen (MS Fall Project, December 2006): Utilizing GPUs on Cluster Computers (joint with Schlumberger). Erik Axel Nielsen asks for FX 4800 card for project with GE Healthcare. Elster, as head of the Computational Science & Visualization program, helped NTNU acquire new IBM Supercomputer (Njord, 7+ TFLOPS, proprietary switch)

13 13 HPC-Lab History (contin.): 2007: Erik Axel Nielsen (Masters thesis, June 2007): Real-time Wavelet Filtering on the GPU -- joint project with GE Healthcare. A 40 times GPU speedup of the algorithm led to our implementation being adopted the same fall in their high-end cardiovascular ultrasound scanner. Christian Larsen (Masters thesis, June 2007), Tore Fevang, Schlumberger (co-advisor): "Framework for Polygonal Structures Computations on Clusters" (incl. GPU parallelization). Idar Borlaug (Masters thesis, June 2007): Seismic Processing Using Parallel 3D FMM. Thibault Collet (Masters thesis, summer 2007): "Massively Online Games with Food Chains". Knut Imar Hagen (Masters thesis, June 2007): Fault-tolerance for MPI Codes on Computation Clusters (joint project with Statoil). Nils Magnus Larsgård (Masters thesis, summer 2007): Framework for Converting MPI Codes to Hybrid OpenMP/MPI Codes

14 14 HPC-Lab History (contin.): 2008: Quadcore Supercomputer at UiTø (Stallo) ca. 70 TF. HPC-LAB at IDI/NTNU opens in Oct. with several NVIDIA donations. Several quad-core machines (1-2 donated by Schlumberger)

15 15 HPC-Lab History (contin.): 2008: HPC-LAB at IDI/NTNU opens in Oct. with several NVIDIA donations. Several quad-core machines (1-2 donated by Schlumberger). Rune Hovland (Masters project, Dec 2008): "Latency and Bandwidth Impact on GPU Systems" (ParCo 2009 w/ Elster). Daniele Giuseppe Spampinato (Masters Project, December 2008): "Linear Optimizations with CUDA" (IPDPS MTAAP 2009 w/ Elster). Atle Rudshaug (Masters thesis, June 2008): Optimizing & Parallelizing a Large Commercial Code for Modeling Oil-well Networks -- joint project with Yggdrasil. Andreas Bach (Masters thesis, September 2008): Profiling and Optimizing a Seismic Application on Modern Architectures -- joint project with Statoil

16 16 HPC-Lab History (contin.): 2009: NVIDIA Tesla S1070 (4 GPUs, 960 cores * 1.44GHz, 4TF). Two NVIDIA Quadro FX 5800 cards (Jan 09), NVIDIA Ion (Jun 09). Two AMD/ATI Radeon 5870 ( MHz, 2.72TF) (one donated by AMD). Note: Memory vs. Proc clocks, e.g. NVIDIA S1070(-500): 792MHz vs 1.44GHz

19 19 Selected Master theses and Master reports supervised by Dr. Elster in ) Robin Eidissen (Masters thesis, January 2009): "Utilizing GPUs for Real-Time Visualization of Snow" (SC 08-SC 10) Eirik Aksnes and Henrik Hesland (MS Project, Jan 2009): "GPU Techniques for Porous Rock Visualization" 2) Rune Erlend Jensen (Masters thesis, May 2009, currently PhD student at HPC-Lab): "Techniques and Tools for Optimizing Codes on Modern Architectures: A Low-Level Approach" (NR MS Thesis Award!) 3) Rune Johan Hovland (Masters thesis, June 2009), Dr. Magnus Lie Hetland (co-advisor): "Throughput Computing on Future GPUs" 4) Henrik Hesland (Masters thesis, June 2009), Thorvald Natvig (co-advisor): "GPU-Enabled Interactive Pore Detection for 3D Rock Visualization" 5) Eirik Ola Aksnes (Masters thesis, July 2009), Ståle Fjeldstand & Atle Rudshaug, Numerical Rocks (co-advisors): "Simulation of Fluid Flow Through Porous Rocks on Modern GPUs" (ParCo 2009) 6) Daniel Haugen (Masters thesis, July 2009), Tore Fevang, Schlumberger (co-advisor): "Seismic Data Compression and GPU Memory Latency" 7) Åsmund Herikstad (Masters thesis, July 2009), Svein-Erik Måsøy, MedTek, NTNU (co-advisor): "Parallel Techniques for Estimation and Correction of Aberration in Medical Ultrasound Imaging" 8) Owe Johansen (Masters thesis, July 2009), John Hybertsen & Jon André Haugen, Statoil (co-advisors): "Seismic Shot Processing on GPU" 9) Daniele Giuseppe Spampinato (Masters thesis, July 2009; currently PhD ETH): "Modeling Communication on Multi-GPU Systems" (ParCo 2009)

20 20 HPC-Lab History (contin.): 2010: - NVIDIA Fermi-based cards (470, C2050, C2070 (fall)) - More on OpenCL. Ahmed A. Aqwari (Masters thesis, June 2010): Effects of Compression on Data Intensive Algorithms. Aleksander Gjermundsen (Masters thesis, July 2010): Audio Processing on GPU. Andreas Hysing (Masters thesis, Aug 2010): Parallel Inversion code (w/ Statoil). Øystein Krog (Masters thesis, June 2010): GPU-based Real-Time Snow Avalanche Simulations. Holger Ludvigsen (Masters thesis, June 2010), Dr. Frank Lindseth (co-advisor): Real-Time GPU-Based 3D Ultrasound Reconstruction and Visualization. Thorvald Natvig (PhD Dec 2010): Automatic Run-Time Communication and I/O

21 21 HPC-Lab Masters Theses Spring 2011: Fredrik Fossum, MTech 2011: Real-Time Rigid Body Interactions (on GPU). Yngve S. Lindal, MTech 2011, CERN w/ Sverre Jarp (CTO, CERN), co-advisor: Optimizing a High-Energy Physics (HEP) Toolkit on Heterogeneous Architectures. Bent Ove Stinessen, MTech 2011, Dr. Alf Birger Rustad (Statoil Research, co-advisor): Profiling, Optimization and Parallelization of Seismic Inversion Code. Jarle Steinsland, MTech 2011: Auto-tunable GPU BLAS. Thor Kristian Valderhaug, MTech 2011: The Lattice Boltzmann Simulation on Multi-GPU Systems. Erik Smistad, Integrated MTech/PhD, main advisor: Frank Lindseth (Elster is co-advisor): Medical imaging on GPUs. Hallgeir Lien (Master fall proj), co-supervised with Dr. Jo Skjermo, Vegvesenet: Road Generation Using A* Algorithm (continued this fall by another student)

22 22 Master students that finished Summer 2012: Kjetil Babington, MTech 2012: Terrain Rendering Techniques for the HPC-Lab Snow Simulator. Thomas Falch, MTech (Elster main advisor; Dag Breiby (Physics) co-advisor): 3D Visualization of X-ray Diffraction Data. Geir Josten Lien, MScience: Auto-tunable GPU BLAS. Jan Rovde: Real-Time Granular Flow Simulations Using the PCISPH Method on GPGPU Devices using CUDA. Frederik MJ Vestre: Enhancing and Porting the HPC-Lab Snow Simulator to OpenCL on Mobile Platforms. Johannes Kvam, MTech Cybernetics (Elster is co-advisor; main advisor: Prof. Angelsen): Medical Image Processing w/ GPUs. Supervised ca. 50 master students, of which ca. 25 on GPU topics

23 23 Anne C. Elster, Lab Director; Rune E. Jensen. PhD Students: Erik Smistad (Elster co-advisor; Lindseth main advisor); Johannes Kvam (Elster is co-advisor; main advisor: Prof. Angelsen); Thomas Falch; Mehdi Bozorgi (Elster co-advisor; Lindseth main advisor); Ivar Ursin Nikolaisen (co-advisor: Alf B. Rustad, Statoil); Lane Holloway (Univ. of TX Austin, USA; Elster de facto co-advisor/committee mbr; Don Fussell, UT Comp. Sci., main advisor); Samira Pakdel + NN & NN. Post Doc? Master Students: Lars Melhus, Henrik Knutsen, Lars Espen Nordhus, Stian Pedersen, Magnus Mikalsen, Andreas Skomedal, Andreas Nordahl + Lars Martin Petersen & Elisabeth Solheim. Recent PhDs: Jan Christian Meyer (PhD 2012), Thorvald Natvig (PhD 2010). Selected Affiliates/Visitors: Drs. Frank Lindseth (SINTEF Med Tech) & Prof. Bjørn Angelsen, NTNU Med School, Dept. of Circulation & Medical Imaging; Ruben Spaans (PhD applicant); Grant Strong (PhD stud., Canada); Miguel Amor (PhD stud., Spain). GTC S3061, March 2013

24 24 Outline Motivation and brief intro to HPC-Lab at NTNU 3D Ultrasound Reconstruction 3D Surface Extraction GPU-Based Airway Segmentation and Centerline Extraction for Image Guided Bronchoscopy Current related projects at HPC-Lab at NTNU Summary

25 25 3D Ultrasound Reconstruction (w/ Dr. Frank Lindseth, SINTEF MedTek and NTNU; MS students: Holger Ludvigsen (CUDA) and Thor K. Valderhaug (OpenCL on AMD))

26 26 Ultrasound 3D Reconstruction Challenges: Calculate 64 million voxels from ca 400 b-scans Used during surgery, so real-time reconstruction is very important Keep costs down

27 27 Ultrasound 3D Reconstruction Solution: GPU acceleration! VNN Algorithm: 1) Fill plane points; 2) Transform plane points; 3) Fill plane equation; 4) For each voxel: find closest plane, project into plane, find 2D coord of projection on plane, fill voxel. Achieved reconstruction time of 1.29 sec vs sec on CPU!
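The per-voxel part of the VNN steps on this slide can be sketched in plain Python. This is a minimal CPU sketch, not the CUDA/OpenCL code behind the talk. The names `make_plane` and `vnn_voxel`, the dictionary layout, and the assumption that each B-scan is described by an origin plus two orthonormal in-plane axes are hypothetical and chosen only for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def make_plane(origin, u_axis, v_axis, image):
    """Precompute the plane equation normal . x + d = 0 for one B-scan.

    Assumes u_axis and v_axis are orthonormal in-plane directions
    (image columns and rows) and image is a list of pixel rows.
    """
    normal = cross(u_axis, v_axis)
    length = dot(normal, normal) ** 0.5
    normal = tuple(c / length for c in normal)
    return {"origin": origin, "u": u_axis, "v": v_axis,
            "normal": normal, "d": -dot(normal, origin), "image": image}

def vnn_voxel(voxel, planes, pixel_size=1.0, empty=0):
    """Return the value for one voxel from its nearest B-scan plane."""
    def signed_distance(plane):
        return dot(plane["normal"], voxel) + plane["d"]

    # Find closest plane.
    plane = min(planes, key=lambda p: abs(signed_distance(p)))
    # Project the voxel centre into that plane.
    dist = signed_distance(plane)
    projected = tuple(c - dist * n for c, n in zip(voxel, plane["normal"]))
    # Find the 2D pixel coordinate of the projection.
    rel = tuple(p - o for p, o in zip(projected, plane["origin"]))
    col = round(dot(rel, plane["u"]) / pixel_size)
    row = round(dot(rel, plane["v"]) / pixel_size)
    # Fill the voxel, or leave it empty if the projection misses the image.
    image = plane["image"]
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return empty

scan_a = make_plane((0, 0, 0), (1, 0, 0), (0, 1, 0), [[10, 20], [30, 40]])
scan_b = make_plane((0, 0, 1), (1, 0, 0), (0, 1, 0), [[50, 60], [70, 80]])
value = vnn_voxel((1, 0, 0.2), [scan_a, scan_b])
```

Each call depends only on its own voxel, so on a GPU the natural mapping is one thread per voxel.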

28 28 Real-time Ultrasound 3D reconstruction multiple views

29 29 Outline Motivation and brief intro to HPC-Lab at NTNU 3D Ultrasound Reconstruction 3D Surface Extraction GPU-Based Airway Segmentation and Centerline Extraction for Image Guided Bronchoscopy Current related projects at HPC-Lab at NTNU Summary

30 30 3D Surface Extraction (w/ Dr. Frank Lindseth, SINTEF MedTek and NTNU, and PhD student Erik Smistad)

31 31 3D Surface Extraction on GPUs Use Marching Cubes algorithm for extracting a 3D surface from a set of sampled scalars Algorithm used extensively for visualizing and analyzing medical data (X-ray, MR) and the result of 3D segmentation. Completely data parallel Challenge: How to store the result of each cube in parallel on GPU
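As a minimal sketch of why Marching Cubes is completely data parallel: every cube can be classified from its own eight corner samples alone. The function name `cube_index`, the corner ordering, and the "below the iso-value sets the bit" convention are assumptions for illustration, not taken from the slides.

```python
def cube_index(corner_values, iso_value):
    """Build the 8-bit Marching Cubes case index for one cube.

    corner_values holds the 8 scalar samples at the cube corners, in
    whatever fixed corner order the triangle lookup table expects.
    Bit i is set when corner i lies below the iso-value.
    """
    if len(corner_values) != 8:
        raise ValueError("a cube has exactly 8 corners")
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso_value:
            index |= 1 << bit
    return index

inside = cube_index([0.1] * 8, iso_value=0.5)
outside = cube_index([0.9] * 8, iso_value=0.5)
crossing = cube_index([0.1] + [0.9] * 7, iso_value=0.5)
```

Cubes with index 0 or 255 lie entirely on one side of the surface and produce no triangles. All other indices select triangles from a 256-entry lookup table, which is omitted here.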

32 32 3D Surface Extraction -- Histogram data Challenge: How to store the result of each cube in parallel on GPU? In a serial implementation this is simple: just use a stack and add the vertex data to the stack. GPU Solution: Histogram Pyramids [1]. A data structure that: filters out cubes that have no triangles (stream reduction); returns the total sum of triangles; provides each cube with an index for memory storage; can be efficiently used by means of textures, yielding large speed-ups. [1] G. Ziegler et al: On-the-fly Point Clouds through Histogram Pyramids; Vision, Modeling, and Visualization 2006
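The three properties listed on this slide can be illustrated with a one-dimensional histogram pyramid. This is a simplified sketch under stated assumptions: the real structure reduces 2x2x2 blocks of a 3D volume and lives in GPU textures, while this version sums pairs in a Python list. The names `build_pyramid` and `traverse` are hypothetical.

```python
def build_pyramid(counts):
    """Build the levels bottom-up; each upper cell sums two children.

    counts[i] is the number of triangles produced by cube i.
    The single value in the top level is the total triangle count.
    """
    size = 1
    while size < len(counts):
        size *= 2
    level = list(counts) + [0] * (size - len(counts))  # pad to a power of two
    levels = [level]
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def traverse(levels, k):
    """Map output triangle number k to (cube index, triangle within cube)."""
    total = levels[-1][0]
    if not 0 <= k < total:
        raise IndexError("output index out of range")
    index = 0
    # Walk from the level below the top down to the base level.
    for level in reversed(levels[:-1]):
        index *= 2
        left = level[index]
        if k >= left:      # target lies in the right child
            k -= left
            index += 1
    return index, k

levels = build_pyramid([0, 2, 0, 1, 3, 0])
total_triangles = levels[-1][0]
placements = [traverse(levels, k) for k in range(total_triangles)]
```

Every output element finds its source cube independently, so empty cubes are skipped and all writes can happen in parallel without a shared stack.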

33 33 3D Surface Extraction -- Histogram Pyramids: Construction & Traversal HP Construction HP Traversal

34 34 3D Surface Extraction -- Results: HPMC Dyken et al. vs. Our OpenCL implementation
Size    Exec. time   FPS (avg)   Memory
512^    ms                       MB
256^3   5 ms                     MB
128^3   3 ms                     MB
64^3    2 ms                     MB

Size    Exec. time   FPS (avg)   Memory
512^3   34 ms                    MB
256^3   10 ms                    MB
128^3   4 ms                     MB
64^3    3 ms                     MB
Our Test system: Intel i5 750, 4GB RAM; ATI Radeon 5870 (1GB RAM); AMD Catalyst 11.2 graphics driver; APP SDK 2.3 w/ OpenCL 1.1
Note: OpenCL-OpenGL Synch measured to be 2-20ms, i.e <% for smallest datasets

35 35 3D Surface Extraction (w/ Dr. Frank Lindseth, SINTEF MedTek and NTNU, and PhD student Erik Smistad)

36 36 Outline Motivation and brief intro to HPC-Lab at NTNU 3D Ultrasound Reconstruction 3D Surface Extraction GPU-Based Airway Segmentation and Centerline Extraction for Image Guided Bronchoscopy Related & Current projects at HPC-Lab at NTNU Summary

37 37 GPU-Based Airway Tree Segmentation and Centerline Extraction Erik Smistad PhD Student

38 38 GPU-Based Airway Tree Segmentation and Centerline Extraction Erik Smistad PhD Student

39 39 GPU Accelerated Segmentation and Centerline Extraction of Tubular Structures from Medical Images
Dataset     GPU Runtime   CPU Runtime
Patient 1   46 secs       12 min 52 secs
Patient 2   49 secs       14 min 43 secs
Patient 3   49 secs       10 min 44 secs
Patient 4   45 secs       14 min 4 secs
Patient 5   33 secs       10 min 5 secs
Patient 6   60 secs       17 min 25 secs
NVIDIA Tesla C2070 GPU vs. one Intel i7 720 CPU with 4 cores.
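The runtimes on this slide can be turned into speedup factors with a few lines of Python. The numbers are copied from the slide. The helper name `speedup` and the tuple layout are assumptions made for this sketch.

```python
# (dataset, GPU seconds, CPU minutes, CPU seconds), copied from the slide
runtimes = [
    ("Patient 1", 46, 12, 52),
    ("Patient 2", 49, 14, 43),
    ("Patient 3", 49, 10, 44),
    ("Patient 4", 45, 14, 4),
    ("Patient 5", 33, 10, 5),
    ("Patient 6", 60, 17, 25),
]

def speedup(gpu_secs, cpu_mins, cpu_secs):
    """CPU runtime divided by GPU runtime, both expressed in seconds."""
    return (cpu_mins * 60 + cpu_secs) / gpu_secs

speedups = {name: speedup(gpu, mins, secs) for name, gpu, mins, secs in runtimes}

for name, value in speedups.items():
    print(f"{name}: {value:.1f}x")
```

With these figures the GPU-to-CPU ratios fall between roughly 13x and 19x across the six datasets.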

40 40 GPU Accelerated Segmentation and Centerline Extraction of Tubular Structures from Medical Images

41 41 Surf tech and drug delivery: Bjørn Angelsen, Johannes Kvam ++ Advanced ultrasound signal processing techniques. Complex calculations requiring real-time capabilities: introduction of multiple GPGPUs in scanners for computational horsepower

42 42 Seismic Filtering -- motivation for compression: Seismic filtering has two steps: 1) transfer data, 2) actual filtering. In our previous work on seismic filtering, transfer time was originally 2% of overall time. After off-loading the filtering to the GPU, transfer time is now 90% of overall time!
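A short back-of-the-envelope calculation shows what the 2% and 90% shares imply. It assumes the absolute transfer time is the same before and after off-loading, which the slide does not state. The resulting factors are implied arithmetic, not measured results, and the function name `implied_speedups` is hypothetical.

```python
def implied_speedups(transfer_share_before, transfer_share_after):
    """Return (filtering speedup, overall speedup) implied by the shares.

    Assumes the transfer time itself is unchanged and that total time
    is just transfer time plus filtering time.
    """
    transfer = transfer_share_before           # original total normalised to 1
    filtering_before = 1.0 - transfer
    total_after = transfer / transfer_share_after
    filtering_after = total_after - transfer
    return filtering_before / filtering_after, 1.0 / total_after

filter_speedup, overall_speedup = implied_speedups(0.02, 0.90)
print(f"filtering: {filter_speedup:.0f}x, overall: {overall_speedup:.0f}x")
```

Under that assumption the filtering step would be about 441 times faster while the whole run is only about 45 times faster. The gap between the two is the motivation for compressing the data before transfer.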

43 43 Motivation Locality & I/O challenge for data intensive algorithms Look at techniques for reducing Mem. Bandwidth Hardware: HDD, SSD Compression: JPEG, MPEG, MP3... Explore GPU compression capabilities Seismic filtering process Transform coding works well for signal data * * [H.S.Malvar 1992], [L.C.Duval 2000], [C.Larsen 2006], [D.Haugen 2009]

44 44 Seismic Data 3D A collection of floats SGY format Traces Statistical variance Constructed datasets for testing

45 45 Results GPU acceleration

46 46 Results I/O Speedup

47 47 Visual Results

48 48 Results - compression: When optimizing for I/O, need an efficient compression rate AND a fast compression algorithm. Compression can give up to: 6.2x I/O speedup on HDD (70MB/s); 3.9x I/O speedup on SSD (140MB/s). Achieved through: transform coding; CPU & GPU co-op; asynch I/O. Predictive model accurate within 5%. Seismic compression library

49 49 3D Physics Viz: Thomas Falch, PhD student. MTech thesis (Elster main advisor; Dag Breiby (Physics) co-advisor): 3D Visualization of X-ray Diffraction Data

50 50 Heterogeneous Framework for Medical Image Processing and Visualization

51 51 Heterogeneous Framework for Medical Image Processing and Visualization

52 52 Heterogeneous Framework for Medical Image Processing and Visualization Challenges: - Portable. Both code and performance - Scheduling/distributing work to devices - Reducing memory transfer overhead - Programmability Make easy to use for non-experts Allow experts to do hand tuning/optimization

53 53 TACC/Univ. of Texas at Austin's Stampede

54 54 Current Related EU Activity: EU COST Action IC0805: Open European Network for High Performance Computing on Complex Environments ( )


HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training

More information

Evaluation of CUDA Fortran for the CFD code Strukti

Evaluation of CUDA Fortran for the CFD code Strukti Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center

More information

GPU Computing. The GPU Advantage. To ExaScale and Beyond. The GPU is the Computer

GPU Computing. The GPU Advantage. To ExaScale and Beyond. The GPU is the Computer GU Computing 1 2 3 The GU Advantage To ExaScale and Beyond The GU is the Computer The GU Advantage The GU Advantage A Tale of Two Machines Tianhe-1A at NSC Tianjin Tianhe-1A at NSC Tianjin The World s

More information

High Performance GPGPU Computer for Embedded Systems

High Performance GPGPU Computer for Embedded Systems High Performance GPGPU Computer for Embedded Systems Author: Dan Mor, Aitech Product Manager September 2015 Contents 1. Introduction... 3 2. Existing Challenges in Modern Embedded Systems... 3 2.1. Not

More information

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25 FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25 December 2014 FPGAs in the news» Catapult» Accelerate BING» 2x search acceleration:» ½ the number of servers»

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild. Parallel Computing: Strategies and Implications Dori Exterman CTO IncrediBuild. In this session we will discuss Multi-threaded vs. Multi-Process Choosing between Multi-Core or Multi- Threaded development

More information

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State

More information

Trends in High-Performance Computing for Power Grid Applications

Trends in High-Performance Computing for Power Grid Applications Trends in High-Performance Computing for Power Grid Applications Franz Franchetti ECE, Carnegie Mellon University www.spiral.net Co-Founder, SpiralGen www.spiralgen.com This talk presents my personal views

More information

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Experiences on using GPU accelerators for data analysis in ROOT/RooFit Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,

More information

1. INTRODUCTION Graphics 2

1. INTRODUCTION Graphics 2 1. INTRODUCTION Graphics 2 06-02408 Level 3 10 credits in Semester 2 Professor Aleš Leonardis Slides by Professor Ela Claridge What is computer graphics? The art of 3D graphics is the art of fooling the

More information

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, Ebay,

More information

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications Amazon Cloud Performance Compared David Adams Amazon EC2 performance comparison How does EC2 compare to traditional supercomputer for scientific applications? "Performance Analysis of High Performance

More information

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of

More information

Several tips on how to choose a suitable computer

Several tips on how to choose a suitable computer Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and postprocessing of your data with Artec

More information

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

GPU Computing - CUDA

GPU Computing - CUDA GPU Computing - CUDA A short overview of hardware and programing model Pierre Kestener 1 1 CEA Saclay, DSM, Maison de la Simulation Saclay, June 12, 2012 Atelier AO and GPU 1 / 37 Content Historical perspective

More information

10- High Performance Compu5ng

10- High Performance Compu5ng 10- High Performance Compu5ng (Herramientas Computacionales Avanzadas para la Inves6gación Aplicada) Rafael Palacios, Fernando de Cuadra MRE Contents Implemen8ng computa8onal tools 1. High Performance

More information

Accelerating CFD using OpenFOAM with GPUs

Accelerating CFD using OpenFOAM with GPUs Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide

More information

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS) PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters from One Stop Systems (OSS) PCIe Over Cable PCIe provides greater performance 8 7 6 5 GBytes/s 4

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines

More information

How to program efficient optimization algorithms on Graphics Processing Units - The Vehicle Routing Problem as a case study

How to program efficient optimization algorithms on Graphics Processing Units - The Vehicle Routing Problem as a case study How to program efficient optimization algorithms on Graphics Processing Units - The Vehicle Routing Problem as a case study Geir Hasle, Christian Schulz Department of, SINTEF ICT, Oslo, Norway Seminar

More information

L20: GPU Architecture and Models

L20: GPU Architecture and Models L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.

More information

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.

More information

MapGraph. A High Level API for Fast Development of High Performance Graphic Analytics on GPUs. http://mapgraph.io

MapGraph. A High Level API for Fast Development of High Performance Graphic Analytics on GPUs. http://mapgraph.io MapGraph A High Level API for Fast Development of High Performance Graphic Analytics on GPUs http://mapgraph.io Zhisong Fu, Michael Personick and Bryan Thompson SYSTAP, LLC Outline Motivations MapGraph

More information

Parallel Computing with MATLAB

Parallel Computing with MATLAB Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

GeoImaging Accelerator Pansharp Test Results

GeoImaging Accelerator Pansharp Test Results GeoImaging Accelerator Pansharp Test Results Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance

More information

NVIDIA Jetson TK1 Development Kit

NVIDIA Jetson TK1 Development Kit Technical Brief NVIDIA Jetson TK1 Development Kit Bringing GPU-accelerated computing to Embedded Systems P a g e 2 V1.0 P a g e 3 Table of Contents... 1 Introduction... 4 NVIDIA Tegra K1 A New Era in Mobile

More information

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs A general-purpose virtualization service for HPC on cloud computing: an application to GPUs R.Montella, G.Coviello, G.Giunta* G. Laccetti #, F. Isaila, J. Garcia Blas *Department of Applied Science University

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer

Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer Res. Lett. Inf. Math. Sci., 2003, Vol.5, pp 1-10 Available online at http://iims.massey.ac.nz/research/letters/ 1 Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer

More information

Parallel Firewalls on General-Purpose Graphics Processing Units

Parallel Firewalls on General-Purpose Graphics Processing Units Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering

More information

Introduction to GPU Architecture

Introduction to GPU Architecture Introduction to GPU Architecture Ofer Rosenberg, PMTS SW, OpenCL Dev. Team AMD Based on From Shader Code to a Teraflop: How GPU Shader Cores Work, By Kayvon Fatahalian, Stanford University Content 1. Three

More information

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Jianqiang Dong, Fei Wang and Bo Yuan Intelligent Computing Lab, Division of Informatics Graduate School at Shenzhen,

More information

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...

More information

High Productivity Computing With Windows

High Productivity Computing With Windows High Productivity Computing With Windows Windows HPC Server 2008 Justin Alderson 16-April-2009 Agenda The purpose of computing is... The purpose of computing is insight not numbers. Richard Hamming Why

More information

QCD as a Video Game?

QCD as a Video Game? QCD as a Video Game? Sándor D. Katz Eötvös University Budapest in collaboration with Győző Egri, Zoltán Fodor, Christian Hoelbling Dániel Nógrádi, Kálmán Szabó Outline 1. Introduction 2. GPU architecture

More information

Case Study on Productivity and Performance of GPGPUs

Case Study on Productivity and Performance of GPGPUs Case Study on Productivity and Performance of GPGPUs Sandra Wienke wienke@rz.rwth-aachen.de ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia

More information

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics

More information

GPU Point List Generation through Histogram Pyramids

GPU Point List Generation through Histogram Pyramids VMV 26, GPU Programming GPU Point List Generation through Histogram Pyramids Gernot Ziegler, Art Tevs, Christian Theobalt, Hans-Peter Seidel Agenda Overall task Problems Solution principle Algorithm: Discriminator

More information

GPU Usage. Requirements

GPU Usage. Requirements GPU Usage Use the GPU Usage tool in the Performance and Diagnostics Hub to better understand the high-level hardware utilization of your Direct3D app. You can use it to determine whether the performance

More information

A quick tutorial on Intel's Xeon Phi Coprocessor

A quick tutorial on Intel's Xeon Phi Coprocessor A quick tutorial on Intel's Xeon Phi Coprocessor www.cism.ucl.ac.be damien.francois@uclouvain.be Architecture Setup Programming The beginning of wisdom is the definition of terms. * Name Is a... As opposed

More information

IP Video Rendering Basics

IP Video Rendering Basics CohuHD offers a broad line of High Definition network based cameras, positioning systems and VMS solutions designed for the performance requirements associated with critical infrastructure applications.

More information

ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI

ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI Di Zhao The Ohio State University GPU Technology Conference 2014, March 24-27 2014, San Jose California 1 TEGRA-WIFI-GEFORCE

More information

walberla: A software framework for CFD applications

walberla: A software framework for CFD applications walberla: A software framework for CFD applications U. Rüde, S. Donath, C. Feichtinger, K. Iglberger, F. Deserno, M. Stürmer, C. Mihoubi, T. Preclic, D. Haspel (all LSS Erlangen), N. Thürey (LSS Erlangen/

More information

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration Jinglin Zhang, Jean François Nezan, Jean-Gabriel Cousin, Erwan Raffin To cite this version: Jinglin Zhang,

More information

An introduction to Fyrkat

An introduction to Fyrkat Cluster Computing May 25, 2011 How to get an account https://fyrkat.grid.aau.dk/useraccount How to get help https://fyrkat.grid.aau.dk/wiki What is a Cluster Anyway It is NOT something that does any of

More information

Hardware Acceleration for CST MICROWAVE STUDIO

Hardware Acceleration for CST MICROWAVE STUDIO Hardware Acceleration for CST MICROWAVE STUDIO Chris Mason Product Manager Amy Dewis Channel Manager Agenda 1. Introduction 2. Why use Hardware Acceleration? 3. Hardware Acceleration Technologies 4. Current

More information