Fast Parallel Algorithms for Computational Bio-Medicine



Similar documents
walberla: A software framework for CFD applications

walberla: A software framework for CFD applications on Compute Cores

Generating Virtual Worlds with Supercomputer Simulations

FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG

Petascale Visualization: Approaches and Initial Results

Optimizing Performance of the Lattice Boltzmann Method for Complex Structures on Cache-based Architectures

Towards real-time image processing with Hierarchical Hybrid Grids

Optimized Hybrid Parallel Lattice Boltzmann Fluid Flow Simulations on Complex Geometries

Turbomachinery CFD on many-core platforms experiences and strategies

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

Interactive Level-Set Segmentation on the GPU

HPC enabling of OpenFOAM R for CFD applications

Parallel 3D Image Segmentation of Large Data Sets on a GPU Cluster

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

GPU Point List Generation through Histogram Pyramids

Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures

High Performance Computing in CST STUDIO SUITE

Real Time Simulation of Power Plants

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Simulation Platform Overview

Accelerating CFD using OpenFOAM with GPUs

Interactive Level-Set Deformation On the GPU

Introduction to GPU Computing

walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation

and RISC Optimization Techniques for the Hitachi SR8000 Architecture

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED

HPC Deployment of OpenFOAM in an Industrial Setting

STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

From Big Data to Smart Data Thomas Hahn

Retargeting PLAPACK to Clusters with Hardware Accelerators

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Advanced Volume Rendering Techniques for Medical Applications

Trends in High-Performance Computing for Power Grid Applications

Robust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code

Stream Processing on GPUs Using Distributed Multimedia Middleware

Introduction to Cloud Computing

OpenMP Programming on ScaleMP

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

GPUs for Scientific Computing

64-Bit versus 32-Bit CPUs in Scientific Computing

Embedded Systems in Healthcare. Pierre America Healthcare Systems Architecture Philips Research, Eindhoven, the Netherlands November 12, 2008

Pedraforca: ARM + GPU prototype

~ Greetings from WSU CAPPLab ~

Large-Scale Reservoir Simulation and Big Data Visualization

Project Discussion Multi-Core Architectures and Programming

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

High Performance Spatial Queries and Analytics for Spatial Big Data. Fusheng Wang. Department of Biomedical Informatics Emory University

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca

Parallel Programming Survey

High Performance Computing in the Multi-core Area

Hardware Acceleration for CST MICROWAVE STUDIO

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Drivers to support the growing business data demand for Performance Management solutions and BI Analytics

Recent Advances in HPC for Structural Mechanics Simulations

Supercomputing Resources in BSC, RES and PRACE

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Evaluation of CUDA Fortran for the CFD code Strukti

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

Next Generation Operating Systems

White Paper The Numascale Solution: Extreme BIG DATA Computing

Modeling Rotor Wakes with a Hybrid OVERFLOW-Vortex Method on a GPU Cluster

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

PARTNER SEARCH FORM. r. Akiva 34/22. Modiin Illit. Israel Fax. Avtech Scientific

numascale White Paper The Numascale Solution: Extreme BIG DATA Computing Hardware Accellerated Data Intensive Computing By: Einar Rustad ABSTRACT

1. INTRODUCTION Graphics 2

Performance Evaluation of Amazon EC2 for NASA HPC Applications!

Wind-Tunnel Simulation using TAU on a PC-Cluster: Resources and Performance Stefan Melber-Wilkending / DLR Braunschweig

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Clusters: Mainstream Technology for CAE

Microsoft Windows Compute Cluster Server 2003 Evaluation

Computer Graphics Hardware An Overview

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

GPU Programming in Computer Vision

PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

Altix Usage and Application Programming. Welcome and Introduction

10- High Performance Compu5ng

Formula One - The Engineering Race Evolution in virtual & physical testing over the last 15 years. Torbjörn Larsson Creo Dynamics AB

Parallel Simplification of Large Meshes on PC Clusters

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Transcription:

Fast Parallel Algorithms for Computational Bio-Medicine H. Köstler, J. Habich, J. Götz, M. Stürmer, S. Donath, T. Gradl, D. Ritter, D. Bartuschat, C. Feichtinger, C. Mihoubi, K. Iglberger (LSS Erlangen) U. Rüde (LSS Erlangen, ruede@cs.fau.de) Lehrstuhl für Informatik 10 (Systemsimulation) Universität Erlangen-Nürnberg www10.informatik.uni-erlangen.de 25. Juni 2009 1

Intro: Who we are Overview Example 1: Towards Near Real Time Simulation of Blood Flow on the IBM Cell Processor Example 2: Parallel Numerical Algorithms for Advanced Image Processing Conclusions 2

Introduction 3

The LSS Mission Development and Analysis of Computer Methods for Applications in Science and Engineering Applications from Physical and Engineering Sciences LSS Computer Science Mathematics 4

What We Do Simulation Applications in Science and Engineering Material Sciences Bio-medical Models and Simulation Chemical Engineering Computational Fluid Dynamics Electrical Engineering Power Systems Medical Imaging Algorithms for Numerical Simulation Parallel Iterative Solvers (Multigrid) Parallel Lattice Boltzmann Methods High Performance Computing Supercomputer Architecture Architecture-Aware and Parallel Programming, Cache Optimization Software Engineering for Technical Applications 5

C. Feichtinger S. Donath C. Mihoubi P. Neumann D. Geuss T. Dreher T. Preclik D. Bartuschat B. Heumann M. Wohlmuth Who is at LSS (and does what?) Complex Flows K. Iglberger J. Götz D. Haspel S. Ganguly F. Aristizabal Numerical Algorithms H. Köstler V. Buwa Alumni C. Popa Laser Simulation Prof. Dr. C. Pflaum C. Jandl R. Azif T. Gradl M. Stürmer F. Deserno D. Ritter Prof. G. Horton (Univ. of Magdeburg) Prof. El Mostafa Kalmoun (Cadi Ayyad University, Marocco) Dr. M. Kowarschik (Siemens Health Care) Dr. M Mohr (Geophysik, TU München) Dr. F. Hülsemann (EDF, Paris) Dr. B. Bergen (Los Alamos, USA) Dr. N. Thürey (ETZH Zürich) Dr. J. Härdtlein (Bosch GmbH) C. Möller (Navigon) Dr. U. Fabricius (Elektrobit) Dr. Th. Pohl (Siemens Health Care) J. Treibig (RRZE) C. Freundl Supercomputing J. Götz B. Gmeiner 6

Industrial Collaboration Projects With Siemens Industry Dissertation T. Dreher (Siemens Simulation Center) Temperature Simulation for Rolling Process Control Energy MSc Thesis Yongqi Sun (Power Plant Modelling) Corporate Technology MSc Thesis Rishi Patil (Coal Gasification) Health Care Dr. Camnpagna (MR reconstruction with GPUs) Dr. Pohl (GPU accelerated image processing) Dr. Kowarschik (Component Evaluation) Other industrial collaborations Areva (Thermohydraulics) Opel (Fuel Cells) BASF (materials processing) 7

Example 1 Towards Near Real Time Flow Simulation on the IBM Cell for Bio-Medical Applications 8

LSS Cluster 8x4 (+ 9x2) Nodes CPU: AMD Opteron 848 RAM: 8x16 Gbyte High-Speed Network InfiniBand Computing Equipment Erlangen Computing Center (RRZE) Cluster 217 compute nodes each with two Xeon 5160 "Woodcrest" chips (4 cores) High-Speed Network InfiniBand Bavarian Academy of Sciences (LRZ) HLRB-II Supercomputer SGI Altix 4700 9728 Itanium Cores, 62 TFlops 40 Tbyte memory 9

Simulation chain 10

Is Flow Simulation Possible in Real Time?... at acceptable cost? not today! But... Commercial CFD Software has tremendous overheads Grid generation/ interface to geometry description General purpose vs specialised solution Existing CFD software does not exploit modern (multi-core hardware) 4-D data sets (moving geometry)? 11

IBM Cell Processor Available cell systems: Roadrunner Blades Playstation 3 12

Cell Architecture: 9 cores on a chip 13

The Lattice-Boltzmann-Method An alternative to Finite Volumes or Finite Elements Discretization in cubes (cells) 9 (or 19) numbers per cell = number of particles traveling towards neighboring cells Repeat (many times) stream collide 14

The stream step Move particle (numbers) into neighboring cells 15

The collide step Compute new particle numbers according to the collisions 16

Results Velocity near the wall in an aneurysm Oscillatory shear stress near the wall in an aneurysm 17

Simulation of clotting processes using non-newtonian blood models Motivation Knowledge about the clotting of blood is important for many medical applications: Diagnostics Surgery 18

Aging Model for blood clotting Simulation of clot formation within the aneurysm 19

Performance Results 50,0 LBM performance on a single core (8x8x8 channel flow) 49,0 37,5 25,0 12,5 0 10,4 4,8 2,0 Xeon 5160 PPE SPE* straight-forward C code SIMD-optimized assembly *on Local Store without DMA transfers 20

100,0 Performance Results 93 94 94 95 82,5 81 65,0 47,5 42 30,0 1 2 3 4 5 6 21

Performance Results 50,0 43,8 37,5 25,0 21,1 12,5 9,1 11,7 0 Xeon 5160* Playstation 3 1 core 1 CPU *performance optimized code by LB-DC 22

Example 2 Parallel Multilevel Image Processing Algorithms (H. Köstler) 23

What we work on Goals: deal with variational models in medical image processing and computer vision develop a software framework for their efficient numerical treatment by multigrid methods use common parallel hardware and fast numerical algorithms in order to achieve real-time processing Applications: image denoising, image inpainting, image compression, image segmentation optical flow, image registration, motion blur 24

25

Image Denoising Result: 3D CT Volume Figure: Noisy image slice of a 3D CT volume (data: Siemens AG, Healthcare Sector) and one slice of denoised image [MBK + 07]. 26

Image Registration Result I (a) Reference image (b) Template image Figure: Reference and template image of an abdominal scan (data: Prof. Dr. med. K.-U. Eckardt, Nephrologie und Hypertensiologie, Universitätsklinikum Erlangen). 27

Image Registration Result II (a) Registered Images (b) Deformation field Figure: Registration of abdominal scan using an elastic regularizer (data: Prof. Dr. med. K.-U. Eckardt, Nephrologie und Hypertensiologie, Universitätsklinikum Erlangen). 28

Image processing at LSS Variational models in image processing (non-rigid registration, etc.) + some compressed sensing algorithms Fast (multilevel algorithms) Parallel implementations Unconventional Hardware different GPU types (ATI vs. Nvidia) Cell systematic comparisons hybrid programming 29

Conclusions Computational models of bio-medical systems as future tools for diagnosis and therapy planning from research tools to products in clinical practice Advanced image processing algorithms based on parallel multilevel algorithms Multi-Core accelerators as cost-efficient performance booster 30