Towards real-time image processing with Hierarchical Hybrid Grids


1 Towards real-time image processing with Hierarchical Hybrid Grids. International Doctorate Program, Summer School. Björn Gmeiner; joint work with Harald Köstler and Ulrich Rüde. August 2011.

2 Contents: The HHG Framework; Image processing for MRI; Real-time processing.

3 The HHG Framework

4 Combining finite element and multigrid methods. An FE mesh may be unstructured, so which nodes should be removed for coarsening? This is not straightforward. Why not start from the coarse grid instead? This is the Hierarchical Hybrid Grids (HHG) concept. Benjamin Bergen*: prototype; Tobias Gradl: tuning, extensions and adaptivity. (*Dissertation in Erlangen, ISC award in …; currently at Los Alamos National Laboratory.)
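Starting from the coarse grid means refining each coarse element regularly, which yields a structured set of points inside every element. A small sketch of how quickly the point count grows, assuming each refinement level halves every edge of a coarse triangle or tetrahedron (so an edge carries n = 2^L intervals after L levels); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Points on a regularly refined triangle with n = 2^level intervals per edge:
// (n+1)(n+2)/2, the triangular numbers.
std::uint64_t triangle_points(unsigned level) {
    assert(level <= 30);
    const std::uint64_t n = std::uint64_t{1} << level;
    return (n + 1) * (n + 2) / 2;
}

// Points on a regularly refined tetrahedron: (n+1)(n+2)(n+3)/6, the tetrahedral numbers.
std::uint64_t tetrahedron_points(unsigned level) {
    assert(level <= 20);  // keeps the product well inside 64 bits
    const std::uint64_t n = std::uint64_t{1} << level;
    return (n + 1) * (n + 2) * (n + 3) / 6;
}

// Points strictly inside the tetrahedron (on no face, edge or vertex):
// (n-1)(n-2)(n-3)/6, which is zero until n reaches 4.
std::uint64_t tetrahedron_interior_points(unsigned level) {
    assert(level <= 20);
    const std::uint64_t n = std::uint64_t{1} << level;
    if (n < 4) return 0;
    return (n - 1) * (n - 2) * (n - 3) / 6;
}
```

At level 4 (n = 16), 455 of the 969 points of a tetrahedron are interior points, and that share tends to one as the level grows.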

5 Advantages. Properties of the HHG approach: multigrid is straightforward; very memory efficient; massive performance benefits on current computer architectures; well suited to parallelization; … unknowns are possible. Limitations: a coarse input grid is needed; adaptivity (ongoing work by Tobias Gradl).

6 Two-grid cycle (correction scheme)
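The two-grid correction scheme can be sketched for the 1D model problem $-u'' = f$ on $(0,1)$ with zero Dirichlet values: smooth, restrict the residual, solve the coarse error equation, interpolate and add the correction, smooth again. The Gauss-Seidel smoother, full-weighting restriction, linear interpolation and exact (tridiagonal) coarse solve are standard textbook choices assumed here, not taken from HHG; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Gauss-Seidel sweeps for -u'' = f with u[0] = u[n] = 0 and mesh width h.
void smooth(Vec& u, const Vec& f, double h, int sweeps) {
    const std::size_t n = u.size() - 1;
    for (int s = 0; s < sweeps; ++s)
        for (std::size_t i = 1; i < n; ++i)
            u[i] = 0.5 * (u[i-1] + u[i+1] + h*h*f[i]);
}

// Residual r = f - A u at interior points (zero at the boundary).
Vec residual(const Vec& u, const Vec& f, double h) {
    const std::size_t n = u.size() - 1;
    Vec r(n + 1, 0.0);
    for (std::size_t i = 1; i < n; ++i)
        r[i] = f[i] - (2.0*u[i] - u[i-1] - u[i+1]) / (h*h);
    return r;
}

// Exact solve of -e'' = r (zero boundary values) with the Thomas algorithm.
Vec solve_exact(const Vec& r, double h) {
    const std::size_t n = r.size() - 1;
    Vec e(n + 1, 0.0), cp(n + 1, 0.0), dp(n + 1, 0.0);
    for (std::size_t i = 1; i < n; ++i) {            // forward elimination
        const double m = 2.0 + (i > 1 ? cp[i-1] : 0.0);
        cp[i] = -1.0 / m;
        dp[i] = (h*h*r[i] + (i > 1 ? dp[i-1] : 0.0)) / m;
    }
    for (std::size_t i = n - 1; i >= 1; --i)         // back substitution
        e[i] = dp[i] - cp[i] * e[i+1];
    return e;
}

// One two-grid cycle: nu1 pre-smoothing and nu2 post-smoothing sweeps.
void two_grid(Vec& u, const Vec& f, double h, int nu1, int nu2) {
    const std::size_t n = u.size() - 1;
    assert(n % 2 == 0 && n >= 4);
    smooth(u, f, h, nu1);
    const Vec r = residual(u, f, h);
    const std::size_t nc = n / 2;
    Vec rc(nc + 1, 0.0);
    for (std::size_t I = 1; I < nc; ++I)             // full-weighting restriction
        rc[I] = 0.25*r[2*I-1] + 0.5*r[2*I] + 0.25*r[2*I+1];
    const Vec ec = solve_exact(rc, 2.0*h);           // coarse-grid error equation
    for (std::size_t I = 0; I < nc; ++I) {           // linear interpolation + correction
        u[2*I]   += ec[I];
        u[2*I+1] += 0.5*(ec[I] + ec[I+1]);
    }
    smooth(u, f, h, nu2);
}

double max_abs(const Vec& v) {
    double m = 0.0;
    for (double x : v) m = std::max(m, std::fabs(x));
    return m;
}
```

The coarse problem uses mesh width 2h, so only the smooth part of the error, which Gauss-Seidel reduces slowly, has to be handled there.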

7 HHG primitives (2D example). Figure: inner points (macro), vertex points (macro), edge points, ghost points, communication.
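Ghost points hold copies of values owned by a neighbouring primitive and are refreshed by communication before each sweep. A serial sketch, assuming a 1D grid split into two subdomains with one ghost point per side; `Subdomain`, `exchange_ghosts` and `jacobi` are hypothetical names, and the plain copy stands in for the messages a parallel run would send.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Owned points live in v[1..m]; v[0] and v[m+1] are ghost points. On the physical
// boundary the outer ghost slot simply holds the Dirichlet value.
struct Subdomain {
    std::vector<double> v;
    explicit Subdomain(std::size_t m) : v(m + 2, 0.0) { assert(m > 0); }
    std::size_t owned() const { return v.size() - 2; }
};

// "Communication": each side copies its outermost owned value into the other's ghost point.
void exchange_ghosts(Subdomain& left, Subdomain& right) {
    left.v[left.owned() + 1] = right.v[1];
    right.v[0] = left.v[left.owned()];
}

// One Jacobi sweep for -u'' = 0 on the owned points, reading neighbours including ghosts.
void jacobi(Subdomain& s) {
    const std::vector<double> old = s.v;
    for (std::size_t i = 1; i <= s.owned(); ++i)
        s.v[i] = 0.5 * (old[i-1] + old[i+1]);
}
```

With the exchange performed before every sweep, the split iteration reproduces a Jacobi sweep on the undivided grid exactly; without it, each half would iterate on stale interface values.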

8 Weak scalability of HHG on Blue Gene/P (Jugene). Table columns: cores, structured regions, unknowns, CG, time.

9 Image processing for MRI: 1. Denoising by homogeneous diffusion; 2. High dynamic range compression.

10 Domain generation (typical size: e.g. …). 1. Static domain partitioning and parallel file reading. 2. Find the relevant (information-containing) regions. 3. Distribute only the relevant regions, equally among the processes.
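Steps 2 and 3 might look as follows for a 2D image cut into square blocks. The relevance test (any pixel above a threshold), the block size and the contiguous assignment of relevant blocks to ranks are illustrative assumptions; `relevant_blocks` and `assign_ranks` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Step 2: return the ids (row-major over the block grid) of all b x b blocks of a
// width x height image that contain at least one pixel above the threshold.
std::vector<std::size_t> relevant_blocks(const std::vector<float>& img, std::size_t width,
                                         std::size_t height, std::size_t b, float threshold) {
    assert(b > 0 && img.size() == width * height);
    const std::size_t bx = (width + b - 1) / b, by = (height + b - 1) / b;
    std::vector<std::size_t> ids;
    for (std::size_t J = 0; J < by; ++J)
        for (std::size_t I = 0; I < bx; ++I) {
            bool relevant = false;
            for (std::size_t y = J*b; y < std::min(height, (J+1)*b) && !relevant; ++y)
                for (std::size_t x = I*b; x < std::min(width, (I+1)*b); ++x)
                    if (img[y*width + x] > threshold) { relevant = true; break; }
            if (relevant) ids.push_back(J*bx + I);
        }
    return ids;
}

// Step 3: give each of p ranks a contiguous run of either floor(k/p) or ceil(k/p)
// of the k relevant blocks; owner[i] is the rank of the i-th relevant block.
std::vector<int> assign_ranks(std::size_t k, int p) {
    assert(p > 0);
    std::vector<int> owner(k);
    for (std::size_t i = 0; i < k; ++i)
        owner[i] = static_cast<int>(i * static_cast<std::size_t>(p) / k);
    return owner;
}
```

Blocks without information, such as background, are then never allocated, so the work per rank depends only on the amount of relevant data.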

11 1) Denoising by homogeneous diffusion. Image with noise: $u_0 = Ru + \eta$, where $R$ is a linear operator incorporating blur (we assume $R = \mathrm{Id}$) and $\eta$ is additive noise (e.g. white Gaussian noise). Simplest approach to reduce noise (better: anisotropic diffusion): $u - u_0 = \alpha\,\Delta u$, with regularization parameter $\alpha > 0$. Variational formulation: $a(u, v) = \int_\Omega \alpha\,\nabla u \cdot \nabla v + uv \,dx$, $f(v) = \int_\Omega u_0 v \,dx$.
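A minimal sketch of solving the discrete version of $u - u_0 = \alpha\,\Delta u$ (with $R = \mathrm{Id}$) on a pixel grid. It swaps the slide's finite-element formulation for the standard 5-point finite-difference Laplacian with mesh width 1, and uses plain Gauss-Seidel instead of multigrid; `denoise_gs` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gauss-Seidel for u - alpha * Laplace(u) = u0 on a w x h pixel grid (row-major).
// Homogeneous Neumann boundary (the natural condition of the variational form):
// neighbours outside the image are simply left out of the stencil.
void denoise_gs(std::vector<double>& u, const std::vector<double>& u0,
                std::size_t w, std::size_t h, double alpha, int sweeps) {
    assert(alpha > 0.0 && u.size() == w*h && u0.size() == w*h);
    for (int s = 0; s < sweeps; ++s)
        for (std::size_t y = 0; y < h; ++y)
            for (std::size_t x = 0; x < w; ++x) {
                double sum = 0.0;
                int nb = 0;
                if (x > 0)     { sum += u[y*w + x - 1]; ++nb; }
                if (x + 1 < w) { sum += u[y*w + x + 1]; ++nb; }
                if (y > 0)     { sum += u[(y-1)*w + x]; ++nb; }
                if (y + 1 < h) { sum += u[(y+1)*w + x]; ++nb; }
                // (1 + alpha*nb) u_p - alpha * sum(neighbours) = u0_p, solved for u_p
                u[y*w + x] = (u0[y*w + x] + alpha*sum) / (1.0 + alpha*nb);
            }
}
```

A larger $\alpha$ smooths more strongly. With Neumann boundaries the converged solution keeps the total intensity of $u_0$, which makes a convenient sanity check.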

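12 Denoising by homogeneous diffusion (cont.)
$$\min J(u) := \tfrac{1}{2}\,a(u,u) - f(u)$$
$$\min J(u) := \tfrac{1}{2}\int_\Omega \alpha\,\nabla u\cdot\nabla u + u^2 \,dx - \int_\Omega u_0 u \,dx$$
$$\min 2J(u) = \int_\Omega \alpha\,\nabla u\cdot\nabla u + u^2 - 2u_0 u \,dx$$
$$\min 2J(u) = \int_\Omega \alpha\,\nabla u\cdot\nabla u + u^2 - 2u_0 u + (u_0)^2 - (u_0)^2 \,dx$$
Dropping the constant $-\int_\Omega (u_0)^2\,dx$, which does not change the minimizer, gives
$$\min \int_\Omega (u_0 - u)^2 + \alpha\,|\nabla u|^2 \,dx$$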
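Taking the first variation of the last functional in a direction $v$ recovers the diffusion equation $u - u_0 = \alpha\,\Delta u$, together with the natural (homogeneous Neumann) boundary condition implied by the variational formulation:

```latex
\begin{aligned}
\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \int_\Omega (u_0 - u - \varepsilon v)^2
  + \alpha\,|\nabla (u + \varepsilon v)|^2 \,dx
  &= 2\int_\Omega (u - u_0)\,v + \alpha\,\nabla u\cdot\nabla v \,dx = 0
  \quad \forall v, \\
\int_\Omega (u - u_0 - \alpha\,\Delta u)\,v \,dx
  + \int_{\partial\Omega} \alpha\,\partial_n u\, v \,ds &= 0
  \quad \forall v \quad \text{(integration by parts)}, \\
\Rightarrow\quad u - u_0 = \alpha\,\Delta u \ \text{in } \Omega,
  \qquad \partial_n u = 0 \ \text{on } \partial\Omega .
\end{aligned}
```

The first line is exactly $a(u, v) = f(v)$, so minimizing $J$ and solving the variational problem are the same task.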

13 2) High dynamic range compression. Steps: 1. compute the gradient field; 2. manipulate the picture in the gradient domain (i.e. damp large gradients); 3. back transformation, $\Delta u = \nabla\cdot k(\nabla u_0)$, i.e. a Poisson problem for $u$ with the damped gradient field $k(\nabla u_0)$ as data.
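In 1D, recovering a signal from a modified gradient field reduces to summing the differences, so the three steps fit in a few lines. This sketch assumes the input is a log-intensity signal and borrows the gradient attenuation of Fattal et al. as the damping function (the slides do not specify it); `damp` and `compress_1d` are hypothetical names. In 2D the damped field is generally not a gradient, which is why a Poisson solve is needed there.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Damping function (Fattal-style, an assumption): scale a gradient g by (|g|/a)^(beta-1).
// With 0 < beta < 1, gradients larger than a shrink and smaller ones grow slightly.
double damp(double g, double a, double beta) {
    const double mag = std::fabs(g);
    if (mag == 0.0) return 0.0;
    return g * std::pow(mag / a, beta - 1.0);
}

// 1D gradient-domain compression of a log-intensity signal h.
std::vector<double> compress_1d(const std::vector<double>& h, double a, double beta) {
    assert(a > 0.0 && beta > 0.0);
    std::vector<double> u(h.size(), 0.0);
    if (h.empty()) return u;
    u[0] = h[0];                                    // fix the free additive constant
    for (std::size_t i = 1; i < h.size(); ++i) {
        const double g = h[i] - h[i-1];             // step 1: gradient
        const double gd = damp(g, a, beta);         // step 2: damp large gradients
        u[i] = u[i-1] + gd;                         // step 3: integrate back
    }
    return u;
}
```

With `a = 1` and `beta = 0.5`, a jump of 4 is damped to 2 while a step of 0.25 is amplified to 0.5, so large contrasts are compressed and fine detail is kept.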

14 Real-time processing

15 Target platforms

| | Jugene (FZ Jülich) | lima (RRZE Erlangen) |
|---|---|---|
| Processor | 4-way SMP processor, 32-bit PowerPC 450 cores | 2 hexa-core processors, Xeon 5650 (Westmere) |
| Clock rate | 850 MHz | … MHz |
| Memory bandwidth | 13.6 GB/s | 32 GB/s |
| Main memory | 2 GB | 24 GB |
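The bandwidth figures give a rough ceiling for memory-bound stencil sweeps. A back-of-the-envelope sketch, assuming each update of a constant-coefficient stencil moves 24 bytes between memory and processor (load u and f, store u, all doubles, neighbours served from cache, no write-allocate traffic) and 1 GB = 10^9 bytes; the function name is hypothetical, and real codes are also limited by cache reuse and in-core execution.

```cpp
#include <cassert>

// Upper bound in MStencil/s (10^6 updates per second) if a sweep is limited purely
// by memory bandwidth: bandwidth [bytes/s] / traffic per update [bytes].
double bandwidth_bound_mstencils(double bandwidth_gb_per_s, double bytes_per_update) {
    assert(bandwidth_gb_per_s > 0.0 && bytes_per_update > 0.0);
    return bandwidth_gb_per_s * 1e9 / bytes_per_update / 1e6;
}
```

Under that assumption the ceilings are about 567 MStencil/s at 13.6 GB/s (Jugene) and about 1333 MStencil/s at 32 GB/s (lima); counting the write-allocate load as well (32 bytes per update) lowers both by a quarter.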

16 5-point stencil example: Blue Gene/P

```cpp
// Lexicographic Gauss-Seidel sweep, 5-point stencil, tsize x tsize grid
// (row-major, one boundary layer). c[0] = 1/diagonal entry; c[1..4] = negated
// off-diagonal entries for the (j+1), (i+1), (i-1) and (j-1) neighbours.
void gs_lex_5pt(double *u, const double *f, const double *c, int tsize) {
  for (int j = 1; j < tsize - 1; ++j) {
    // lex. update (all points)
    for (int i = 1; i < tsize - 1; ++i) {
      u[j*tsize + i] =
        c[0] * (f[j*tsize + i] +
                c[1] * u[(j+1)*tsize + i] +
                c[2] * u[j*tsize + (i+1)] +
                c[3] * u[j*tsize + (i-1)] +
                c[4] * u[(j-1)*tsize + i]);
    }
  }
}
```

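17 Disjoint optimization: Blue Gene/P

```cpp
// Red-black variant of the sweep: in each row j, first all odd i (red), then all even i (black).
// u2 deliberately aliases u. #pragma disjoint is an IBM XL C/C++ directive that promises
// the compiler no aliasing (other compilers ignore it). This is harmless here only because a
// half-sweep never reads a point it writes: red points read only black points of row j
// and points of rows j-1 and j+1.
void gs_rrb_5pt(double *u, const double *f, const double *c, int tsize) {
  double *u2 = u;
  for (int j = 1; j < tsize - 1; ++j) {
    // first update (red points only)
    for (int i = 1; i < tsize - 1; i += 2) {
#pragma disjoint(*u, *f)
#pragma disjoint(*u, *u2)
#pragma disjoint(*u2, *f)
      u2[j*tsize + i] =
        c[0] * (f[j*tsize + i] +
                c[1] * u[(j+1)*tsize + i] +
                c[2] * u[j*tsize + (i+1)] +
                c[3] * u[j*tsize + (i-1)] +
                c[4] * u[(j-1)*tsize + i]);
    }
    // second update (black points only): same stencil, reads the new red values
    for (int i = 2; i < tsize - 1; i += 2) {
      u2[j*tsize + i] =
        c[0] * (f[j*tsize + i] +
                c[1] * u[(j+1)*tsize + i] +
                c[2] * u[j*tsize + (i+1)] +
                c[3] * u[j*tsize + (i-1)] +
                c[4] * u[(j-1)*tsize + i]);
    }
  }
}
```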

18 7-point stencil (Blue Gene/P). Figure: MStencil/s versus grid size for lex. Gauss-Seidel, RRB Gauss-Seidel, disjoint RRB Gauss-Seidel, and disjoint RRB Gauss-Seidel with index optimization.
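MStencil/s counts millions of stencil updates per second. A sketch of how such a rate could be measured for a lexicographic Gauss-Seidel sweep of a 7-point stencil on an n x n x n array, assuming constant coefficients (the finite-difference Laplacian with mesh width 1) and timing with `std::chrono::steady_clock`; the function names are hypothetical and this is not the benchmark behind the charts.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Lexicographic Gauss-Seidel sweep, 7-point stencil for -Laplace(u) = f, on an n^3 array
// that includes one boundary layer. Returns the number of stencil updates performed.
std::size_t gs_lex_7pt(std::vector<double>& u, const std::vector<double>& f, std::size_t n) {
    assert(n >= 3 && u.size() == n*n*n && f.size() == n*n*n);
    const std::size_t sy = n, sz = n*n;  // strides in j and k
    for (std::size_t k = 1; k + 1 < n; ++k)
        for (std::size_t j = 1; j + 1 < n; ++j)
            for (std::size_t i = 1; i + 1 < n; ++i) {
                const std::size_t p = k*sz + j*sy + i;
                u[p] = (f[p] + u[p-1] + u[p+1] + u[p-sy] + u[p+sy] + u[p-sz] + u[p+sz]) / 6.0;
            }
    return (n-2) * (n-2) * (n-2);
}

// MStencil/s = updates / seconds / 10^6.
double mstencils_per_s(std::size_t updates, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(updates) / seconds / 1e6;
}

// Times `sweeps` sweeps on an n^3 grid; returns 0 if the clock could not resolve the run.
double measure(std::size_t n, int sweeps) {
    std::vector<double> u(n*n*n, 0.0), f(n*n*n, 1.0);
    std::size_t updates = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (int s = 0; s < sweeps; ++s) updates += gs_lex_7pt(u, f, n);
    const auto t1 = std::chrono::steady_clock::now();
    const double sec = std::chrono::duration<double>(t1 - t0).count();
    return sec > 0.0 ? mstencils_per_s(updates, sec) : 0.0;
}
```

Sweeping over a range of n reproduces the shape of a "MStencil/s versus size" curve: small grids fit in cache, while large grids fall back to the memory-bandwidth limit.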

19 27-point stencil (Blue Gene/P). Figure: MStencil/s versus grid size for lex. Gauss-Seidel, RRB Gauss-Seidel, and disjoint RRB Gauss-Seidel with index optimization.

20 Different stencils (Blue Gene/P). Figure: MStencil/s versus grid size for the …-point, 15-point and 27-point stencils.

21 Strong scaling (Blue Gene/P). Figure: Strong scaling behavior of HHG on PowerPC 450 cores; time per V-cycle [s] versus number of cores. This test case was performed starting from 512 cores, solving … DoF.
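Strong-scaling plots are usually read through speedup and parallel efficiency relative to the smallest run, here 512 cores. A sketch of both quantities; the function names are hypothetical and the numbers in the usage note are made up for illustration, not taken from the plot.

```cpp
#include <cassert>

// Speedup of a run with time tp per V-cycle over the baseline time t0.
double speedup(double t0, double tp) {
    assert(t0 > 0.0 && tp > 0.0);
    return t0 / tp;
}

// Parallel efficiency relative to a baseline on p0 cores: (t0 * p0) / (tp * p).
// A value of 1 means the time per V-cycle dropped in exact proportion to the core count.
double efficiency(int p0, double t0, int p, double tp) {
    assert(p0 > 0 && p > 0 && t0 > 0.0 && tp > 0.0);
    return (t0 * p0) / (tp * p);
}
```

For example, a hypothetical run that goes from 8 s per V-cycle on 512 cores to 4 s on 1024 cores has efficiency 1.0; reaching only 4 s on 2048 cores would be efficiency 0.5.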

22 7-point stencil (1 core per node, Westmere). Figure: MStencil/s versus grid size for lex. Gauss-Seidel, RRB Gauss-Seidel, disjoint RRB Gauss-Seidel, and disjoint RRB Gauss-Seidel with index optimization.

23 7-point stencil (12 cores per node, Westmere). Figure: MStencil/s versus grid size for lex. Gauss-Seidel, RRB Gauss-Seidel, disjoint RRB Gauss-Seidel, and disjoint RRB Gauss-Seidel with index optimization.

24 Next steps / Outlook: parallel file reading; implementation of varying coefficients; nonlinear isotropic and anisotropic diffusion regularizers.

25 Thank you for your attention! Any questions? The development of HHG was funded by the Elite Network of Bavaria within the International Doctorate Program "Identification, Optimization and Control with Applications in Modern Technologies" and by KONWIHR.


Computational fluid dynamics (CFD) 9 th SIMLAB Course Computational fluid dnamics (CFD) 9 th SIMLAB Course Janos Benk October 3-9, Janos Benk: Computational fluid dnamics (CFD) www5.in.tum.de/wiki/inde.php/lab_course_computational_fluid_dnamics_-_summer_

More information

Back to Elements - Tetrahedra vs. Hexahedra

Back to Elements - Tetrahedra vs. Hexahedra Back to Elements - Tetrahedra vs. Hexahedra Erke Wang, Thomas Nelson, Rainer Rauch CAD-FEM GmbH, Munich, Germany Abstract This paper presents some analytical results and some test results for different

More information

Variational approach to restore point-like and curve-like singularities in imaging

Variational approach to restore point-like and curve-like singularities in imaging Variational approach to restore point-like and curve-like singularities in imaging Daniele Graziani joint work with Gilles Aubert and Laure Blanc-Féraud Roma 12/06/2012 Daniele Graziani (Roma) 12/06/2012

More information

Algorithms and Tools for Scalable Graph Analy8cs. Kamesh Madduri

Algorithms and Tools for Scalable Graph Analy8cs. Kamesh Madduri Algorithms and Tools for Scalable Graph Analy8cs Kamesh Madduri Computer Science and Engineering The Pennsylvania State University madduri@cse.psu.edu MMDS 2012 July 13, 2012 This talk: A methodology for

More information

A Load Balancing Tool for Structured Multi-Block Grid CFD Applications

A Load Balancing Tool for Structured Multi-Block Grid CFD Applications A Load Balancing Tool for Structured Multi-Block Grid CFD Applications K. P. Apponsah and D. W. Zingg University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, ON, M3H 5T6, Canada Email:

More information

Capacity Management for Oracle Database Machine Exadata v2

Capacity Management for Oracle Database Machine Exadata v2 Capacity Management for Oracle Database Machine Exadata v2 Dr. Boris Zibitsker, BEZ Systems NOCOUG 21 Boris Zibitsker Predictive Analytics for IT 1 About Author Dr. Boris Zibitsker, Chairman, CTO, BEZ

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Performance of the NAS Parallel Benchmarks on Grid Enabled Clusters

Performance of the NAS Parallel Benchmarks on Grid Enabled Clusters Performance of the NAS Parallel Benchmarks on Grid Enabled Clusters Philip J. Sokolowski Dept. of Electrical and Computer Engineering Wayne State University 55 Anthony Wayne Dr., Detroit, MI 4822 phil@wayne.edu

More information

HPC Infrastructure Development in Bulgaria

HPC Infrastructure Development in Bulgaria HPC Infrastructure Development in Bulgaria Svetozar Margenov margenov@parallel.bas.bg Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev Str. Bl. 25-A,

More information

Introduction to the Finite Element Method

Introduction to the Finite Element Method Introduction to the Finite Element Method 09.06.2009 Outline Motivation Partial Differential Equations (PDEs) Finite Difference Method (FDM) Finite Element Method (FEM) References Motivation Figure: cross

More information

CFD Implementation with In-Socket FPGA Accelerators

CFD Implementation with In-Socket FPGA Accelerators CFD Implementation with In-Socket FPGA Accelerators Ivan Gonzalez UAM Team at DOVRES FuSim-E Programme Symposium: CFD on Future Architectures C 2 A 2 S 2 E DLR Braunschweig 14 th -15 th October 2009 Outline

More information

Introduction Our choice Example Problem Final slide :-) Python + FEM. Introduction to SFE. Robert Cimrman

Introduction Our choice Example Problem Final slide :-) Python + FEM. Introduction to SFE. Robert Cimrman Python + FEM Introduction to SFE Robert Cimrman Department of Mechanics & New Technologies Research Centre University of West Bohemia Plzeň, Czech Republic April 3, 2007, Plzeň 1/22 Outline 1 Introduction

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

PERFORMANCE ANALYSIS AND OPTIMIZATION OF LARGE-SCALE SCIENTIFIC APPLICATIONS JINGJIN WU

PERFORMANCE ANALYSIS AND OPTIMIZATION OF LARGE-SCALE SCIENTIFIC APPLICATIONS JINGJIN WU PERFORMANCE ANALYSIS AND OPTIMIZATION OF LARGE-SCALE SCIENTIFIC APPLICATIONS BY JINGJIN WU Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

Arcane/ArcGeoSim, a software framework for geosciences simulation

Arcane/ArcGeoSim, a software framework for geosciences simulation Renewable energies Eco-friendly production Innovative transport Eco-efficient processes Sustainable resources Arcane/ArcGeoSim, a software framework for geosciences simulation Pascal Havé Outline these

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Paper Pulp Dewatering

Paper Pulp Dewatering Paper Pulp Dewatering Dr. Stefan Rief stefan.rief@itwm.fraunhofer.de Flow and Transport in Industrial Porous Media November 12-16, 2007 Utrecht University Overview Introduction and Motivation Derivation

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information