The MUMPS Solver: academic needs and industrial expectations

1 The MUMPS Solver: academic needs and industrial expectations
Chiara Puglisi (Inria-Grenoble (LIP-ENS Lyon))
MUMPS group: CERFACS, CNRS, ENS-Lyon, INRIA, INPT, Université Bordeaux 1
Séminaire Aristote - HPC-Desk, ONERA, France, May 20th, 2014

2 Outline
Academic needs: a research platform for sparse direct solvers
Industrial expectations: MUMPS solver, a software platform
Concluding remarks: research and software perspectives

4 Academic needs: a research platform
Code Aster, Carter (e.g., finite elements)
Solution of sparse systems $Ax = b$
  Often the most expensive part in numerical simulation codes
Sparse direct methods to solve $Ax = b$:
  Decompose $A$ under the form $LU$, $LDL^T$ or $LL^T$
  Solve the triangular systems $Ly = b$, then $Ux = y$
3D example in earth science: acoustic wave propagation, 27-point finite difference grid
  Current goal [Seiscope project]: LU on complete earth, $n = N^3$
  Extrapolation on a grid: 55 exaflops, 200 TBytes for factors, 40 TBytes for active memory!
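The factor-then-solve scheme on this slide can be sketched on a tiny dense problem. This is only an illustration under stated assumptions: the test matrix (a 1D finite-difference Laplacian), the helper names `forward_substitution` and `backward_substitution`, and the use of NumPy's dense Cholesky are choices made here, not MUMPS code. A sparse direct solver such as MUMPS additionally exploits the zero structure of $A$, which this sketch ignores.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for a lower-triangular matrix L."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_substitution(U, y):
    """Solve U x = y for an upper-triangular matrix U."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Illustrative symmetric positive definite matrix (assumption):
# tridiagonal 1D finite-difference Laplacian.
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

L = np.linalg.cholesky(A)           # factorization phase: A = L L^T
y = forward_substitution(L, b)      # solve L y = b
x = backward_substitution(L.T, y)   # solve L^T x = y

residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(f"relative residual: {residual:.2e}")
```

Once the factor `L` is available, further right-hand sides only need the two triangular solves, which is why the factorization dominates the cost figures quoted on the slide.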

5 Sparse direct solution: main research issues
Figure: Code Aster, EDF (pump, nuclear backup circuit)
Figure: frequency-domain seismic modeling, Helmholtz equations, SEISCOPE project (axes: Depth (km), Cross (km), Dip (km); values in m/s)
Extrapolation on a grid: 55 exaflops, 200 TBytes for factors, 40 TBytes for active memory!
Main algorithmic issues
  Parallel algorithmic issues: synchronization avoidance, mapping irregular data structures, scheduling.
  Performance scalability: time, but also memory per processor, when increasing the number of processors (and the problem size).
  Numerical issues: numerical accuracy, hybrid iterative-direct solvers, application-specific (elliptic PDEs) solvers

6 Robust memory-aware mappings
Context
  Memory per node or core is decreasing
  Figure: four nodes, each with factors, active memory and disk
  Active memory is not naturally scalable and is difficult to estimate
Algorithmic work
  Design mapping algorithms that enforce some memory constraints and provide better memory estimates.
  Active memory size dominates total memory in parallel.
  Example: share of active storage on the AUDI matrix: 11% on 1 processor, 59% on 256 processors

7 Robust memory-aware mappings (problem)
Metric: active memory efficiency
  $e(p) = \dfrac{S_{seq}}{p \times S_{max}(p)}$
  with $S_{seq}$ the sequential memory and $S_{max}(p)$ the maximum memory used on $p$ processors
We would like $e(p) \approx 1$, i.e. $S_{seq}/p$ on each processor.
Common mappings/schedulings have poor memory efficiency:
  Standard proportional mapping: $\lim_{p \to \infty} e(p) = 0$ on regular problems.
  With more sophisticated relaxed proportional mapping, typical efficiency $e(p)$ is still between 0.10 and (Memory estimates are unreliable).
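The efficiency metric can be evaluated directly from per-process memory peaks. The sketch below is a minimal illustration: the function name `memory_efficiency` and all memory figures are hypothetical values chosen here, not measurements from MUMPS.

```python
def memory_efficiency(s_seq, per_process_peaks):
    """Active memory efficiency e(p) = S_seq / (p * S_max(p)).

    s_seq: memory of the sequential run.
    per_process_peaks: peak memory of each of the p processes (same unit).
    """
    p = len(per_process_peaks)
    s_max = max(per_process_peaks)
    return s_seq / (p * s_max)

# Hypothetical figures in MB (assumptions for illustration only).
s_seq = 1000.0
balanced = [250.0, 250.0, 250.0, 250.0]    # ideal: S_seq / p on each process
imbalanced = [600.0, 200.0, 150.0, 50.0]   # one process dominates

e_balanced = memory_efficiency(s_seq, balanced)
e_imbalanced = memory_efficiency(s_seq, imbalanced)
print(e_balanced, e_imbalanced)
```

Because the metric divides by the maximum rather than the average, a single overloaded process is enough to pull $e(p)$ well below 1.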

8 Robust memory-aware mappings (results)
Reduce memory: serialize some branches in the elimination tree
Reliable estimation and better memory use with memory-aware mappings with respect to the default MUMPS version.
Illustration with matrix PANCAKE 2 (3D electromagnetism, Cedrat (Flux) and Padova Univ.), 64 MPI processes
Table columns: MUMPS, Memory-aware mappings
Table rows:
  Objective max MB/core (n/a)
  Time (seconds)
  Active workspace (avg MB/core)
  Active workspace (max MB/core)

9 Application-specific solvers: BLR solver
Block Low-Rank approximations to improve sparse multifrontal solvers
Low-rank approximations (elliptic PDEs)
  memory compression and flop reduction
  accuracy controlled by a numerical parameter (can also be used as a preconditioner)
Main features of the Block Low-Rank (BLR) format
  Algebraic solver; flat and simple format
  Compatibility with numerical pivoting
Many representations: recursive $\mathcal{H}$, $\mathcal{H}^2$ [Bebendorf, Börm, Hackbusch, Grasedyck, ...], HSS/SSS [Chandrasekaran, Dewilde, Gu, Li, Xia, ...], flat block low-rank (BLR), ...

10 Block Low Rank multifrontal solver
Figure: elimination tree with a block $B$
Singular value decomposition (SVD) of each block $B$:
  $B = X_1 S_1 Y_1 + X_2 S_2 Y_2$

11 Block Low Rank multifrontal solver
Figure: elimination tree with a block $B$
Approximation of rank $k(\varepsilon)$:
  $B = X_1 S_1 Y_1 + X_2 S_2 Y_2$, with error term $E = X_2 S_2 Y_2$ and $\|E\|_2 = \|X_2 S_2 Y_2\|_2 = \sigma_{k+1} \le \varepsilon$
Block Low-Rank Solver (BLR), PhD INP-EDF, 2013, C. Weisbecker
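The truncation behind the rank $k(\varepsilon)$ can be sketched with a dense SVD. This is an illustration under assumptions, not the BLR implementation from the thesis: the function name `compress_block`, the test block (a smooth kernel between two well-separated point sets) and the threshold value are all chosen here for the example.

```python
import numpy as np

def compress_block(B, eps):
    """Truncated SVD of B: keep only the singular values larger than eps.

    Returns the kept factors X1, s1, Y1 and the full singular value array.
    """
    X, s, Y = np.linalg.svd(B, full_matrices=False)
    k = int(np.sum(s > eps))  # numerical rank k(eps)
    return X[:, :k], s[:k], Y[:k, :], s

# Hypothetical test block (assumption): interaction between two
# well-separated point sets, which is numerically low-rank.
m = n = 64
x = np.linspace(0.0, 1.0, m)
y = np.linspace(3.0, 4.0, n)
B = 1.0 / (1.0 + np.abs(x[:, None] - y[None, :]))

eps = 1e-8
X1, s1, Y1, sigma = compress_block(B, eps)
k = len(s1)

E = B - (X1 * s1) @ Y1        # discarded part, X2 S2 Y2
err = np.linalg.norm(E, 2)    # spectral norm, equals sigma_{k+1}

dense_entries = m * n
blr_entries = k * (m + n)     # storage of X1 and (S1 Y1)
print(k, err, blr_entries / dense_entries)
```

The 2-norm of the discarded part equals the first dropped singular value, so the threshold `eps` bounds the approximation error of the block directly.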

12 Application to frequency-domain seismic modeling
Figures with axes Dip (km), Cross (km), Depth (km)
Results as a percentage of the standard (full-rank) sparse solver:

               $\varepsilon = 10^{-5}$            $\varepsilon = 10^{-4}$
               2 Hz     4 Hz     8 Hz     2 Hz     4 Hz     8 Hz
  ops          41.8%    27.4%    21.8%    32.9%    20.0%    15.2%
  memory L     61.8%    50.0%    41.6%    53.4%    42.2%    28.9%
  memory CB    32.3%    24.4%    23.9%    23.9%    21.7%    19.4%

13 Outline
Academic needs: a research platform for sparse direct solvers
Industrial expectations: MUMPS solver, a software platform
Concluding remarks: research and software perspectives

14 Industrial expectations: a software platform
Technological transfer
  From research prototyping during PhD theses to robust and portable software.
  Examples: Memory-aware: PhDs of E. Agullo (LIP-ENS, 2008) and F.-H. Rouet (INPT-IRIT, 2012); Block Low-Rank: PhD of C. Weisbecker (INPT-IRIT with EDF support, 2013).
Software issues and interaction with users
  Code development: develop and combine complex features
  Software engineering: analysis/experimentation/validation tools, maintenance (also essential for research developments!)
  Users: expect support, training and adaptation/developments, but also: research collaborations, software validation and financial support.

15 MUMPS solver software platform
General context
  Initially funded by a European project, 12 partners from 5 countries
  Publicly available since 1999
  Co-developed in Toulouse, Lyon-Grenoble and Bordeaux by CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux
  Latest release: May 2011, written in C and Fortran
Competitive and original software package used worldwide
  Integrated within commercial and open-source packages (e.g., Samcef from Samtech, Actran from Free Field Technologies, Code Aster from EDF, PAM-Crash from ESI, IPOPT, PETSc, Trilinos, Debian packages, ...).

16 Software requests
Figure: world map of requests since Dec (8839 requests)

17 Software requests
The number of requests per day has increased steadily throughout the evolution of the software.
Figure: requests per day, with MUMPS releases marked
The latest version (4.10.0) is downloaded more than 1000 times per year.

18 MUMPS Team (May 2014)
Permanent members:
  Patrick Amestoy (INPT-IRIT, Toulouse)
  Jean-Yves L'Excellent (INRIA-LIP, Lyon)
  Abdou Guermouche (LABRI, Bordeaux)
  Bora Uçar (CNRS-LIP, Lyon)
  Alfredo Buttari (CNRS-IRIT, Toulouse)
Engineers:
  Guillaume Joslin (Université Paul Sabatier, Toulouse)
  Chiara Puglisi (INRIA, Grenoble)
Part time on MUMPS:
  Maurice Brémond (INRIA, Grenoble)
PhD students:
  Mohamed Sid-Lakhdar (ENS-Lyon)
  Florent Lopez (UPS, Toulouse)

19 Research through PhDs
Ph.D. students connected to the project:
  S. Pralet, CERFACS
  A. Guermouche, ENS Lyon
  C. Voemel, CERFACS
  M. Slavova, CERFACS
  E. Agullo, ENS Lyon
  F. Lopez, UPS
  W. Sid-Lakhdar, ENS Lyon
  C. Weisbecker, INPT-EDF
  F.-H. Rouet, INPT
Some research themes: preprocessing and orderings, numerical pivoting and accuracy, numerical features, memory usage and task scheduling, shared-memory parallelism

20 Relations with our users
Exchanges with users
  Direct contacts
  MUMPS Users mailing list
MUMPS Users Days
  1. October 24th, 2006, Lyon, France
  2. April 15th - 16th, 2010, Toulouse, France
  3. May 29th - 30th, 2013, EDF, Clamart, France
Objectives of these workshops:
  Present some facets of the algorithmic, numerical and software work in the context of the MUMPS project/solver
  Share experience
  Identify users' expectations (software evolution, new features)
  Discuss future research tracks and the future of MUMPS

21 Outline
Academic needs: a research platform for sparse direct solvers
Industrial expectations: MUMPS solver, a software platform
Concluding remarks: research and software perspectives

22 Research perspectives
Scientific hurdles and related research areas
  Computation driven by memory: memory-aware algorithms
  Controlled accuracy to improve complexity: BLR solver
  Multicore and asynchronous communications: key issue for time and memory scalability; algorithms and communication schemes need to be revisited.
Performance projection and target (3D Helmholtz; $n = 10^9$; 1.4 PFlops computer, 2000 nodes, 32 cores/node)
(Still much research and software work needed to reach this target!!)

               MUMPS             Research target
  Time         $10^7$ seconds    $10^4$ seconds
  Factors      8 GB/core         3 GB/core
  Workspace    50 GB/core        2 GB/core
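The per-core figures of this projection can be turned into machine-wide totals with a few lines of arithmetic. The sketch below reads the time entries as $10^7$ and $10^4$ seconds and assumes 1 TB = 1000 GB; the dictionary layout and the helper name `aggregate_tb` are illustrative choices, not part of the talk.

```python
# Target machine from the slide: 2000 nodes with 32 cores each.
nodes = 2000
cores_per_node = 32
cores = nodes * cores_per_node

def aggregate_tb(gb_per_core, n_cores):
    """Total memory in TB for a per-core figure in GB (assumes 1 TB = 1000 GB)."""
    return gb_per_core * n_cores / 1000.0

# Figures as read from the slide's table (time exponents are an interpretation).
current = {"time_s": 1e7, "factors_gb": 8, "workspace_gb": 50}
target = {"time_s": 1e4, "factors_gb": 3, "workspace_gb": 2}

speedup_needed = current["time_s"] / target["time_s"]
factors_tb = (aggregate_tb(current["factors_gb"], cores),
              aggregate_tb(target["factors_gb"], cores))
workspace_tb = (aggregate_tb(current["workspace_gb"], cores),
                aggregate_tb(target["workspace_gb"], cores))

print(f"cores: {cores}")
print(f"time reduction needed: {speedup_needed:.0f}x")
print(f"factors: {factors_tb[0]:.0f} TB -> {factors_tb[1]:.0f} TB")
print(f"workspace: {workspace_tb[0]:.0f} TB -> {workspace_tb[1]:.0f} TB")
```

Under these assumptions the workspace reduction (a factor of 25) is much larger than the factor-storage reduction, which is consistent with the emphasis on memory-aware algorithms in the list of hurdles.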

23 Software agreement
Software agreement signed by the owners of the software: CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux 1.
Key features
  All institutions have recognized and confirmed their will to freely distribute MUMPS releases
  A technical committee supervises technical/scientific decisions
  Conditions of use for the development version are defined
  Conditions of transfer toward the next public version are defined
  License for public versions: CeCILL-C (LGPL-compatible)

24 Sustainability of MUMPS software and research platform
Objectives
  Stabilize engineering work and expertise with long-term positions
  Ensure software quality and faster transfer of research work
MUMPS Consortium
  Type: group of users
  Objective: support engineering work
  Services: beta releases of future/new functionalities, annual meeting to share experience, wish list to influence development priorities, training cycles...
Ongoing work... takes more time than one could have expected



More information

Real Time Simulation of Power Plants

Real Time Simulation of Power Plants Real Time Simulation of Power Plants Torsten Dreher 1 System Simulation Group Friedrich-Alexander-University Erlangen-Nuremberg Siemens Simulation Center, Erlangen December 14, 2008 1 [email protected]

More information

CFD analysis for road vehicles - case study

CFD analysis for road vehicles - case study CFD analysis for road vehicles - case study Dan BARBUT*,1, Eugen Mihai NEGRUS 1 *Corresponding author *,1 POLITEHNICA University of Bucharest, Faculty of Transport, Splaiul Independentei 313, 060042, Bucharest,

More information

High Performance Matrix Inversion with Several GPUs

High Performance Matrix Inversion with Several GPUs High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República

More information

Using Peer to Peer Dynamic Querying in Grid Information Services

Using Peer to Peer Dynamic Querying in Grid Information Services Using Peer to Peer Dynamic Querying in Grid Information Services Domenico Talia and Paolo Trunfio DEIS University of Calabria HPC 2008 July 2, 2008 Cetraro, Italy Using P2P for Large scale Grid Information

More information

Load Balancing Techniques

Load Balancing Techniques Load Balancing Techniques 1 Lecture Outline Following Topics will be discussed Static Load Balancing Dynamic Load Balancing Mapping for load balancing Minimizing Interaction 2 1 Load Balancing Techniques

More information

Arcane/ArcGeoSim, a software framework for geosciences simulation

Arcane/ArcGeoSim, a software framework for geosciences simulation Renewable energies Eco-friendly production Innovative transport Eco-efficient processes Sustainable resources Arcane/ArcGeoSim, a software framework for geosciences simulation Pascal Havé Outline these

More information

Efficient numerical simulation of time-harmonic wave equations

Efficient numerical simulation of time-harmonic wave equations Efficient numerical simulation of time-harmonic wave equations Prof. Tuomo Rossi Dr. Dirk Pauly Ph.Lic. Sami Kähkönen Ph.Lic. Sanna Mönkölä M.Sc. Tuomas Airaksinen M.Sc. Anssi Pennanen M.Sc. Jukka Räbinä

More information

Curriculum Vitae of Paola Boito

Curriculum Vitae of Paola Boito Curriculum Vitae of Paola Boito Personal information Born on 1 st August 1978 in Asolo (TV), Italy. Italian citizenship. E-mail: [email protected] [email protected] Homepage: http://www.mathcs.emory.edu/~boito

More information

MEng, BSc Computer Science with Artificial Intelligence

MEng, BSc Computer Science with Artificial Intelligence School of Computing FACULTY OF ENGINEERING MEng, BSc Computer Science with Artificial Intelligence Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give

More information

Scheduling Task Parallelism" on Multi-Socket Multicore Systems"

Scheduling Task Parallelism on Multi-Socket Multicore Systems Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction

More information

Systolic Computing. Fundamentals

Systolic Computing. Fundamentals Systolic Computing Fundamentals Motivations for Systolic Processing PARALLEL ALGORITHMS WHICH MODEL OF COMPUTATION IS THE BETTER TO USE? HOW MUCH TIME WE EXPECT TO SAVE USING A PARALLEL ALGORITHM? HOW

More information

Scientific Computing Programming with Parallel Objects

Scientific Computing Programming with Parallel Objects Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore

More information

Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed

Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed Sébastien Badia, Alexandra Carpen-Amarie, Adrien Lèbre, Lucas Nussbaum Grid 5000 S. Badia, A. Carpen-Amarie, A. Lèbre, L. Nussbaum

More information

An Energy-aware Multi-start Local Search Metaheuristic for Scheduling VMs within the OpenNebula Cloud Distribution

An Energy-aware Multi-start Local Search Metaheuristic for Scheduling VMs within the OpenNebula Cloud Distribution An Energy-aware Multi-start Local Search Metaheuristic for Scheduling VMs within the OpenNebula Cloud Distribution Y. Kessaci, N. Melab et E-G. Talbi Dolphin Project Team, Université Lille 1, LIFL-CNRS,

More information

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 19: SVD revisited; Software for Linear Algebra Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 9 Outline 1 Computing

More information

2013 Code_Saturne User Group Meeting. EDF R&D Chatou, France. 9 th April 2013

2013 Code_Saturne User Group Meeting. EDF R&D Chatou, France. 9 th April 2013 2013 Code_Saturne User Group Meeting EDF R&D Chatou, France 9 th April 2013 Thermal Comfort in Train Passenger Cars Contact For further information please contact: Brian ANGEL Director RENUDA France [email protected]

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

Converted-waves imaging condition for elastic reverse-time migration Yuting Duan and Paul Sava, Center for Wave Phenomena, Colorado School of Mines

Converted-waves imaging condition for elastic reverse-time migration Yuting Duan and Paul Sava, Center for Wave Phenomena, Colorado School of Mines Converted-waves imaging condition for elastic reverse-time migration Yuting Duan and Paul Sava, Center for Wave Phenomena, Colorado School of Mines SUMMARY Polarity changes in converted-wave images constructed

More information