The MUMPS Solver: academic needs and industrial expectations




The MUMPS Solver: academic needs and industrial expectations
Chiara Puglisi (Inria-Grenoble (LIP-ENS Lyon))
MUMPS group: CERFACS, CNRS, ENS-Lyon, INRIA, INPT, Université Bordeaux 1
Séminaire Aristote - HPC-Desk, ONERA, France, May 20th, 2014

Outline
- Academic needs: a research platform for sparse direct solvers
- Industrial expectations: MUMPS solver, a software platform
- Concluding remarks: research and software perspectives


Academic needs: a research platform
Code Aster, Carter (e.g., finite elements)
Solution of sparse systems $Ax = b$
- Often the most expensive part in numerical simulation codes
- Sparse direct methods to solve $Ax = b$:
  - Decompose $A$ under the form $LU$, $LDL^T$ or $LL^T$
  - Solve the triangular systems $Ly = b$, then $Ux = y$
3D example in earth science: acoustic wave propagation, 27-point finite difference grid
- Current goal [Seiscope project]: LU on complete earth, $n = N^3 = 1000^3$
- Extrapolation on a $1000 \times 1000 \times 1000$ grid: 55 exaflops, 200 Tbytes for factors, 40 Tbytes for active memory!
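The factor-then-solve scheme on this slide can be illustrated with a small dense sketch. This is only a toy under stated assumptions: the matrix is dense, no pivoting is performed, and the function names (`lu_factor`, `forward_substitution`, `backward_substitution`, `solve`) are hypothetical. It does not reflect how MUMPS, a sparse multifrontal solver, is implemented.

```python
def lu_factor(a):
    """Doolittle LU without pivoting: returns (L, U) with a unit-diagonal L.

    Assumes every pivot is nonzero (true for the diagonally dominant example below).
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def forward_substitution(lower, b):
    """Solve L y = b for a lower-triangular L."""
    y = []
    for i in range(len(b)):
        partial = sum(lower[i][j] * y[j] for j in range(i))
        y.append((b[i] - partial) / lower[i][i])
    return y


def backward_substitution(upper, y):
    """Solve U x = y for an upper-triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - partial) / upper[i][i]
    return x


def solve(a, b):
    """Factor A = LU once, then solve L y = b followed by U x = y."""
    lower, upper = lu_factor(a)
    return backward_substitution(upper, forward_substitution(lower, b))


# Illustrative 3x3 system whose exact solution is x = (1, 1, 1).
A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]
b = [3, 2, 3]
x = solve(A, b)
```

Once the factors are available, additional right-hand sides only cost the two triangular solves, which is one reason the factorization dominates the cost of a direct method.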

Sparse direct solution: main research issues
[Figure: Code Aster, EDF: pump, nuclear backup circuit]
[Figure: frequency-domain seismic modeling, Helmholtz equations, SEISCOPE project; axes: Depth (km), Cross (km), Dip (km); color scale from 3000 to 6000 m/s]
Extrapolation on a $1000 \times 1000 \times 1000$ grid: 55 exaflops, 200 Tbytes for factors, 40 Tbytes for active memory!
Main algorithmic issues
- Parallel algorithmic issues: synchronization avoidance, mapping irregular data structures, scheduling.
- Performance scalability: time but also memory per processor when increasing the number of processors (and the problem size).
- Numerical issues: numerical accuracy, hybrid iterative-direct solvers, application-specific (elliptic PDEs) solvers.

Robust memory-aware mappings
Context
- Memory per node or core is decreasing
- [Diagram: several nodes, each with factors, active memory and disk]
- Active memory is not naturally scalable and is difficult to estimate
Algorithmic work
- Design mapping algorithms that enforce some memory constraints and provide better memory estimates.
- Active memory size dominates total memory in parallel. Example: share of active storage on the AUDI matrix:
  - 1 processor: 11%
  - 256 processors: 59%

Robust memory-aware mappings (problem)
Metric: active memory efficiency
$$e(p) = \frac{S_{seq}}{p \times S_{max}(p)}$$
with $S_{seq}$ the sequential memory and $S_{max}(p)$ the maximum memory used on $p$ procs.
We would like $e(p) \approx 1$, i.e. $S_{seq}/p$ on each processor.
Common mappings/schedulings have poor memory efficiency:
- Standard proportional mapping: $\lim_{p \to \infty} e(p) = 0$ on regular problems.
- With more sophisticated relaxed proportional mapping, typical efficiency $e(p)$ is still between 0.10 and 0.40. (Memory estimates are unreliable.)
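As a small worked illustration of the metric, the sketch below computes $e(p)$ from a sequential memory figure and a list of per-process peak memories. The function name `memory_efficiency` and all the numbers are hypothetical and chosen only to show how an unbalanced peak lowers the efficiency.

```python
def memory_efficiency(s_seq, per_process_peaks):
    """Active memory efficiency e(p) = S_seq / (p * S_max(p)).

    s_seq: active memory of the sequential run (any unit).
    per_process_peaks: peak active memory of each of the p processes (same unit).
    """
    p = len(per_process_peaks)
    s_max = max(per_process_peaks)
    return s_seq / (p * s_max)


# Hypothetical figures in MB, for illustration only.
balanced = memory_efficiency(1000.0, [250.0, 250.0, 250.0, 250.0])
unbalanced = memory_efficiency(1000.0, [400.0, 300.0, 500.0, 350.0])
```

With these made-up inputs the perfectly balanced case gives an efficiency of 1, while a single process peaking at 500 MB halves it, since the metric is driven by the maximum over processes and not by the average.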

Robust memory-aware mappings (results)
- Reduce memory: serialize some branches in the elimination tree
- Reliable estimation and better memory use with memory-aware mappings with respect to the default version (MUMPS 4.10.0).
Illustration with matrix PANCAKE 2 (3D electromagnetism, Cedrat (Flux) and Padova Univ.), 64 MPI processes

                                 | MUMPS 4.10.0 | Memory-aware mappings | Memory-aware mappings
---------------------------------|--------------|-----------------------|----------------------
Objective max MB/core            | n/a          | 400                   | 200
Time (seconds)                   | 418          | 591                   | 684
Active workspace (avg MB/core)   | 539.4        | 234.7                 | 180.0
Active workspace (max MB/core)   | 900.3        | 356.2                 | 181.5

Application-specific solvers: BLR solver
Block Low-Rank approximations to improve sparse multifrontal solvers
Low-rank approximations (elliptic PDEs)
- memory compression and flop reduction
- accuracy controlled by a numerical parameter (can also be used as a preconditioner)
Main features of the Block Low-Rank (BLR) format
- Algebraic solver; flat and simple format
- Compatibility with numerical pivoting
Many representations: recursive $\mathcal{H}$, $\mathcal{H}^2$ [Bebendorf, Börm, Hackbusch, Grasedyck, ...], HSS/SSS [Chandrasekaran, Dewilde, Gu, Li, Xia, ...], flat block low-rank (BLR), ...

Block Low Rank multifrontal solver
[Figure: elimination tree with a block $B$ of a frontal matrix]
Singular value decomposition (SVD) of each block $B$:
$$B = X_1 S_1 Y_1 + X_2 S_2 Y_2$$

Block Low Rank multifrontal solver
[Figure: elimination tree with a block $B$ of a frontal matrix]
Approximation of $B$ of rank $k(\varepsilon)$: keep $X_1 S_1 Y_1$ in
$$B = X_1 S_1 Y_1 + X_2 S_2 Y_2, \qquad E = X_2 S_2 Y_2, \qquad \|E\|_2 = \|X_2 S_2 Y_2\|_2 = \sigma_{k+1} \le \varepsilon$$
Block Low-Rank Solver (BLR), PhD INP-EDF, 2013, C. Weisbecker
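The truncation rule can be sketched directly on a list of singular values. This is a minimal illustration, assuming the singular values are already available and sorted in decreasing order. The helper names and the sample values are hypothetical, and the storage count assumes the low-rank block is kept as two factors of sizes $m \times k$ and $k \times n$.

```python
def truncation_rank(singular_values, eps):
    """Rank k(eps): number of singular values strictly greater than eps.

    Assumes singular_values is sorted in decreasing order.
    """
    return sum(1 for s in singular_values if s > eps)


def truncation_error(singular_values, eps):
    """2-norm of the discarded part, i.e. sigma_{k+1} (0.0 if nothing is discarded)."""
    k = truncation_rank(singular_values, eps)
    return singular_values[k] if k < len(singular_values) else 0.0


def low_rank_storage_ratio(m, n, k):
    """Entries stored as two factors (m x k and k x n) divided by the m x n full block."""
    return k * (m + n) / (m * n)


# Hypothetical singular values of one block, for illustration only.
sigma = [1.0, 1e-2, 1e-6, 1e-9]
k = truncation_rank(sigma, 1e-5)
err = truncation_error(sigma, 1e-5)
ratio = low_rank_storage_ratio(256, 256, k)
```

With these sample values the block is kept at rank 2, the discarded part has norm $10^{-6} \le \varepsilon$, and the two factors need under 2% of the entries of the full 256 by 256 block. Blocks whose singular values decay slowly would compress far less.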

Application to frequency-domain seismic modeling
[Figure: 3D panels of the seismic model; axes: Dip (km), Cross (km), Depth (km)]

ε          | frequency | ops    | memory L | memory CB
-----------|-----------|--------|----------|----------
$10^{-5}$  | 2 Hz      | 41.8 % | 61.8 %   | 32.3 %
$10^{-5}$  | 4 Hz      | 27.4 % | 50.0 %   | 24.4 %
$10^{-5}$  | 8 Hz      | 21.8 % | 41.6 %   | 23.9 %
$10^{-4}$  | 2 Hz      | 32.9 % | 53.4 %   | 23.9 %
$10^{-4}$  | 4 Hz      | 20.0 % | 42.2 %   | 21.7 %
$10^{-4}$  | 8 Hz      | 15.2 % | 28.9 %   | 19.4 %

%: percentage of standard (full-rank) sparse solver


Industrial expectations: a software platform
Technological transfer
- From research prototyping during a PhD thesis to robust and portable software. Examples:
  - Memory-aware: PhDs of E. Agullo (LIP-ENS, 2008) and F.-H. Rouet (INPT-IRIT, 2012);
  - Block Low-Rank: PhD of C. Weisbecker (INPT-IRIT with EDF support, 2013).
Software issues and interaction with users
- Code development: develop and combine complex features
- Software engineering: analysis/experimentation/validation tools, maintenance (also essential for research developments!)
- Users: expect support, training and adaptation/developments, but also bring research collaborations, software validation and financial support.

MUMPS solver software platform
General context
- Initially funded by a European project (1996-1999), 12 partners from 5 countries
- Publicly available since 1999 at http://graal.ens-lyon.fr/mumps and http://mumps.enseeiht.fr
- Co-developed in Toulouse, Lyon-Grenoble and Bordeaux by CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux
- Latest release: MUMPS 4.10.0, May 2011, 250 000 lines of C and Fortran code
Competitive and original software package used worldwide
- Integrated within commercial and open-source packages (e.g., Samcef from Samtech, Actran from Free Field Technologies, Code Aster from EDF, PAM-Crash from ESI, IPOPT, PETSc, Trilinos, Debian packages, ...).

Software requests
[Figure: world map of software requests since Dec. 2002 (8839 requests)]

Software requests
The number of requests per day has increased steadily throughout the evolution of the software.
[Figure: bar chart of requests per day for MUMPS releases 4.3, 4.5, 4.6, 4.7, 4.8, 4.9 and 4.10, with values ranging from 1.3 to 4.02]
The latest version (4.10.0) is downloaded more than 1000 times per year.

MUMPS Team (May 2014)
Permanent members:
- Patrick Amestoy (INPT-IRIT, Toulouse)
- Jean-Yves L'Excellent (INRIA-LIP, Lyon)
- Abdou Guermouche (LABRI, Bordeaux)
- Bora Uçar (CNRS-LIP, Lyon)
- Alfredo Buttari (CNRS-IRIT, Toulouse)
Engineers:
- Guillaume Joslin (Université Paul Sabatier, Toulouse)
- Chiara Puglisi (INRIA, Grenoble)
Part time on MUMPS:
- Maurice Brémond (INRIA, Grenoble)
PhD students:
- Mohamed Sid-Lakhdar (ENS-Lyon)
- Florent Lopez (UPS, Toulouse)

2000-2013: Research through PhDs
Ph.D. students connected to the project:
- S. Pralet, CERFACS
- A. Guermouche, ENS Lyon
- C. Voemel, CERFACS
- M. Slavova, CERFACS
- E. Agullo, ENS Lyon
- F. Lopez, UPS
- W. Sid-Lakhdar, ENS Lyon
- C. Weisbecker, INPT-EDF
- F.-H. Rouet, INPT
[Figure: timeline of the PhD theses, 2000 to 2014]
Some research themes: preprocessing and orderings, numerical pivoting and accuracy, numerical features, memory usage and task scheduling, shared-memory parallelism

Relations with our users
Exchanges with users
- Direct contacts by email
- MUMPS Users Mailing list
- MUMPS Users Days
  1. October 24th, 2006, Lyon, France
  2. April 15th - 16th, 2010, Toulouse, France
  3. May 29th - 30th, 2013, EDF, Clamart, France
Objectives of these workshops:
- Present some facets of the algorithmic, numerical and software work in the context of the MUMPS project/solver
- Share experience
- Identify users' expectations (software evolution, new features)
- Discuss future research tracks and the future of MUMPS


Research perspectives
Scientific hurdles and related research areas
- Computation driven by memory: memory-aware algorithms
- Controlled accuracy to improve complexity: BLR solver
- Multicore and asynchronous communications: key issue for time and memory scalability; algorithms and communication schemes need to be revisited.
Performance projection and target (3D Helmholtz; $n = 10^9$; 1.4 PFlops computer, 2000 nodes, 32 cores/node)
(Still much research and software work needed to reach this target!!)

            | MUMPS 4.10.0   | Research target
------------|----------------|----------------
Time        | $10^7$ seconds | $10^4$ seconds
Factors     | 8 GB/core      | 3 GB/core
Workspace   | 50 GB/core     | 2 GB/core
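To get a feel for the scale of this projection, the per-core figures can be aggregated over the whole machine. The sketch below is plain arithmetic on the numbers quoted on the slide. The variable and function names are hypothetical, and decimal units (1 TB = 1000 GB) are an assumption.

```python
# Machine described on the slide: 2000 nodes with 32 cores each.
NODES = 2000
CORES_PER_NODE = 32
cores = NODES * CORES_PER_NODE


def aggregate_terabytes(gb_per_core, n_cores):
    """Total memory over all cores, assuming decimal units (1 TB = 1000 GB)."""
    return gb_per_core * n_cores / 1000.0


factors_current_tb = aggregate_terabytes(8, cores)     # MUMPS 4.10.0 projection
factors_target_tb = aggregate_terabytes(3, cores)      # research target
workspace_current_tb = aggregate_terabytes(50, cores)  # MUMPS 4.10.0 projection
workspace_target_tb = aggregate_terabytes(2, cores)    # research target

# Ratio between the projected time (10^7 s) and the targeted time (10^4 s).
time_reduction = 1e7 / 1e4
```

Under these assumptions the target corresponds to about 192 TB of factors over 64 000 cores, the same order of magnitude as the 200 Tbytes extrapolated for the factors earlier in the talk, and to a thousandfold reduction in run time.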

Software agreement
Software agreement signed by the owners of the software: CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux 1.
Key features
- All institutions have recognized and confirmed their will to freely distribute MUMPS releases
- A technical committee supervises technical/scientific decisions
- Conditions of use for the development version defined
- Conditions of transfer toward the next public version defined
- License for public versions: CeCILL-C (LGPL-compatible)

Sustainability of MUMPS software and research platform
Objectives
- Stabilize engineering work and expertise with long-term positions
- Ensure software quality and faster transfer of research work
MUMPS Consortium
- Type: group of users
- Objective: support engineering work
- Services: beta releases of future/new functionalities, annual meeting to share experience, wish list to influence development priorities, training cycles, ...
- Ongoing work... takes more time than one could have expected
