The MUMPS Solver: academic needs and industrial expectations


1 The MUMPS Solver: academic needs and industrial expectations
Chiara Puglisi (Inria-Grenoble (LIP-ENS Lyon)), MUMPS group
CERFACS, CNRS, ENS-Lyon, INRIA, INPT, Université Bordeaux 1
Séminaire Aristote - HPC-Desk, ONERA, France, May 20th, 2014

2 Outline
- Academic needs: a research platform for sparse direct solvers
- Industrial expectations: MUMPS solver, a software platform
- Concluding remarks: research and software perspectives

4 Academic needs: a research platform
Code Aster, Carter (e.g., finite elements).
Solution of sparse systems $Ax = b$:
- Often the most expensive part of numerical simulation codes.
- Sparse direct methods solve $Ax = b$ in two steps: decompose $A$ in the form $LU$, $LDL^T$ or $LL^T$, then solve the triangular systems $Ly = b$ and $Ux = y$.
3D example in earth science: acoustic wave propagation, 27-point finite difference grid.
- Current goal [Seiscope project]: LU on complete earth, $n = N^3$.
- Extrapolation on a grid: 55 exaflops, 200 Tbytes for factors, 40 Tbytes for active memory!
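
The two steps named on this slide (factorization, then two triangular solves) can be sketched on a tiny dense matrix. This is only an illustration under simplifying assumptions: the function names are hypothetical, there is no pivoting and no use of sparsity, and it does not reflect how MUMPS itself is implemented.

```python
def lu_factor(a):
    """Return (L, U) with A = L U, L unit lower triangular (no pivoting)."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [row[:] for row in a]  # work on a copy of A
    for k in range(n):
        if upper[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def forward_substitution(lower, b):
    """Solve L y = b for a lower triangular L."""
    y = []
    for i, row in enumerate(lower):
        partial = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - partial) / row[i])
    return y


def backward_substitution(upper, y):
    """Solve U x = y for an upper triangular U."""
    n = len(upper)
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - partial) / upper[i][i]
    return x


def solve(a, b):
    """Factorize A once, then solve L y = b followed by U x = y."""
    lower, upper = lu_factor(a)
    return backward_substitution(upper, forward_substitution(lower, b))


# Small tridiagonal example; the right-hand side is built so that x = (1, 1, 1).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x = solve(A, b)
print(x)
```

Once the factors are available, each additional right-hand side only costs the two triangular solves, which is one reason direct methods are attractive when many systems share the same matrix.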

5 Sparse direct solution: main research issues
Figures: Code Aster, EDF (pump, nuclear backup circuit); frequency-domain seismic modeling, Helmholtz equations, SEISCOPE project (axes: depth, cross and dip in km; velocity in m/s).
Extrapolation on a grid: 55 exaflops, 200 Tbytes for factors, 40 Tbytes for active memory!
Main algorithmic issues:
- Parallel algorithmic issues: synchronization avoidance, mapping of irregular data structures, scheduling.
- Performance scalability: time, but also memory per processor, when increasing the number of processors (and the problem size).
- Numerical issues: numerical accuracy, hybrid iterative-direct solvers, application-specific solvers (elliptic PDEs).

6 Robust memory-aware mappings
Context:
- Memory per node or core is decreasing.
- Diagram: four nodes, each holding factors, active memory and disk.
- Active memory is not naturally scalable and is difficult to estimate.
Algorithmic work:
- Design mapping algorithms that enforce some memory constraints and provide better memory estimates.
- Active memory size dominates total memory in parallel. Example, share of active storage on the AUDI matrix: 11% on 1 processor, 59% on 256 processors.

7 Robust memory-aware mappings (problem)
Metric: active memory efficiency $e(p) = \frac{S_{seq}}{p \, S_{max}(p)}$, with $S_{seq}$ the sequential memory and $S_{max}(p)$ the maximum memory used by one processor when running on $p$ processors.
We would like $e(p) \approx 1$, i.e. about $S_{seq}/p$ on each processor.
Common mappings/schedulings have poor memory efficiency:
- Standard proportional mapping: $\lim_{p \to \infty} e(p) = 0$ on regular problems.
- With the more sophisticated relaxed proportional mapping, typical efficiency $e(p)$ is still between 0.10 and [...]. (Memory estimates are unreliable.)
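
To make the efficiency metric concrete, here is a minimal sketch that evaluates $e(p) = S_{seq} / (p \, S_{max}(p))$ from per-processor memory peaks. The function name and all numbers are hypothetical and chosen only for illustration; they are not measurements from MUMPS.

```python
def memory_efficiency(s_seq, per_process_memory):
    """Active memory efficiency e(p) = S_seq / (p * S_max(p)).

    s_seq: memory needed by the sequential run.
    per_process_memory: peak memory of each of the p processes (same unit).
    """
    p = len(per_process_memory)
    s_max = max(per_process_memory)
    return s_seq / (p * s_max)


# Hypothetical sequential footprint, in MB.
s_seq = 1000.0

# Ideal case: every process holds exactly S_seq / p.
e_balanced = memory_efficiency(s_seq, [250.0, 250.0, 250.0, 250.0])

# Unbalanced case: one process peaks much higher than the others, and the
# total exceeds S_seq, as can happen when active memory does not scale.
e_unbalanced = memory_efficiency(s_seq, [800.0, 300.0, 300.0, 300.0])

print(e_balanced, e_unbalanced)
```

Because the metric uses the maximum over processes, a single overloaded process is enough to pull the efficiency down, which is why the mapping has to control the worst case rather than the average.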

8 Robust memory-aware mappings (results)
- Reduce memory: serialize some branches in the elimination tree.
- Reliable estimation and better memory use with memory-aware mappings, compared with the default MUMPS version.
Illustration with matrix PANCAKE 2 (3D electromagnetism, Cedrat (Flux) and Padova Univ.), 64 MPI processes.
Table (numerical values not preserved): columns MUMPS and memory-aware mappings; rows objective max MB/core (n/a), time (seconds), active workspace (avg MB/core), active workspace (max MB/core).

9 Application-specific solvers: BLR solver
Block Low-Rank approximations to improve sparse multifrontal solvers.
Low-rank approximations (elliptic PDEs):
- memory compression and flop reduction;
- accuracy controlled by a numerical parameter (can also be used as a preconditioner).
Main features of the Block Low-Rank (BLR) format:
- algebraic solver; flat and simple format;
- compatibility with numerical pivoting.
Many representations: recursive $\mathcal{H}$, $\mathcal{H}^2$ [Bebendorf, Börm, Hackbusch, Grasedyck, ...], HSS/SSS [Chandrasekaran, Dewilde, Gu, Li, Xia, ...], flat block low-rank (BLR), ...

10 Block Low-Rank multifrontal solver
Figure: elimination tree and block $B$.
Singular value decomposition (SVD) of each block $B$: $B = X_1 S_1 Y_1 + X_2 S_2 Y_2$.

11 Block Low-Rank multifrontal solver
Figure: elimination tree and block $B$.
$B$ has rank $k(\varepsilon)$: $B = X_1 S_1 Y_1 + X_2 S_2 Y_2$, where the discarded part $E = X_2 S_2 Y_2$ satisfies $\|E\|_2 = \|X_2 S_2 Y_2\|_2 = \sigma_{k+1} \le \varepsilon$.
Block Low-Rank Solver (BLR): PhD INP-EDF, 2013, C. Weisbecker.
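
The rank $k(\varepsilon)$ on this slide is determined by the singular values alone: it is the smallest $k$ for which the first discarded singular value $\sigma_{k+1}$ is at most $\varepsilon$. The sketch below shows that selection rule together with a simple storage count. The function names and the singular values are hypothetical, and the storage count assumes $S_1$ is folded into one of the two factors, so a rank-$k$ block of size $m \times n$ needs $k(m + n)$ entries.

```python
def blr_rank(singular_values, eps):
    """Smallest k such that sigma_{k+1} <= eps.

    singular_values must be sorted in decreasing order, as returned by an SVD.
    With 0-based indexing, singular_values[k] is sigma_{k+1}.
    """
    for k, sigma in enumerate(singular_values):
        if sigma <= eps:
            return k
    return len(singular_values)  # nothing can be dropped: full rank


def compressed_storage(m, n, k):
    """Entries needed to store an m-by-n block as a rank-k product."""
    return k * (m + n)


# Hypothetical, rapidly decaying singular values of a 100-by-100 block.
sigmas = [10.0, 1.0, 1e-2, 1e-5, 1e-9]
eps = 1e-4

k = blr_rank(sigmas, eps)
truncation_error = sigmas[k] if k < len(sigmas) else 0.0  # sigma_{k+1}
full_entries = 100 * 100
low_rank_entries = compressed_storage(100, 100, k)

print(k, truncation_error, full_entries, low_rank_entries)
```

A larger $\varepsilon$ lowers the rank and therefore the storage, at the price of a larger truncation error, which is the trade-off the numerical parameter controls.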

12 Application to frequency-domain seismic modeling
Figure: 3D model slices (axes: dip, cross and depth in km).
BLR cost as a percentage of the standard (full-rank) sparse solver:
$\varepsilon$   frequency   ops      memory L   memory CB
$10^{-5}$       2 Hz        41.8%    61.8%      32.3%
$10^{-5}$       4 Hz        27.4%    50.0%      24.4%
$10^{-5}$       8 Hz        21.8%    41.6%      23.9%
$10^{-4}$       2 Hz        32.9%    53.4%      23.9%
$10^{-4}$       4 Hz        20.0%    42.2%      21.7%
$10^{-4}$       8 Hz        15.2%    28.9%      19.4%

14 Industrial expectations: a software platform
Technological transfer:
- From research prototyping during a PhD thesis to robust and portable software.
- Examples: Memory Aware, PhDs of E. Agullo (LIP-ENS, 2008) and F.-H. Rouet (INPT-IRIT, 2012); Block Low-Rank, PhD of C. Weisbecker (INPT-IRIT with EDF support, 2013).
Software issues and interaction with users:
- Code development: develop and combine complex features.
- Software engineering: analysis/experimentation/validation tools, maintenance (also essential for research developments!).
- Users: expect support, training and adaptations/developments, but also bring research collaborations, software validation and financial support.

15 MUMPS solver software platform
General context:
- Initially funded by a European project, 12 partners from 5 countries.
- Publicly available since 1999.
- Co-developed in Toulouse, Lyon-Grenoble and Bordeaux by CERFACS, CNRS, ENS Lyon, INPT, Inria and Univ. Bordeaux.
- Latest release: May 2011, written in C and Fortran.
- Competitive and original software package used worldwide.
- Integrated within commercial and open-source packages (e.g., Samcef from Samtech, Actran from Free Field Technologies, Code Aster from EDF, PAM-Crash from ESI, IPOPT, PETSc, Trilinos, Debian packages, ...).

16 Software requests
Figure: world map of requests since Dec (8839 requests).

17 Software requests
- The number of requests per day has increased steadily throughout the evolution of the software.
- Figure: requests per day and MUMPS releases.
- The latest version (4.10.0) is downloaded more than 1000 times per year.

18 MUMPS Team (May 2014)
Permanent members:
- Patrick Amestoy (INPT-IRIT, Toulouse)
- Jean-Yves L'Excellent (INRIA-LIP, Lyon)
- Abdou Guermouche (LABRI, Bordeaux)
- Bora Uçar (CNRS-LIP, Lyon)
- Alfredo Buttari (CNRS-IRIT, Toulouse)
Engineers:
- Guillaume Joslin (Université Paul Sabatier, Toulouse)
- Chiara Puglisi (INRIA, Grenoble)
Part time on MUMPS:
- Maurice Brémond (INRIA, Grenoble)
PhD students:
- Mohamed Sid-Lakhdar (ENS-Lyon)
- Florent Lopez (UPS, Toulouse)

19 Research through PhDs
Ph.D. students connected to the project: S. Pralet (CERFACS), A. Guermouche (ENS Lyon), C. Voemel (CERFACS), M. Slavova (CERFACS), E. Agullo (ENS Lyon), F. Lopez (UPS), W. Sid-Lakhdar (ENS Lyon), C. Weisbecker (INPT-EDF), F.-H. Rouet (INPT).
Some research themes: preprocessing and orderings, numerical pivoting and accuracy, numerical features, memory usage and task scheduling, shared-memory parallelism.

20 Relations with our users
Exchanges with users:
- Direct contacts
- MUMPS Users mailing list
MUMPS Users Days:
1. October 24th, 2006, Lyon, France
2. April 15th-16th, 2010, Toulouse, France
3. May 29th-30th, 2013, EDF, Clamart, France
Objectives of these workshops:
- Present some facets of the algorithmic, numerical and software work in the context of the MUMPS project/solver.
- Share experience.
- Identify users' expectations (software evolution, new features).
- Discuss future research tracks and the future of MUMPS.

22 Research perspectives
Scientific hurdles and related research areas:
- Computation driven by memory: memory-aware algorithms.
- Controlled accuracy to improve complexity: BLR solver.
- Multicore and asynchronous communications: key issue for time and memory scalability; algorithms and communication schemes need to be revisited.
Performance projection and target (3D Helmholtz; $n = 10^9$; 1.4 PFlops computer, 2000 nodes, 32 cores/node). Still much research and software work is needed to reach this target!
            MUMPS            Research target
Time        $10^7$ seconds   $10^4$ seconds
Factors     8 GB/core        3 GB/core
Workspace   50 GB/core       2 GB/core
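
The per-core figures in the projection are easier to compare once scaled to the whole machine. The short sketch below does that arithmetic for the 2000-node, 32-cores-per-node computer of this slide. It assumes, for illustration only, that each per-core figure applies uniformly to every core and that 1 TB = 1000 GB; the names used are hypothetical.

```python
NODES = 2000
CORES_PER_NODE = 32
GB_PER_TB = 1000  # decimal convention, an assumption of this sketch


def aggregate_tb(gb_per_core, cores):
    """Total memory in TB if every core uses gb_per_core."""
    return gb_per_core * cores / GB_PER_TB


total_cores = NODES * CORES_PER_NODE

# (current MUMPS projection, research target), in GB per core, from the slide.
per_core = {"Factors": (8, 3), "Workspace": (50, 2)}

totals = {
    name: (aggregate_tb(current, total_cores), aggregate_tb(target, total_cores))
    for name, (current, target) in per_core.items()
}

for name, (current_tb, target_tb) in totals.items():
    print(f"{name}: {current_tb:.0f} TB now, {target_tb:.0f} TB targeted")
```

Under these assumptions the workspace is where most of the reduction has to come from, which matches the emphasis the talk places on active memory.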

23 Software agreement
Software agreement signed by the owners of the software: CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux 1.
Key features:
- All institutions have recognized and confirmed their will to freely distribute MUMPS releases.
- A technical committee supervises technical/scientific decisions.
- Conditions of use for the development version are defined.
- Conditions of transfer toward the next public version are defined.
- License for public versions: CeCILL-C (LGPL-compatible).

24 Sustainability of MUMPS software and research platform
Objectives:
- Stabilize engineering work and expertise with long-term positions.
- Ensure software quality and faster transfer of research work.
MUMPS Consortium:
- Type: group of users.
- Objective: support engineering work.
- Services: beta releases of future/new functionalities, annual meeting to share experience, wish list to influence development priorities, training cycles, ...
Ongoing work... it takes more time than one could have expected.
