The MUMPS Solver: academic needs and industrial expectations
Chiara Puglisi (Inria-Grenoble (LIP-ENS Lyon))
MUMPS group: CERFACS, CNRS, ENS-Lyon, INRIA, INPT, Université Bordeaux 1
Séminaire Aristote - HPC-Desk, ONERA, France, May 20th, 2014
Outline
- Academic needs: a research platform for sparse direct solvers
- Industrial expectations: MUMPS solver, a software platform
- Concluding remarks: research and software perspectives
Academic needs: a research platform
[Figure: Code Aster, Carter (e.g., finite elements)]
Solution of sparse systems $Ax = b$
- Often the most expensive part in numerical simulation codes
Sparse direct methods to solve $Ax = b$:
- Decompose $A$ under the form $LU$, $LDL^T$ or $LL^T$
- Solve the triangular systems $Ly = b$, then $Ux = y$
3D example in earth science: acoustic wave propagation, 27-point finite difference grid
- Current goal [Seiscope project]: LU on complete earth, $n = N^3 = 1000^3$
- Extrapolation on a $1000 \times 1000 \times 1000$ grid: 55 exaflops, 200 TBytes for factors, 40 TBytes for active memory!
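The two steps of a direct method (factorize, then solve two triangular systems) can be illustrated with a minimal sketch. This is a dense toy version without pivoting, written only to make the steps concrete; it is not how MUMPS works internally, which is a sparse multifrontal code with numerical pivoting. The function names and the small test matrix are assumptions made for the illustration.

```python
def lu_factor(a):
    """Doolittle LU without pivoting: A = L U, with L unit lower triangular."""
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in a]  # work on a copy of A
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]      # multiplier for row i
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]  # eliminate below the pivot
    return L, U


def forward_solve(L, b):
    """Solve L y = b, using the unit diagonal of L."""
    y = []
    for i in range(len(b)):
        y.append(b[i] - sum(L[i][j] * y[j] for j in range(i)))
    return y


def backward_solve(U, y):
    """Solve U x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Hypothetical 3x3 tridiagonal system whose exact solution is (1, 1, 1).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]

L, U = lu_factor(A)
x = backward_solve(U, forward_solve(L, b))
print(x)
```

Once $L$ and $U$ are available, each additional right-hand side only costs the two triangular solves, which is one reason direct methods are attractive when many systems share the same matrix.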
Sparse direct solution: main research issues
[Figure: Code Aster, EDF: pump, nuclear backup circuit]
[Figure: frequency-domain seismic modeling, Helmholtz equations, SEISCOPE project; axes Depth (km), Cross (km), Dip (km); color scale 3000 to 6000 m/s]
Extrapolation on a $1000 \times 1000 \times 1000$ grid: 55 exaflops, 200 TBytes for factors, 40 TBytes for active memory!
Main algorithmic issues
- Parallel algorithmic issues: synchronization avoidance, mapping of irregular data structures, scheduling.
- Performance scalability: time, but also memory per processor, when increasing the number of processors (and the problem size).
- Numerical issues: numerical accuracy, hybrid iterative-direct solvers, application-specific solvers (elliptic PDEs).
Robust memory-aware mappings
Context
- Memory per node or core is decreasing
- Active memory is not naturally scalable and is difficult to estimate
[Diagram: several nodes, each with its own factors, active memory and disk]
Algorithmic work
- Design mapping algorithms that enforce some memory constraints and provide better memory estimates.
- Active memory size dominates total memory in parallel. Example: share of active storage on the AUDI matrix
  1 processor: 11%
  256 processors: 59%
Robust memory-aware mappings (problem)
Metric: active memory efficiency
$$e(p) = \frac{S_{seq}}{p \times S_{max}(p)}$$
with $S_{seq}$ the sequential memory and $S_{max}(p)$ the maximum memory used on $p$ processors.
We would like $e(p) \approx 1$, i.e. about $S_{seq}/p$ on each processor.
Common mappings/schedulings give poor memory efficiency:
- Standard proportional mapping: $\lim_{p \to \infty} e(p) = 0$ on regular problems.
- With the more sophisticated relaxed proportional mapping, typical efficiency $e(p)$ is still between 0.10 and 0.40. (Memory estimates are unreliable.)
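The efficiency metric is simple to evaluate once the per-processor memory peaks are known. The sketch below assumes those peaks are available as a list; the function name and all numbers are hypothetical and serve only to show how an unbalanced mapping lowers $e(p)$.

```python
def memory_efficiency(s_seq, per_proc_peaks):
    """e(p) = S_seq / (p * S_max(p)), with S_max(p) the largest per-processor peak."""
    p = len(per_proc_peaks)
    s_max = max(per_proc_peaks)
    return s_seq / (p * s_max)


# Hypothetical sequential active memory (arbitrary units).
s_seq = 1000.0

# Ideal case: every processor peaks at S_seq / p.
balanced = [250.0, 250.0, 250.0, 250.0]

# Unbalanced case: one processor holds most of the active memory.
skewed = [700.0, 300.0, 200.0, 100.0]

e_balanced = memory_efficiency(s_seq, balanced)
e_skewed = memory_efficiency(s_seq, skewed)
print(f"balanced: e(4) = {e_balanced:.2f}")
print(f"skewed:   e(4) = {e_skewed:.2f}")
```

Because the metric depends only on the largest peak, a single overloaded processor is enough to pull the efficiency down, whatever the others use.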
Robust memory-aware mappings (results)
- Reduce memory: serialize some branches in the elimination tree
- Reliable estimation and better memory use with memory-aware mappings than with the default version (MUMPS 4.10.0).
Illustration with matrix PANCAKE 2 (3D electromagnetism, Cedrat (Flux) and Padova Univ.), 64 MPI processes

                                  MUMPS 4.10.0   Memory-aware mappings
Objective max MB/core             n/a            400        200
Time (seconds)                    418            591        684
Active workspace (avg MB/core)    539.4          234.7      180.0
Active workspace (max MB/core)    900.3          356.2      181.5
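The trade-off in these results can be quantified with a few lines of arithmetic. The sketch below uses only the figures reported for PANCAKE 2 on 64 MPI processes; the dictionary layout and the function name are assumptions made for the illustration.

```python
# Figures from the PANCAKE 2 results (64 MPI processes).
baseline = {"time_s": 418.0, "avg_mb": 539.4, "max_mb": 900.3}  # MUMPS 4.10.0
memory_aware = {
    400: {"time_s": 591.0, "avg_mb": 234.7, "max_mb": 356.2},  # objective: 400 MB/core
    200: {"time_s": 684.0, "avg_mb": 180.0, "max_mb": 181.5},  # objective: 200 MB/core
}


def tradeoff(run):
    """Return (peak-memory reduction factor, slowdown factor) versus the baseline."""
    return baseline["max_mb"] / run["max_mb"], run["time_s"] / baseline["time_s"]


for objective, run in sorted(memory_aware.items(), reverse=True):
    reduction, slowdown = tradeoff(run)
    within = run["max_mb"] <= objective  # was the memory objective respected?
    print(f"objective {objective} MB/core: peak memory / {reduction:.2f}, "
          f"time x {slowdown:.2f}, objective met: {within}")
```

In both memory-aware runs the measured peak stays below the requested objective, and the peak workspace per core shrinks by a larger factor than the run time grows.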
Application-specific solvers: BLR solver
Block Low-Rank approximations to improve sparse multifrontal solvers
Low-rank approximations (elliptic PDEs)
- memory compression and flop reduction
- accuracy controlled by a numerical parameter (can also be used as a preconditioner)
Main features of the Block Low-Rank (BLR) format
- Algebraic solver; flat and simple format
- Compatibility with numerical pivoting
Many representations: recursive $\mathcal{H}$, $\mathcal{H}^2$ [Bebendorf, Börm, Hackbusch, Grasedyck, ...], HSS/SSS [Chandrasekaran, Dewilde, Gu, Li, Xia, ...], flat block low-rank (BLR), ...
Block Low-Rank multifrontal solver
[Figure: elimination tree and block $B$]
Singular value decomposition (SVD) of each block $B$, with rank $k(\varepsilon)$:
$$B = X_1 S_1 Y_1 + X_2 S_2 Y_2, \qquad \|E\|_2 = \|X_2 S_2 Y_2\|_2 = \sigma_{k+1} \le \varepsilon$$
Block Low-Rank Solver (BLR), PhD INP-EDF, 2013, C. Weisbecker
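The compression of a single block can be sketched with NumPy. This is an illustration of a truncated SVD under stated assumptions, not the BLR implementation: the function name `compress_block`, the test block (a smooth kernel evaluated on two well-separated point sets), and the use of an absolute threshold on the singular values are all choices made for the example.

```python
import numpy as np


def compress_block(B, eps):
    """Truncated SVD of B: keep the singular values larger than eps."""
    X, s, Yt = np.linalg.svd(B, full_matrices=False)
    k = int(np.sum(s > eps))   # numerical rank k(eps)
    left = X[:, :k] * s[:k]    # X_1 S_1, shape (m, k)
    right = Yt[:k, :]          # Y_1, shape (k, n)
    return left, right, s


# Hypothetical block: smooth interaction between two separated point sets.
m = n = 64
x = np.linspace(0.0, 1.0, m)
y = np.linspace(2.0, 3.0, n)
B = 1.0 / (1.0 + np.abs(x[:, None] - y[None, :]))

eps = 1e-8
left, right, s = compress_block(B, eps)
k = left.shape[1]

# Spectral norm of the discarded part; in exact arithmetic it equals sigma_{k+1}.
err = np.linalg.norm(B - left @ right, 2)

print(f"rank k(eps) = {k} out of {n}")
print(f"storage: {k * (m + n)} entries instead of {m * n}")
print(f"||B - X1 S1 Y1||_2 = {err:.2e}, sigma_(k+1) = {s[k]:.2e}")
```

Storing the two thin factors costs $k(m+n)$ entries instead of $mn$, so compression pays off whenever the rank is small compared with the block size.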
Application to frequency-domain seismic modeling
[Figure: 3D seismic model panels; axes Dip (km), Cross (km), Depth (km)]

Percentages of the standard (full-rank) sparse solver:

$\varepsilon$   frequency   ops      memory L   memory CB
$10^{-5}$       2 Hz        41.8 %   61.8 %     32.3 %
$10^{-5}$       4 Hz        27.4 %   50.0 %     24.4 %
$10^{-5}$       8 Hz        21.8 %   41.6 %     23.9 %
$10^{-4}$       2 Hz        32.9 %   53.4 %     23.9 %
$10^{-4}$       4 Hz        20.0 %   42.2 %     21.7 %
$10^{-4}$       8 Hz        15.2 %   28.9 %     19.4 %
Industrial expectations: a software platform
Technological transfer
- From research prototyping during a PhD thesis to robust and portable software.
- Examples: Memory-Aware: PhDs of E. Agullo (LIP-ENS, 2008) and F.-H. Rouet (INPT-IRIT, 2012); Block Low-Rank: PhD of C. Weisbecker (INPT-IRIT with EDF support, 2013).
Software issues and interaction with users
- Code development: develop and combine complex features
- Software engineering: analysis/experimentation/validation tools, maintenance (also essential for research developments!)
- Users: expect support, training and adaptation/developments, but also bring research collaborations, software validation and financial support.
MUMPS solver software platform
General context
- Initially funded by a European project (1996-1999), 12 partners from 5 countries
- Publicly available since 1999 at http://graal.ens-lyon.fr/mumps and http://mumps.enseeiht.fr
- Co-developed in Toulouse, Lyon-Grenoble and Bordeaux by CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux
- Latest release: MUMPS 4.10.0, May 2011, 250 000 lines of C and Fortran code
Competitive and original software package used worldwide
- Integrated within commercial and open-source packages (e.g., Samcef from Samtech, Actran from Free Field Technologies, Code Aster from EDF, PAM-Crash from ESI, IPOPT, PETSc, Trilinos, Debian packages, ...).
Software requests
[Figure: world map of software requests since Dec. 2002 (8839 requests)]
Software requests
The number of requests per day has increased steadily throughout the evolution of the software.
[Bar chart: requests per day for MUMPS releases 4.3, 4.5, 4.6, 4.7, 4.8, 4.9 and 4.10; plotted values range from 1.3 to 4.02]
The latest version (4.10.0) is downloaded more than 1000 times per year.
MUMPS Team (May 2014)
Permanent members:
- Patrick Amestoy (INPT-IRIT, Toulouse)
- Jean-Yves L'Excellent (INRIA-LIP, Lyon)
- Abdou Guermouche (LABRI, Bordeaux)
- Bora Uçar (CNRS-LIP, Lyon)
- Alfredo Buttari (CNRS-IRIT, Toulouse)
Engineers:
- Guillaume Joslin (Université Paul Sabatier, Toulouse)
- Chiara Puglisi (INRIA, Grenoble)
Part time on MUMPS:
- Maurice Brémond (INRIA, Grenoble)
PhD students:
- Mohamed Sid-Lakhdar (ENS-Lyon)
- Florent Lopez (UPS, Toulouse)
2000-2013: Research through PhDs
Ph.D. students connected to the project: S. Pralet (CERFACS), A. Guermouche (ENS Lyon), C. Voemel (CERFACS), M. Slavova (CERFACS), E. Agullo (ENS Lyon), F. Lopez (UPS), W. Sid-Lakhdar (ENS Lyon), C. Weisbecker (INPT-EDF), F.-H. Rouet (INPT)
[Timeline figure: 2000 to 2014]
Some research themes: preprocessing and orderings, numerical pivoting and accuracy, numerical features, memory usage and task scheduling, shared-memory parallelism
Relations with our users
Exchanges with users
- Direct contacts by email
- MUMPS Users mailing list
- MUMPS Users Days
  1. October 24th, 2006, Lyon, France
  2. April 15th-16th, 2010, Toulouse, France
  3. May 29th-30th, 2013, EDF, Clamart, France
Objectives of these workshops:
- Present some facets of the algorithmic, numerical and software work in the context of the MUMPS project/solver
- Share experience
- Identify users' expectations (software evolution, new features)
- Discuss future research tracks and the future of MUMPS
Research perspectives
Scientific hurdles and related research areas
- Computation driven by memory: memory-aware algorithms
- Controlled accuracy to improve complexity: BLR solver
- Multicore and asynchronous communications: key issue for time and memory scalability; algorithms and communication schemes need to be revisited.
Performance projection and target (3D Helmholtz; $n = 10^9$; 1.4 PFlops computer, 2000 nodes, 32 cores/node)
(Still much research and software work needed to reach this target!!)

             MUMPS 4.10.0       Research target
Time         $10^7$ seconds     $10^4$ seconds
Factors      8 GB/core          3 GB/core
Workspace    50 GB/core         2 GB/core
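The per-core figures in this projection are easier to compare once aggregated over the whole machine. The sketch below is plain arithmetic on the numbers of the slide, under two assumptions that are not stated in the source: every core holds the quoted per-core amount at the same time, and 1 TB is counted as 1000 GB.

```python
# Target machine from the projection: 2000 nodes with 32 cores each.
nodes, cores_per_node = 2000, 32
cores = nodes * cores_per_node

current = {"time_s": 1e7, "factors_gb_per_core": 8, "workspace_gb_per_core": 50}
target = {"time_s": 1e4, "factors_gb_per_core": 3, "workspace_gb_per_core": 2}


def aggregate_tb(gb_per_core):
    """Machine-wide memory in TB, assuming a uniform per-core footprint (1 TB = 1000 GB)."""
    return gb_per_core * cores / 1000


speedup = current["time_s"] / target["time_s"]

print(f"cores: {cores}")
print(f"required speedup: x{speedup:.0f} "
      f"({current['time_s'] / 86400:.0f} days down to {target['time_s'] / 3600:.1f} hours)")
for label, run in (("MUMPS 4.10.0", current), ("research target", target)):
    print(f"{label}: factors {aggregate_tb(run['factors_gb_per_core']):.0f} TB, "
          f"workspace {aggregate_tb(run['workspace_gb_per_core']):.0f} TB")
```

Under these assumptions the target corresponds to about 192 TB of factors over 64 000 cores, the same order of magnitude as the 200 TBytes extrapolated for the $1000^3$ grid, while the workspace has to shrink by a factor of 25.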
Software agreement
Software agreement signed by the owners of the software: CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux 1.
Key features
- All institutions have recognized and confirmed their will to freely distribute MUMPS releases
- A technical committee supervises technical/scientific decisions
- Conditions of use for the development version are defined
- Conditions of transfer toward the next public version are defined
- License for public versions: CeCILL-C (LGPL-compatible)
Sustainability of MUMPS software and research platform
Objectives
- Stabilize engineering work and expertise with long-term positions
- Ensure software quality and faster transfer of research work
MUMPS Consortium
- Type: group of users
- Objective: support engineering work
- Services: beta-releases of future/new functionalities, annual meeting to share experience, wish list to influence priorities in development, training cycles...
Ongoing work... it takes more time than one could have expected