It's Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU


1 It's Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU. A. Windisch, PhD Seminar: High Performance Computing II, G. Haase, March 29th, 2012, Graz

2 Outline 1 MUMPS 2 PaStiX 3 SuperLU 4 Summary and Outlook

3 MUMPS MUltifrontal Massively Parallel sparse direct Solver

4 Some historical facts (and others) > 1999

5 Getting MUMPS is easy (it's public domain). Link Debian, Ubuntu: $ sudo apt-get install libmumps-4.9.2

6 Using MUMPS. MUMPS is written in Fortran 90; interfaces: 1 C 2 MATLAB 3 Octave 4 Scilab

7 MUMPS: Relevant literature. P. R. Amestoy, I. S. Duff, J. Koster and J.-Y. L'Excellent, SIAM Journal on Matrix Analysis and Applications 23 (2001); P. R. Amestoy, A. Guermouche, J.-Y. L'Excellent and S. Pralet, Parallel Computing 32 (2006) 136-156; I. S. Duff and J. K. Reid, ACM Transactions on Mathematical Software 9 (1983); I. S. Duff, A. M. Erisman and J. K. Reid, Oxford University Press, London (1986); J. W. H. Liu, SIAM Review 34 (1992)

8 So, what is MUMPS, and how does it work? Solves $Ax = b$. Direct solver based on the multifrontal approach. $A$ is a square sparse matrix: 1 unsymmetric 2 symmetric positive definite 3 general symmetric. Factorization: $A = LU$; for symmetric $A$: $A = LDL^T$.


14 Assembly: $A = \sum_l A^{[l]}$, carried out entry by entry as $a_{ij} \leftarrow a_{ij} + a^{[l]}_{ij}$. A variable is fully summed once every element $A^{[l]}$ that contributes to it has been assembled.
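The assembly rule above can be sketched in a few lines of Python. This is only an illustration, not MUMPS code: the helper names `assemble` and `fully_summed`, the dense list-of-lists storage, and the two 2x2 element matrices are all assumptions made for the example.

```python
def assemble(n, elements):
    """Sum element matrices A^[l] into a dense n x n matrix A.

    Each element is (indices, local): `indices` lists the global variables
    the element touches, `local` is its small dense matrix.
    """
    A = [[0.0] * n for _ in range(n)]
    for indices, local in elements:
        for r, i in enumerate(indices):
            for c, j in enumerate(indices):
                A[i][j] += local[r][c]  # a_ij <- a_ij + a^[l]_ij
    return A


def fully_summed(n, elements, assembled_count):
    """Variables that appear in none of the elements still waiting for assembly."""
    pending = set()
    for indices, _ in elements[assembled_count:]:
        pending.update(indices)
    return [v for v in range(n) if v not in pending]


# Hypothetical example: two overlapping 2x2 elements on three variables.
elements = [
    ((0, 1), [[2.0, -1.0], [-1.0, 2.0]]),
    ((1, 2), [[2.0, -1.0], [-1.0, 2.0]]),
]
A = assemble(3, elements)
ready = fully_summed(3, elements, 1)  # after assembling only the first element
```

After the first element has been assembled, only variable 0 is fully summed, because variable 1 still receives a contribution from the second element.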

16 Gaussian elimination (GE): $a^{(k+1)}_{ij} = a^{(k)}_{ij} - a^{(k)}_{ik} \left(a^{(k)}_{kk}\right)^{-1} a^{(k)}_{kj}$
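As a small worked instance of the elimination formula, the sketch below applies one step to a 3x3 matrix using exact rational arithmetic. The function name `eliminate` and the test matrix are assumptions for illustration; no pivoting is attempted, which a real solver could not afford to skip.

```python
from fractions import Fraction


def eliminate(A, k):
    """Apply one Gaussian elimination step with pivot k.

    Entries with i, j > k become a_ij - a_ik * a_kk^{-1} * a_kj.
    Row k and column k are left untouched in this sketch.
    """
    n = len(A)
    pivot = A[k][k]
    if pivot == 0:
        raise ZeroDivisionError("zero pivot; a real solver would permute here")
    B = [row[:] for row in A]
    for i in range(k + 1, n):
        for j in range(k + 1, n):
            B[i][j] = A[i][j] - A[i][k] / pivot * A[k][j]
    return B


# Hypothetical example matrix, stored as exact fractions.
A0 = [[Fraction(v) for v in row] for row in [[2, -1, 0], [-1, 4, -1], [0, -1, 2]]]
A1 = eliminate(A0, 0)
```

Only the trailing block changes: the entry at position (1, 1) goes from 4 to 4 - (-1)(1/2)(-1) = 7/2, while the entries at (1, 2), (2, 1) and (2, 2) are unchanged because the first row and column have zeros there.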

24 Frontal elimination, step by step: Assemble A, B; Eliminate 4; Assemble C; Permute 1, 8; Eliminate. [Figure: frontal matrix after each step, with computed entries of U marked u and computed entries of L marked l; rows labelled 1, 8, 5, 2.]

45 Frontal: $(\cdots(((A^{[1]} + A^{[2]}) + A^{[3]}) + A^{[4]}) + \cdots)$. Multifrontal: $((A^{[1]} + A^{[2]}) + (A^{[3]} + A^{[4]}) + (A^{[5]} + A^{[6]}) + (A^{[7]} + A^{[8]}))$
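The difference between the two groupings can be illustrated with a toy reduction. In this sketch plain integers stand in for the element contributions $A^{[l]}$, and `+` stands in for "assemble and eliminate"; both are simplifying assumptions, as are the function names. The multifrontal variant is written as a binary tree, which is one possible grouping and not necessarily the one a solver would choose.

```python
from functools import reduce


def frontal_sum(terms):
    """Left-to-right chain: every addition depends on the previous result."""
    return reduce(lambda acc, t: acc + t, terms)


def multifrontal_sum(terms):
    """Pairwise tree: the two halves are independent of each other.

    Returns (total, depth), where depth is the number of dependent
    addition levels on the longest path through the tree.
    """
    if len(terms) == 1:
        return terms[0], 0
    mid = len(terms) // 2
    left, depth_left = multifrontal_sum(terms[:mid])
    right, depth_right = multifrontal_sum(terms[mid:])
    return left + right, 1 + max(depth_left, depth_right)


terms = list(range(1, 9))        # stand-ins for A^[1] ... A^[8]
chain_total = frontal_sum(terms)  # 7 additions, each waiting on the last
tree_total, tree_depth = multifrontal_sum(terms)
```

Both orders give the same total, but the chain needs seven dependent steps while the tree needs only three levels, and the additions within one level could run in parallel.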

46 How MUMPS solves a problem (phases: Analysis, Factorization, Solution). Analysis: 1 Preprocessing 2 Ordering 3 Symbolic factorization

47 How MUMPS solves a problem. Factorization: 1 Elimination tree nodes 2 Numerical factorization: frontal matrices 3 Factor matrices distributed

48 How MUMPS solves a problem. Solution: 1 $LUx = b$ or $LDL^T x = b$ 2 Forward: $Ly = b$ or $LDy = b$ 3 Backward: $Ux = y$ or $L^T x = y$
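The factorization and solution phases can be sketched on a small dense matrix. This is a teaching sketch under simplifying assumptions, not how MUMPS is implemented: storage is dense, arithmetic is exact via `Fraction`, there is no pivoting or ordering, and the names `lu_factor`, `forward` and `backward` are made up for the example.

```python
from fractions import Fraction


def lu_factor(A):
    """Factorization phase: A = LU with unit lower-triangular L (no pivoting)."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(x) for x in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def forward(L, b):
    """Forward substitution: solve Ly = b for unit lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        y.append(Fraction(b[i]) - sum(row[j] * y[j] for j in range(i)))
    return y


def backward(U, y):
    """Backward substitution: solve Ux = y for upper-triangular U."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Hypothetical example system.
A = [[2, -1, 0], [-1, 4, -1], [0, -1, 2]]
b = [1, 2, 1]
L, U = lu_factor(A)
x = backward(U, forward(L, b))
```

Once `L` and `U` are available, further right-hand sides only need the two cheap substitution passes, which is why the factorization is kept separate from the solution phase.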

49 MUMPS: Furthermore... Interfaces to PORD, SCOTCH, METIS; parallel version requires MPI, BLAS, BLACS and ScaLAPACK; error analysis; detection of null pivots; Schur complement

50 PaStiX Parallel Sparse matriX package

51 Getting PaStiX Link

52 PaStiX steps. Depending on symmetry: $A = LL^T$ or $A = LDL^T$. 1 Reordering to reduce fill-in 2 Symbolic factorization 3 Distribute matrix blocks to processors 4 Decomposition of A 5 Solve system 6 Refine solution (static pivoting)
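To make the $LDL^T$ variant concrete, here is a minimal dense sketch. It is not PaStiX code: the function name `ldlt` and the 2x2 test matrix are assumptions, arithmetic is exact via `Fraction`, and no pivoting is performed, so a zero diagonal entry in $D$ would simply raise an error.

```python
from fractions import Fraction


def ldlt(A):
    """Factor a symmetric matrix as A = L D L^T (unit lower L, diagonal D)."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    D = [Fraction(0)] * n
    for j in range(n):
        # Diagonal entry: subtract contributions of earlier columns.
        D[j] = Fraction(A[j][j]) - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (Fraction(A[i][j]) - s) / D[j]
    return L, D


# Hypothetical symmetric indefinite example: LL^T does not exist here
# because D ends up with a negative entry, but LDL^T does.
A = [[4, 2], [2, -3]]
L, D = ldlt(A)
```

For this matrix the factors are $D = \mathrm{diag}(4, -4)$ and $L_{21} = 1/2$; the negative entry in $D$ is what rules out a real $LL^T$ factorization.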

53 1. Ordering: SCOTCH (or METIS); Halo Approximate Minimum Degree; tree represents dependencies.
2. Symbolic factorization: structure of the factorized matrix from A; cheap step; # of off-diag blocks.
3. Distribution: Partitioning: large blocks distributed to several processors. Processor candidates: local communication. Distribution: blocks to nodes. Use elimination tree. Scheduling comm. & comp.
Levels of parallelism: Coarse: independent parts of tree. Medium: block decomp. Fine: BLAS3.
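The dependency tree mentioned under Ordering can be computed from the sparsity pattern alone. The sketch below uses a standard ancestor-compression scheme for the elimination tree of a symmetric matrix; the function names, the edge-list input format and the 7-variable pattern are assumptions chosen for illustration, and this is not taken from PaStiX.

```python
def elimination_tree(n, entries):
    """Return parent[] of the elimination tree; -1 marks a root.

    `entries` lists the off-diagonal nonzero positions (i, j) of a
    symmetric n x n matrix; each pair needs to be given only once.
    """
    cols = [[] for _ in range(n)]
    for i, j in entries:
        lo, hi = min(i, j), max(i, j)
        if lo != hi:
            cols[hi].append(lo)
    parent = [-1] * n
    ancestor = [-1] * n  # shortcut pointers (path compression)
    for k in range(n):
        for i in cols[k]:
            # Walk up from i towards the current root, re-pointing at k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k
                i = nxt
    return parent


def children(parent):
    """Group nodes by parent, to expose independent subtrees."""
    kids = {}
    for v, p in enumerate(parent):
        kids.setdefault(p, []).append(v)
    return kids


# Hypothetical pattern: two separate blocks {0,1,2} and {3,4,5} joined by 6.
entries = [(0, 2), (1, 2), (3, 5), (4, 5), (2, 6), (5, 6)]
parent = elimination_tree(7, entries)
kids = children(parent)
```

Here the root 6 has two children, 2 and 5, whose subtrees share no variables. Those subtrees are the "independent parts of the tree" that give the coarse level of parallelism.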

54 4. Factorization: calculate $LL^T$ or $LDL^T$; multi-frontal vs. super-nodal; PaStiX: super-nodal (left-looking). 5. Solve: distribution kept; cheap. 6. Refinement (opt): GMRES by Y. Saad; iterative refinement; conjugate gradient.
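The refinement step can be sketched as a residual-correction loop. In this illustration the inexact solve uses only the diagonal of $A$, a deliberately crude stand-in for factors that were perturbed by static pivoting; that choice, the function names and the 2x2 system are all assumptions, and the loop is plain iterative refinement, not GMRES or conjugate gradient.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def refine(A, b, approx_solve, tol=1e-12, max_iter=100):
    """Iterative refinement: repeatedly correct x with an inexact solve of the residual."""
    x = approx_solve(b)
    for _ in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # residual b - Ax
        if max(abs(ri) for ri in r) <= tol:
            break
        d = approx_solve(r)                                   # approximate correction
        x = [xi + di for xi, di in zip(x, d)]
    return x


# Hypothetical diagonally dominant system, so the crude solve converges.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def diag_solve(r):
    """Inexact solve using only the diagonal of A (assumption for the demo)."""
    return [ri / A[i][i] for i, ri in enumerate(r)]


x = refine(A, b, diag_solve)
```

The exact solution of this system is $x = (1/11, 7/11)$. With a good factorization in place of `diag_solve`, a handful of iterations is usually enough; with this crude stand-in the loop needs a couple of dozen.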

55 SuperLU

56 Getting SuperLU. Link xiaoye/superlu/ Debian, Ubuntu: $ sudo apt-get install libsuperlu3

57 Three packages: 1 Sequential SuperLU: sequential processors; one or more layers of memory. 2 Multithreaded SuperLU (SuperLU_MT): shared-memory multiprocessors (SMPs); can use parallel processors. 3 Distributed SuperLU (SuperLU_DIST): distributed-memory parallel processors; MPI; can use hundreds of processors.

58 Summary

59 MUMPS: symbolic factorization; distribution; factorization through the multifrontal method. PaStiX: symbolic factorization; distribution; factorization through the supernodal method. SuperLU: to be investigated...

60 Thank You For Your Attention!
