It's Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU


1 It's Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU
A. Windisch
PhD Seminar: High Performance Computing II (G. Haase)
March 29th, 2012, Graz
2 Outline
1 MUMPS
2 PaStiX
3 SuperLU
4 Summary and Outlook
3 MUMPS MUltifrontal Massively Parallel sparse direct Solver
4 Some historical facts (and others) > 1999
5 Getting MUMPS is easy (it's PD)
Link
Debian, Ubuntu: $ sudo apt-get install libmumps4.9.2
6 Using MUMPS
MUMPS is written in Fortran 90. Interfaces:
1 C
2 MATLAB
3 Octave
4 Scilab
7 MUMPS: Relevant literature
P. R. Amestoy, I. S. Duff, J. Koster and J.-Y. L'Excellent, SIAM Journal on Matrix Analysis and Applications 23 (2001)
P. R. Amestoy, A. Guermouche, J.-Y. L'Excellent and S. Pralet, Parallel Computing 32 (2006)
I. S. Duff and J. K. Reid, ACM Transactions on Mathematical Software 9 (1983)
I. S. Duff, A. M. Erisman and J. K. Reid, Oxford University Press, London (1986)
J. W. H. Liu, SIAM Review 34 (1992)
8 So, what is MUMPS, and how does it work?
Solves $Ax = b$
Direct solver based on the multifrontal approach
$A$: square sparse matrix, one of 1 unsymmetric, 2 symmetric positive definite, 3 general symmetric
Factorization: $A = LU$; for symmetric $A$: $A = LDL^T$
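To make the symmetric factorization $A = LDL^T$ concrete, here is a minimal dense sketch in plain Python. It is not how MUMPS works internally (MUMPS is sparse, multifrontal and pivots); the function names `ldlt` and `rebuild` and the 3x3 test matrix are assumptions made purely for illustration, and the sketch assumes no zero pivot occurs.

```python
from fractions import Fraction

def ldlt(A):
    """Return (L, D) with A = L * diag(D) * L^T for a symmetric matrix A.

    Illustrative only: dense, no pivoting, assumes every pivot D[j] is nonzero.
    """
    n = len(A)
    # L starts as the identity (unit lower triangular).
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    D = [Fraction(0)] * n
    for j in range(n):
        # Pivot: diagonal entry minus contributions of earlier columns.
        D[j] = Fraction(A[j][j]) - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (Fraction(A[i][j]) - s) / D[j]
    return L, D

def rebuild(L, D):
    """Multiply L * diag(D) * L^T back together to check the factorization."""
    n = len(D)
    return [[sum(L[i][k] * D[k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical test matrix: symmetric but indefinite ("general symmetric").
A = [[4, 2, 0],
     [2, 5, 3],
     [0, 3, 1]]
L, D = ldlt(A)
```

The negative entry in `D` shows that the matrix is indefinite, which is the case where $LDL^T$ is used instead of a Cholesky-type factorization.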
12 $A = \sum_l A^{[l]}$
Assembly: $a_{ij} \leftarrow a_{ij} + a^{[l]}_{ij}$
Fully summed: a variable is fully summed once all contributions $a^{[l]}_{ij}$ to its row and column have been assembled
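The assembly step can be sketched in a few lines of plain Python. The element matrices, the index lists and the helper names `assemble` and `fully_summed` are hypothetical; they only illustrate the sum $A = \sum_l A^{[l]}$ and the notion of a fully summed variable.

```python
def assemble(n, elements):
    """Sum element matrices A^[l] into a dense n x n matrix.

    `elements` is a list of (indices, element_matrix) pairs, where `indices`
    maps the local rows/columns of the element matrix to global ones.
    """
    A = [[0] * n for _ in range(n)]
    for indices, Ae in elements:
        for r, i in enumerate(indices):
            for c, j in enumerate(indices):
                A[i][j] += Ae[r][c]   # a_ij <- a_ij + a^[l]_ij
    return A

def fully_summed(elements, assembled):
    """Variables touched by the first `assembled` elements and by no later one."""
    seen = {i for indices, _ in elements[:assembled] for i in indices}
    pending = {i for indices, _ in elements[assembled:] for i in indices}
    return sorted(seen - pending)

# Hypothetical example: a chain of three variables coupled by two elements.
elements = [
    ([0, 1], [[1, -1], [-1, 1]]),
    ([1, 2], [[1, -1], [-1, 1]]),
]
A = assemble(3, elements)
```

After the first element has been assembled, only variable 0 is fully summed and could already be eliminated; variable 1 still waits for the contribution of the second element.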
15 Gaussian elimination (GE): $a^{(k+1)}_{ij} = a^{(k)}_{ij} - a^{(k)}_{ik} \left[ a^{(k)}_{kk} \right]^{-1} a^{(k)}_{kj}$
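One step of this update can be written directly from the formula. The sketch below is dense and unpivoted; the function name `eliminate_pivot` and the test matrix are assumptions for illustration. It returns the trailing block after eliminating variable `k`, i.e. the Schur complement with respect to the pivot $a_{kk}$.

```python
from fractions import Fraction

def eliminate_pivot(A, k=0):
    """Apply a_ij <- a_ij - a_ik * a_kk^{-1} * a_kj for all i, j > k.

    Returns only the updated trailing block (rows and columns k+1 .. n-1).
    Illustrative: no pivoting, so a zero pivot is rejected.
    """
    pivot = A[k][k]
    if pivot == 0:
        raise ZeroDivisionError("zero pivot; a real solver would pivot here")
    n = len(A)
    return [[A[i][j] - A[i][k] * A[k][j] / pivot for j in range(k + 1, n)]
            for i in range(k + 1, n)]

# Hypothetical 3x3 example, stored as exact fractions.
A = [[Fraction(v) for v in row] for row in [[2, 1, 0],
                                            [1, 3, 1],
                                            [0, 1, 4]]]
schur = eliminate_pivot(A, 0)
```

Only the entry coupled to variable 0 through both its row and its column changes, which is why sparse solvers can restrict the update to a small dense frontal matrix.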
19 Worked example (built up step by step over several slides)
1 Assemble A, B
2 Eliminate 4
3 Assemble C
4 Permute 1, 8
5 Eliminate
[Figure: matrix pattern for variables 1, 8, 5, 2 with entries marked u and l]
45 Frontal: $((\dots((A^{[1]} + A^{[2]}) + A^{[3]}) + A^{[4]}) + \dots)$
Multifrontal: $((A^{[1]} + A^{[2]}) + (A^{[3]} + A^{[4]}) + (A^{[5]} + A^{[6]}) + (A^{[7]} + A^{[8]}))$
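The difference between the two summation orders can be sketched as a sequential reduction versus a pairwise (tree) reduction. The helper names and the eight small matrices are hypothetical; the point is only that the pairwise sums on each level are independent of each other and could run in parallel, while the frontal order is a single dependency chain.

```python
from functools import reduce

def add(X, Y):
    """Entrywise sum of two equally sized matrices (nested lists)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def frontal_sum(mats):
    """((A1 + A2) + A3) + ...: one front, strictly sequential."""
    return reduce(add, mats)

def multifrontal_sum(mats):
    """Pairwise tree reduction; returns (sum, number of tree levels)."""
    level, depth = list(mats), 0
    while len(level) > 1:
        nxt = [add(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd count: carry the last matrix upward
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth

# Hypothetical element matrices A^[1] .. A^[8].
mats = [[[l, 0], [0, l]] for l in range(1, 9)]
total_seq = frontal_sum(mats)
total_tree, depth = multifrontal_sum(mats)
```

Both orders give the same result; the tree needs 3 levels for 8 matrices instead of 7 sequential additions.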
46 How MUMPS solves a problem: Analysis, Factorization, Solution
Analysis
1 Preprocessing
2 Ordering
3 Symbolic factorization
47 Factorization
1 Elimination tree nodes
2 Numerical factorization: frontal matrices
3 Factor matrices distributed
48 Solution
1 $LUx = b$, $LDL^T x = b$
2 Forward: $Ly = b$ or $LDy = b$
3 Backward: $Ux = y$ or $L^T x = y$
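The solution phase for the unsymmetric case, forward substitution $Ly = b$ followed by backward substitution $Ux = y$, can be sketched as follows. The factors `L` and `U` below are hand-computed for a hypothetical 2x2 matrix $A = [[2, 1], [4, 5]]$; a real solver would take them from the factorization phase.

```python
from fractions import Fraction

def forward(L, b):
    """Solve L y = b for lower triangular L, top row first."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - s) / row[i])
    return y

def backward(U, y):
    """Solve U x = y for upper triangular U, bottom row first."""
    n = len(U)
    x = [0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Assumed factors of A = [[2, 1], [4, 5]] (A = L U, L unit lower triangular).
L = [[1, 0], [2, 1]]
U = [[2, 1], [0, 3]]
b = [Fraction(3), Fraction(9)]

y = forward(L, b)
x = backward(U, y)
```

Once the factors exist, each additional right-hand side costs only these two triangular solves, which is the main advantage of a direct solver when many systems share the same matrix.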
49 MUMPS: Furthermore...
Interfaces to PORD, SCOTCH, METIS
Parallel version requires MPI, BLAS, BLACS and ScaLAPACK
Error analysis
Detection of null pivots
Schur complement
50 PaStiX: Parallel Sparse matriX package
51 Getting PaStiX Link
52 PaStiX: Steps
Depending on symmetry: $A = LL^T$ or $A = LDL^T$
1 Reordering to reduce fill-in
2 Symbolic factorization
3 Distribute matrix blocks to processors
4 Decomposition of $A$
5 Solve system
6 Refine solution (static pivoting)
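Steps 1 and 2 can be illustrated with a purely symbolic elimination on the graph of the matrix: eliminating a vertex connects all of its remaining neighbours, and every new edge is one fill-in entry in the factor. The sketch below is not the ordering algorithm used by SCOTCH or METIS; the function `fill_in` and the arrowhead pattern are assumptions chosen to show how strongly the ordering affects fill-in.

```python
def fill_in(adj, order):
    """Count fill-in edges created by eliminating vertices in `order`.

    `adj` maps each vertex to the set of its neighbours (the off-diagonal
    nonzero pattern of a symmetric matrix). No numerical values are needed.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    fill = 0
    for v in order:
        nbrs = sorted(g.pop(v))
        for u in nbrs:
            g[u].discard(v)
        # Eliminating v turns its remaining neighbours into a clique.
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                p, q = nbrs[a], nbrs[b]
                if q not in g[p]:
                    g[p].add(q)
                    g[q].add(p)
                    fill += 1
    return fill

# Hypothetical arrowhead pattern: vertex 0 is coupled to all others.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
hub_first = fill_in(adj, [0, 1, 2, 3, 4])
hub_last = fill_in(adj, [1, 2, 3, 4, 0])
```

Eliminating the hub first makes the factor completely dense, while eliminating it last creates no fill-in at all. Because only the pattern is involved, this step is cheap compared with the numerical factorization.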
53 PaStiX: Steps 1 to 3
1. Ordering
SCOTCH (or METIS)
Halo Approximate Minimum Degree
Tree represents dependencies
2. Symbolic factorization
Structure of factorized matrix from $A$
Cheap step
Number of off-diagonal blocks
3. Distribution
Partitioning: large blocks distributed to several processors
Processor candidates: local communication
Distribution: blocks to nodes
Use elimination tree
Scheduling of communication and computation
Levels of parallelism
Coarse: independent parts of the tree
Medium: block decomposition
Fine: BLAS3
54 PaStiX: Steps 4 to 6
4. Factorization
Calculate $LL^T$ or $LDL^T$
Multifrontal vs. supernodal
PaStiX: supernodal (left-looking)
5. Solve
Distribution kept
Cheap
6. Refinement (optional)
GMRES by Y. Saad
Iterative refinement
Conjugate gradient
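The idea behind step 6 can be sketched with plain iterative refinement: if the factorization is only approximate (for example because static pivoting perturbed small pivots), the solve with the inexact factors is repeated on the residual until the solution of the original system is recovered. In the sketch below the perturbed matrix `M` stands in for such an inexact factorization; `M`, the 2x2 system and all function names are assumptions for illustration and do not reflect the PaStiX implementation.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def solve2(M, r):
    """Exact solve of a 2x2 system M z = r (Cramer's rule)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]

def refine(A, M, b, steps=10):
    """Iterative refinement: x <- x + M^{-1} (b - A x)."""
    x = solve2(M, b)                     # first solve with the inexact factors
    for _ in range(steps):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # true residual
        dx = solve2(M, r)                                  # cheap correction
        x = [xi + di for xi, di in zip(x, dx)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
M = [[4.1, 1.0], [1.0, 3.1]]   # hypothetical perturbed stand-in for the factors of A
b = [1.0, 2.0]

x0 = solve2(M, b)              # unrefined solution
x = refine(A, M, b)            # refined solution
```

The refinement converges here because `M` is close to `A`; GMRES or conjugate gradient play the same role but use the inexact factorization as a preconditioner, which is more robust when the perturbation is larger.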
55 SuperLU
56 Getting SuperLU
Link xiaoye/superlu/
Debian, Ubuntu: $ sudo apt-get install libsuperlu3
57 Three packages
1 Sequential SuperLU
Sequential processors
One or more layers of memory
2 Multithreaded SuperLU (SuperLU_MT)
Shared-memory multiprocessors (SMPs)
Can use parallel processors
3 Distributed SuperLU (SuperLU_DIST)
Distributed-memory parallel processors
MPI
Can use hundreds of processors
58 Summary
59 MUMPS
Symbolic factorization
Distribution
Factorization through the multifrontal method
PaStiX
Symbolic factorization
Distribution
Factorization through the supernodal method
SuperLU
To be investigated...
60 Thank You For Your Attention!