It's Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU

1 It's Not A Disease: The Parallel Solver Packages MUMPS, PaStiX & SuperLU. A. Windisch. PhD Seminar: High Performance Computing II (G. Haase). March 29th, 2012, Graz

2 Outline: 1 MUMPS, 2 PaStiX, 3 SuperLU, 4 Summary and Outlook

3 MUMPS MUltifrontal Massively Parallel sparse direct Solver

4 Some historical facts (and others) > 1999

5 Getting MUMPS is easy (it's PD). Link. Debian, Ubuntu: $ sudo apt-get install libmumps-4.9.2

6 Using MUMPS. MUMPS is written in Fortran 90. Interfaces: 1 C, 2 MATLAB, 3 Octave, 4 Scilab

7 MUMPS: Relevant literature
P. R. Amestoy, I. S. Duff, J. Koster and J.-Y. L'Excellent, SIAM Journal on Matrix Analysis and Applications 23 (2001)
P. R. Amestoy, A. Guermouche, J.-Y. L'Excellent and S. Pralet, Parallel Computing 32 (2006)
I. S. Duff and J. K. Reid, ACM Transactions on Mathematical Software 9 (1983)
I. S. Duff, A. M. Erisman and J. K. Reid, Oxford University Press, London (1986)
J. W. H. Liu, SIAM Review 34 (1992)

8 So, what is MUMPS, and how does it work? Solves $Ax = b$. Direct solver based on the multifrontal approach. $A$ is a square sparse matrix: 1 unsymmetric, 2 symmetric positive definite, 3 general symmetric. Factorization: $A = LU$; for symmetric $A$: $A = LDL^T$
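To make the symmetric factorization on this slide concrete, here is a minimal sketch that computes $A = LDL^T$ for a small dense matrix in plain Python. It is dense and unpivoted, so it only illustrates the algebra and says nothing about how MUMPS stores or factorizes sparse matrices. The names `ldlt` and `reconstruct` and the 3x3 test matrix are made up for this example.

```python
def ldlt(A):
    """Return (L, d) with A = L * diag(d) * L^T for a symmetric matrix A.

    Dense and without pivoting: assumes every pivot d[j] is nonzero.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: subtract the contributions of earlier columns.
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / d[j]
    return L, d


def reconstruct(L, d):
    """Multiply L * diag(d) * L^T back together (used only for checking)."""
    n = len(d)
    return [[sum(L[i][k] * d[k] * L[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


# Hypothetical symmetric positive definite test matrix.
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L, d = ldlt(A)
```

For a symmetric positive definite matrix all entries of `d` are positive; for a general symmetric matrix a real solver needs pivoting, which this sketch leaves out.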

12 Assembly: $A = \sum_l A^{[l]}$, i.e. $a_{ij} \leftarrow a_{ij} + a^{[l]}_{ij}$ for each element matrix $A^{[l]}$. A variable is fully summed once all contributions to its row and column have been assembled.
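The assembly rule can be sketched in a few lines of Python. The element layout (a tuple of global variable indices plus a small local matrix), the function names, and the two-element chain are assumptions made for this illustration and are not taken from the slides.

```python
def assemble(n, elements):
    """Sum element matrices A^[l] into a dense n x n global matrix A.

    Each element is (variables, local_matrix): local row/column r
    corresponds to the global index variables[r].
    """
    A = [[0.0] * n for _ in range(n)]
    for variables, local in elements:
        for r, i in enumerate(variables):
            for c, j in enumerate(variables):
                A[i][j] += local[r][c]  # a_ij <- a_ij + a_ij^[l]
    return A


def fully_summed_after(elements, count):
    """Variables touched by the first `count` elements and by no later one."""
    seen = {v for variables, _ in elements[:count] for v in variables}
    later = {v for variables, _ in elements[count:] for v in variables}
    return sorted(seen - later)


# Two hypothetical 2x2 elements sharing variable 1 (a chain 0 - 1 - 2).
elements = [
    ((0, 1), [[1.0, -1.0], [-1.0, 1.0]]),
    ((1, 2), [[2.0, -2.0], [-2.0, 2.0]]),
]
A = assemble(3, elements)
```

After the first element only variable 0 is fully summed, so it could already be eliminated before the second element is assembled. Frontal and multifrontal methods exploit exactly this.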

15 Gaussian elimination (GE): $a^{(k+1)}_{ij} = a^{(k)}_{ij} - a^{(k)}_{ik} \left[a^{(k)}_{kk}\right]^{-1} a^{(k)}_{kj}$
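One step of this update, written out as a dense sketch in Python. The function name `ge_step` and the test matrix are hypothetical. The sketch updates only the trailing block $i, j > k$ covered by the formula, leaves column $k$ below the pivot untouched, and does no pivoting.

```python
def ge_step(A, k):
    """Return A^(k+1) from A^(k) by updating the trailing block i, j > k."""
    n = len(A)
    pivot = A[k][k]
    if pivot == 0.0:
        raise ZeroDivisionError("zero pivot; this sketch does not pivot")
    B = [row[:] for row in A]
    for i in range(k + 1, n):
        for j in range(k + 1, n):
            # a_ij^(k+1) = a_ij^(k) - a_ik^(k) * (a_kk^(k))^-1 * a_kj^(k)
            B[i][j] = A[i][j] - A[i][k] * A[k][j] / pivot
    return B


A0 = [[2.0, 1.0, 1.0],
      [4.0, 3.0, 3.0],
      [8.0, 7.0, 9.0]]
A1 = ge_step(A0, 0)
A2 = ge_step(A1, 1)
```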

19 Example: Assemble A, B; Eliminate 4; Assemble C; Permute 1, 8; Eliminate. [Figure: pattern of the factor entries (u, l) in the rows of variables 1, 8, 5, 2]
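The assemble and eliminate steps of this example amount to a partial factorization of a dense frontal matrix: the fully summed variables are eliminated, and the remaining Schur complement is kept for later assembly. The following Python sketch shows that single operation. It does not reproduce the variable numbering of the slides, the 3x3 front is made up, and `partial_factor` is a hypothetical name.

```python
def partial_factor(F, npiv):
    """Eliminate the first `npiv` (fully summed) variables of the front F.

    Returns (W, contribution). In W, the rows of eliminated variables hold
    U entries and the columns below their diagonal hold L multipliers.
    `contribution` is the Schur complement on the remaining variables.
    No pivoting is done.
    """
    n = len(F)
    W = [row[:] for row in F]
    for k in range(npiv):
        for i in range(k + 1, n):
            W[i][k] /= W[k][k]                # multiplier l_ik
            for j in range(k + 1, n):
                W[i][j] -= W[i][k] * W[k][j]  # update of the trailing block
    contribution = [row[npiv:] for row in W[npiv:]]
    return W, contribution


# Hypothetical front in which only the first variable is fully summed.
F = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
W, contribution = partial_factor(F, 1)
```

The contribution block is what gets assembled into the next front, together with any further element matrices.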

45 Frontal: $(\dots(((A^{[1]} + A^{[2]}) + A^{[3]}) + A^{[4]}) + \dots)$. Multifrontal: $((A^{[1]} + A^{[2]}) + (A^{[3]} + A^{[4]}) + (A^{[5]} + A^{[6]}) + (A^{[7]} + A^{[8]}))$
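The two bracketings give the same sum but a different dependency structure, which the following Python sketch makes visible. Plain numbers stand in for the element matrices, and a binary tree is assumed for the multifrontal case for simplicity, whereas the slide adds four pair sums in one step. Both function names are hypothetical.

```python
def frontal_sum(terms):
    """Left-to-right chain (((t1 + t2) + t3) + ...): every add waits for
    the previous one, so the dependency depth is len(terms) - 1."""
    total, depth = terms[0], 0
    for t in terms[1:]:
        total, depth = total + t, depth + 1
    return total, depth


def multifrontal_sum(terms):
    """Pairwise tree: all adds on one level are independent of each other,
    so the depth grows only like log2(len(terms))."""
    level, depth = list(terms), 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd leftover moves up unchanged
        level, depth = nxt, depth + 1
    return level[0], depth


terms = [1, 2, 3, 4, 5, 6, 7, 8]
chain = frontal_sum(terms)
tree = multifrontal_sum(terms)
```

With eight terms the chain has depth 7 and the tree depth 3. The independent partial sums on each level are what a parallel solver can hand to different processes.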

46 How MUMPS solves a problem (phases: Analysis, Factorization, Solution). Analysis: 1 Preprocessing, 2 Ordering, 3 Symbolic factorization

47 How MUMPS solves a problem. Factorization: 1 Elimination tree nodes, 2 Numerical factorization: frontal matrices, 3 Factor matrices distributed

48 How MUMPS solves a problem. Solution: 1 $LUx = b$ or $LDL^T x = b$, 2 Forward: $Ly = b$ or $LDy = b$, 3 Backward: $Ux = y$ or $L^T x = y$
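The factorization and solution phases for the unsymmetric case ($A = LU$, then $Ly = b$, then $Ux = y$) can be sketched with dense lists in Python. This leaves out everything that makes a sparse solver interesting (ordering, symbolic analysis, pivoting, distribution), and the function names and the 3x3 system are made up for the illustration.

```python
def lu_factor(A):
    """Doolittle LU without pivoting: unit lower-triangular L and upper U."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def forward(L, b):
    """Forward substitution: solve L y = b for unit lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append(b[i] - sum(L[i][j] * y[j] for j in range(i)))
    return y


def backward(U, y):
    """Backward substitution: solve U x = y for upper-triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
L, U = lu_factor(A)          # factorization phase
x = backward(U, forward(L, b))  # solution phase
```

The factors can be reused for further right-hand sides, which is why the expensive factorization is kept separate from the cheap solution phase.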

49 MUMPS: Furthermore... Interfaces to PORD, SCOTCH, METIS. Parallel version requires MPI, BLAS, BLACS and ScaLAPACK. Error analysis. Detection of null pivots. Schur complement.

50 PaStiX: Parallel Sparse matriX package

51 Getting PaStiX Link

52 PaStiX. Depending on symmetry: $A = LL^T$ or $A = LDL^T$. Steps: 1 Reordering to reduce fill-in, 2 Symbolic factorization, 3 Distribute matrix blocks to processors, 4 Decomposition of A, 5 Solve system, 6 Refine solution (static pivoting)

53 PaStiX steps 1 to 3
1. Ordering: SCOTCH (or METIS); Halo Approximate Minimum Degree; the tree represents dependencies.
2. Symbolic factorization: structure of the factorized matrix from A; cheap step; number of off-diagonal blocks.
3. Distribution: Partitioning: large blocks distributed to several processors. Processor candidates: local communication. Distribution: blocks to nodes; use the elimination tree; scheduling of communication and computation.
Levels of parallelism: coarse: independent parts of the tree; medium: block decomposition; fine: BLAS3.
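The dependency tree mentioned under ordering and distribution can be computed from the nonzero pattern alone. The sketch below uses the well-known ancestor-compression algorithm attributed to Liu to build the elimination tree of a symmetric matrix. The input format (for each row, the columns of its strictly lower-triangular nonzeros) and the two small patterns are assumptions made for this illustration, and nothing here reflects PaStiX's own data structures.

```python
def elimination_tree(n, lower_pattern):
    """Parent array of the elimination tree of a symmetric n x n matrix.

    lower_pattern[i] lists the column indices j < i with a_ij != 0.
    parent[j] == -1 marks a root.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut towards the current root
    for i in range(n):
        for j in lower_pattern[i]:
            # Walk from j towards its root and hook the path below i.
            while j != -1 and j < i:
                nxt = ancestor[j]
                ancestor[j] = i
                if nxt == -1:
                    parent[j] = i
                j = nxt
    return parent


# Hypothetical 5x5 pattern: 0 and 1 couple only to 2; 2 and 3 couple to 4.
branched = elimination_tree(5, [[], [], [0, 1], [], [2, 3]])

# Tridiagonal pattern: each variable depends on the previous one.
chain = elimination_tree(4, [[], [0], [1], [2]])
```

In the first pattern the leaves 0, 1 and 3 have no dependencies on each other, so they can be eliminated independently (the coarse level of parallelism). The tridiagonal pattern degenerates into a chain and offers none at this level.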

54 PaStiX steps 4 to 6
4. Factorization: calculate $LL^T$ or $LDL^T$; multifrontal vs. supernodal; PaStiX: supernodal (left-looking).
5. Solve: distribution kept; cheap.
6. Refinement (optional): GMRES (by Y. Saad); iterative refinement; conjugate gradient.
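Of the three refinement options, plain iterative refinement is the simplest to sketch. The idea is to solve with a slightly inexact factorization, then repeatedly solve for the residual and correct. In the Python sketch below the inexact solver is simulated by perturbing one diagonal entry of a 2x2 matrix, a hypothetical stand-in for the effect of static pivoting. The names `solve2` and `refine` and all numbers are made up.

```python
def solve2(M, b):
    """Solve a 2x2 system by Cramer's rule (stands in for the factor solve)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def refine(A, b, approx_solve, steps):
    """Iterative refinement: x <- x + approx_solve(b - A x), `steps` times."""
    n = len(b)
    x = approx_solve(b)
    for _ in range(steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        dx = approx_solve(r)
        x = [xi + di for xi, di in zip(x, dx)]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
exact = [1.0 / 11.0, 7.0 / 11.0]

# Assumption: the factorization belongs to a matrix with a perturbed pivot.
M = [[4.1, 1.0], [1.0, 3.0]]


def approx(rhs):
    return solve2(M, rhs)


x0 = refine(A, b, approx, 0)    # no refinement
x10 = refine(A, b, approx, 10)  # ten refinement steps
```

Each step only costs one residual computation and one cheap solve with the existing factors, and it converges quickly as long as the perturbed factorization is close to the true matrix.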

55 SuperLU

56 Getting SuperLU. Link: xiaoye/superlu/. Debian, Ubuntu: $ sudo apt-get install libsuperlu3

57 Three packages
1 Sequential SuperLU: sequential processors; one or more layers of memory.
2 Multithreaded SuperLU (SuperLU_MT): shared-memory multiprocessors (SMPs); can use parallel processors.
3 Distributed SuperLU (SuperLU_DIST): distributed-memory parallel processors; MPI; can use hundreds of processors.

58 Summary

59 Summary
MUMPS: symbolic factorization; distribution; factorization through the multifrontal method.
PaStiX: symbolic factorization; distribution; factorization through the supernodal method.
SuperLU: to be investigated...

60 Thank You For Your Attention!

Scilab and MATLAB Interfaces to MUMPS (version 4.6 or greater)

Scilab and MATLAB Interfaces to MUMPS (version 4.6 or greater) Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Scilab and MATLAB Interfaces to MUMPS (version 4.6 or greater)

More information

A note on fast approximate minimum degree orderings for symmetric matrices with some dense rows

A note on fast approximate minimum degree orderings for symmetric matrices with some dense rows NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. (2009) Published online in Wiley InterScience (www.interscience.wiley.com)..647 A note on fast approximate minimum degree orderings

More information

Poisson Equation Solver Parallelisation for Particle-in-Cell Model

Poisson Equation Solver Parallelisation for Particle-in-Cell Model WDS'14 Proceedings of Contributed Papers Physics, 233 237, 214. ISBN 978-8-7378-276-4 MATFYZPRESS Poisson Equation Solver Parallelisation for Particle-in-Cell Model A. Podolník, 1,2 M. Komm, 1 R. Dejarnac,

More information

AN OUT-OF-CORE SPARSE SYMMETRIC INDEFINITE FACTORIZATION METHOD

AN OUT-OF-CORE SPARSE SYMMETRIC INDEFINITE FACTORIZATION METHOD AN OUT-OF-CORE SPARSE SYMMETRIC INDEFINITE FACTORIZATION METHOD OMER MESHAR AND SIVAN TOLEDO Abstract. We present a new out-of-core sparse symmetric-indefinite factorization algorithm. The most significant

More information

HSL and its out-of-core solver

HSL and its out-of-core solver HSL and its out-of-core solver Jennifer A. Scott j.a.scott@rl.ac.uk Prague November 2006 p. 1/37 Sparse systems Problem: we wish to solve where A is Ax = b LARGE Informal definition: A is sparse if many

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS Linear systems Ax = b occur widely in applied mathematics They occur as direct formulations of real world problems; but more often, they occur as a part of the numerical analysis

More information

The MUMPS Solver: academic needs and industrial expectations

The MUMPS Solver: academic needs and industrial expectations The MUMPS Solver: academic needs and industrial expectations Chiara Puglisi (Inria-Grenoble (LIP-ENS Lyon)) MUMPS group, Bordeaux 1 CERFACS, CNRS, ENS-Lyon, INRIA, INPT, Université Séminaire Aristote -

More information

GOAL AND STATUS OF THE TLSE PLATFORM

GOAL AND STATUS OF THE TLSE PLATFORM GOAL AND STATUS OF THE TLSE PLATFORM P. Amestoy, F. Camillo, M. Daydé,, L. Giraud, R. Guivarch, V. Moya Lamiel,, M. Pantel, and C. Puglisi IRIT-ENSEEIHT And J.-Y. L excellentl LIP-ENS Lyon / INRIA http://www.irit.enseeiht.fr

More information

6. Cholesky factorization

6. Cholesky factorization 6. Cholesky factorization EE103 (Fall 2011-12) triangular matrices forward and backward substitution the Cholesky factorization solving Ax = b with A positive definite inverse of a positive definite matrix

More information

Mathematical Libraries and Application Software on JUROPA and JUQUEEN

Mathematical Libraries and Application Software on JUROPA and JUQUEEN Mitglied der Helmholtz-Gemeinschaft Mathematical Libraries and Application Software on JUROPA and JUQUEEN JSC Training Course May 2014 I.Gutheil Outline General Informations Sequential Libraries Parallel

More information

Mathematical Libraries on JUQUEEN. JSC Training Course

Mathematical Libraries on JUQUEEN. JSC Training Course Mitglied der Helmholtz-Gemeinschaft Mathematical Libraries on JUQUEEN JSC Training Course May 10, 2012 Outline General Informations Sequential Libraries, planned Parallel Libraries and Application Systems:

More information

THÈSE. En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE. Délivré par : L Institut National Polytechnique de Toulouse (INP Toulouse)

THÈSE. En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE. Délivré par : L Institut National Polytechnique de Toulouse (INP Toulouse) THÈSE En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE Délivré par : L Institut National Polytechnique de Toulouse (INP Toulouse) Présentée et soutenue par : Clément Weisbecker Le 28 Octobre

More information

Scaling the solution of large sparse linear systems using multifrontal methods on hybrid shared-distributed memory architectures

Scaling the solution of large sparse linear systems using multifrontal methods on hybrid shared-distributed memory architectures Scaling the solution of large sparse linear systems using multifrontal methods on hybrid shared-distributed memory architectures Mohamed Wissam Sid Lakhdar To cite this version: Mohamed Wissam Sid Lakhdar.

More information

A survey of direct methods for sparse linear systems

A survey of direct methods for sparse linear systems A survey of direct methods for sparse linear systems Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar Technical Report, Department of Computer Science and Engineering, Texas A&M Univ,

More information

A Parallel Lanczos Algorithm for Eigensystem Calculation

A Parallel Lanczos Algorithm for Eigensystem Calculation A Parallel Lanczos Algorithm for Eigensystem Calculation Hans-Peter Kersken / Uwe Küster Eigenvalue problems arise in many fields of physics and engineering science for example in structural engineering

More information

A study of various load information exchange mechanisms for a distributed application using dynamic scheduling

A study of various load information exchange mechanisms for a distributed application using dynamic scheduling Laboratoire de l Informatique du Parallélisme École Normale Supérieure de Lyon Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 A study of various load information exchange mechanisms for a distributed

More information

Algorithmique pour l algèbre linéaire creuse

Algorithmique pour l algèbre linéaire creuse Algorithmique pour l algèbre linéaire creuse Pascal Hénon 12 janvier 2009 Pascal Hénon Algorithmique pour l algèbre linéaire creuse module IS309 1 Contributions Many thanks to Patrick Amestoy, Abdou Guermouche

More information

THÈSE. En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE. Délivré par : L Institut National Polytechnique de Toulouse (INP Toulouse)

THÈSE. En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE. Délivré par : L Institut National Polytechnique de Toulouse (INP Toulouse) THÈSE En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE Délivré par : L Institut National Polytechnique de Toulouse (INP Toulouse) Présentée et soutenue par : François-Henry Rouet Le 7 Octobre

More information

Abstract: We describe the beautiful LU factorization of a square matrix (or how to write Gaussian elimination in terms of matrix multiplication).

Abstract: We describe the beautiful LU factorization of a square matrix (or how to write Gaussian elimination in terms of matrix multiplication). MAT 2 (Badger, Spring 202) LU Factorization Selected Notes September 2, 202 Abstract: We describe the beautiful LU factorization of a square matrix (or how to write Gaussian elimination in terms of matrix

More information

3 P0 P0 P3 P3 8 P1 P0 P2 P3 P1 P2

3 P0 P0 P3 P3 8 P1 P0 P2 P3 P1 P2 A Comparison of 1-D and 2-D Data Mapping for Sparse LU Factorization with Partial Pivoting Cong Fu y Xiangmin Jiao y Tao Yang y Abstract This paper presents a comparative study of two data mapping schemes

More information

On fast factorization pivoting methods for sparse symmetric indefinite systems

On fast factorization pivoting methods for sparse symmetric indefinite systems On fast factorization pivoting methods for sparse symmetric indefinite systems by Olaf Schenk 1, and Klaus Gärtner 2 Technical Report CS-2004-004 Department of Computer Science, University of Basel Submitted

More information

PARALLEL ALGORITHMS FOR PREDICTIVE MODELLING

PARALLEL ALGORITHMS FOR PREDICTIVE MODELLING PARALLEL ALGORITHMS FOR PREDICTIVE MODELLING MARKUS HEGLAND Abstract. Parallel computing enables the analysis of very large data sets using large collections of flexible models with many variables. The

More information

Direct Methods for Solving Linear Systems. Matrix Factorization

Direct Methods for Solving Linear Systems. Matrix Factorization Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011

More information

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications

More information

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420-001,

More information

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix 7. LU factorization EE103 (Fall 2011-12) factor-solve method LU factorization solving Ax = b with A nonsingular the inverse of a nonsingular matrix LU factorization algorithm effect of rounding error sparse

More information

AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS

AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS by M. Storti, L. Dalcín, R. Paz Centro Internacional de Métodos Numéricos en Ingeniería - CIMEC INTEC, (CONICET-UNL), Santa Fe, Argentina

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 19: SVD revisited; Software for Linear Algebra Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 9 Outline 1 Computing

More information

A distributed CPU-GPU sparse direct solver

A distributed CPU-GPU sparse direct solver A distributed CPU-GPU sparse direct solver Piyush Sao 1, Richard Vuduc 1, and Xiaoye Li 2 1 Georgia Institute of Technology, {piyush3,richie}@gatech.edu 2 Lawrence Berkeley National Laboratory, xsli@lbl.gov

More information

c 1999 Society for Industrial and Applied Mathematics

c 1999 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 20, No. 4, pp. 915 952 c 1999 Society for Industrial and Applied Mathematics AN ASYNCHRONOUS PARALLEL SUPERNODAL ALGORITHM FOR SPARSE GAUSSIAN ELIMINATION JAMES W. DEMMEL,

More information

Algorithmic Research and Software Development for an Industrial Strength Sparse Matrix Library for Parallel Computers

Algorithmic Research and Software Development for an Industrial Strength Sparse Matrix Library for Parallel Computers The Boeing Company P.O.Box3707,MC7L-21 Seattle, WA 98124-2207 Final Technical Report February 1999 Document D6-82405 Copyright 1999 The Boeing Company All Rights Reserved Algorithmic Research and Software

More information

THESE. Christof VÖMEL CERFACS

THESE. Christof VÖMEL CERFACS THESE pour obtenir LE TITRE DE DOCTEUR DE L INSTITUT NATIONAL POLYTECHNIQUE DE TOULOUSE Spécialité: Informatique et Télécommunications par Christof VÖMEL CERFACS Contributions à la recherche en calcul

More information

Notes on Cholesky Factorization

Notes on Cholesky Factorization Notes on Cholesky Factorization Robert A. van de Geijn Department of Computer Science Institute for Computational Engineering and Sciences The University of Texas at Austin Austin, TX 78712 rvdg@cs.utexas.edu

More information

Yousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008

Yousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008 A tutorial on: Iterative methods for Sparse Matrix Problems Yousef Saad University of Minnesota Computer Science and Engineering CRM Montreal - April 30, 2008 Outline Part 1 Sparse matrices and sparsity

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

5. Orthogonal matrices

5. Orthogonal matrices L Vandenberghe EE133A (Spring 2016) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

More information

Advanced Computational Software

Advanced Computational Software Advanced Computational Software Scientific Libraries: Part 2 Blue Waters Undergraduate Petascale Education Program May 29 June 10 2011 Outline Quick review Fancy Linear Algebra libraries - ScaLAPACK -PETSc

More information

Scientific Computing Programming with Parallel Objects

Scientific Computing Programming with Parallel Objects Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore

More information

7 Gaussian Elimination and LU Factorization

7 Gaussian Elimination and LU Factorization 7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method

More information

Espaces grossiers adaptatifs pour les méthodes de décomposition de domaines à deux niveaux

Espaces grossiers adaptatifs pour les méthodes de décomposition de domaines à deux niveaux Espaces grossiers adaptatifs pour les méthodes de décomposition de domaines à deux niveaux Frédéric Nataf Laboratory J.L. Lions (LJLL), CNRS, Alpines et Univ. Paris VI joint work with Victorita Dolean

More information

Parallel Interior Point Solver for Structured Quadratic Programs: Application to Financial Planning Problems

Parallel Interior Point Solver for Structured Quadratic Programs: Application to Financial Planning Problems Parallel Interior Point Solver for Structured uadratic Programs: Application to Financial Planning Problems Jacek Gondzio Andreas Grothey April 15th, 2003 MS-03-001 For other papers in this series see

More information

Factorization Theorems

Factorization Theorems Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

More information

Modification of the Minimum-Degree Algorithm by Multiple Elimination

Modification of the Minimum-Degree Algorithm by Multiple Elimination Modification of the Minimum-Degree Algorithm by Multiple Elimination JOSEPH W. H. LIU York University The most widely used ordering scheme to reduce fills and operations in sparse matrix computation is

More information

Section 6.1 - Inner Products and Norms

Section 6.1 - Inner Products and Norms Section 6.1 - Inner Products and Norms Definition. Let V be a vector space over F {R, C}. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F,

More information

An Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.

An Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt. An Overview Of Software For Convex Optimization Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu In fact, the great watershed in optimization isn t between linearity

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H-906

More information

A matrix-free preconditioner for sparse symmetric positive definite systems and least square problems

A matrix-free preconditioner for sparse symmetric positive definite systems and least square problems A matrix-free preconditioner for sparse symmetric positive definite systems and least square problems Stefania Bellavia Dipartimento di Ingegneria Industriale Università degli Studi di Firenze Joint work

More information

Best practices for efficient HPC performance with large models

Best practices for efficient HPC performance with large models Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,

More information

Load balancing. David Bindel. 12 Nov 2015

Load balancing. David Bindel. 12 Nov 2015 Load balancing David Bindel 12 Nov 2015 Inefficiencies in parallel code Poor single processor performance Typically in the memory system Saw this in matrix multiply assignment Overhead for parallelism

More information

Multicore Parallel Computing with OpenMP

Multicore Parallel Computing with OpenMP Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large

More information

The Assessment of Benchmarks Executed on Bare-Metal and Using Para-Virtualisation

The Assessment of Benchmarks Executed on Bare-Metal and Using Para-Virtualisation The Assessment of Benchmarks Executed on Bare-Metal and Using Para-Virtualisation Mark Baker, Garry Smith and Ahmad Hasaan SSE, University of Reading Paravirtualization A full assessment of paravirtualization

More information

ANSYS Solvers: Usage and Performance. Ansys equation solvers: usage and guidelines. Gene Poole Ansys Solvers Team, April, 2002

ANSYS Solvers: Usage and Performance. Ansys equation solvers: usage and guidelines. Gene Poole Ansys Solvers Team, April, 2002 ANSYS Solvers: Usage and Performance Ansys equation solvers: usage and guidelines Gene Poole Ansys Solvers Team, April, 2002 Outline Basic solver descriptions Direct and iterative methods Why so many choices?

More information

Techniques of the simplex basis LU factorization update

Techniques of the simplex basis LU factorization update Techniques of the simplex basis L factorization update Daniela Renata Cantane Electric Engineering and Computation School (FEEC), State niversity of Campinas (NICAMP), São Paulo, Brazil Aurelio Ribeiro

More information

A Load Balancing Tool for Structured Multi-Block Grid CFD Applications

A Load Balancing Tool for Structured Multi-Block Grid CFD Applications A Load Balancing Tool for Structured Multi-Block Grid CFD Applications K. P. Apponsah and D. W. Zingg University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, ON, M3H 5T6, Canada Email:

More information

Experiences of numerical simulations on a PC cluster Antti Vanne December 11, 2002

Experiences of numerical simulations on a PC cluster Antti Vanne December 11, 2002 xperiences of numerical simulations on a P cluster xperiences of numerical simulations on a P cluster ecember xperiences of numerical simulations on a P cluster Introduction eowulf concept Using commodity

More information

Toward a New Metric for Ranking High Performance Computing Systems

Toward a New Metric for Ranking High Performance Computing Systems SANDIA REPORT SAND2013-4744 Unlimited Release Printed June 2013 Toward a New Metric for Ranking High Performance Computing Systems Jack Dongarra, University of Tennessee Michael A. Heroux, Sandia National

More information

Limited Memory Solution of Complementarity Problems arising in Video Games

Limited Memory Solution of Complementarity Problems arising in Video Games Laboratoire d Arithmétique, Calcul formel et d Optimisation UMR CNRS 69 Limited Memory Solution of Complementarity Problems arising in Video Games Michael C. Ferris Andrew J. Wathen Paul Armand Rapport

More information

Mesh Partitioning and Load Balancing

Mesh Partitioning and Load Balancing and Load Balancing Contents: Introduction / Motivation Goals of Load Balancing Structures Tools Slide Flow Chart of a Parallel (Dynamic) Application Partitioning of the initial mesh Computation Iteration

More information

NOTUR Technology Transfer Projects (TTP)

NOTUR Technology Transfer Projects (TTP) NOTUR Technology Transfer Projects (TTP) By Trond Kvamsdal NOTUR 10. Juni 2004, Tromsø, Norway CONTENTS The concept behind the TTPs Results obtained from the TTPs Concluding remarks Purpose Enable optimal

More information

Optimization on Huygens

Optimization on Huygens Optimization on Huygens Wim Rijks wimr@sara.nl Contents Introductory Remarks Support team Optimization strategy Amdahls law Compiler options An example Optimization Introductory Remarks Modern day supercomputers

More information

Solving Very Large Financial Planning Problems on Blue Gene

Solving Very Large Financial Planning Problems on Blue Gene U N I V E R S School of Mathematics T H E O I T Y H F G E D I N U R Solving Very Large Financial Planning Problems on lue Gene ndreas Grothey, University of Edinburgh joint work with Jacek Gondzio, Marco

More information

Parallel Algorithm for Dense Matrix Multiplication

Parallel Algorithm for Dense Matrix Multiplication Parallel Algorithm for Dense Matrix Multiplication CSE633 Parallel Algorithms Fall 2012 Ortega, Patricia Outline Problem definition Assumptions Implementation Test Results Future work Conclusions Problem

More information

Numerical Analysis. Professor Donna Calhoun. Fall 2013 Math 465/565. Office : MG241A Office Hours : Wednesday 10:00-12:00 and 1:00-3:00

Numerical Analysis. Professor Donna Calhoun. Fall 2013 Math 465/565. Office : MG241A Office Hours : Wednesday 10:00-12:00 and 1:00-3:00 Numerical Analysis Professor Donna Calhoun Office : MG241A Office Hours : Wednesday 10:00-12:00 and 1:00-3:00 Fall 2013 Math 465/565 http://math.boisestate.edu/~calhoun/teaching/math565_fall2013 What is

More information

CS3220 Lecture Notes: QR factorization and orthogonal transformations

CS3220 Lecture Notes: QR factorization and orthogonal transformations CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss

More information

Solving Linear Systems of Equations. Gerald Recktenwald Portland State University Mechanical Engineering Department gerry@me.pdx.

Solving Linear Systems of Equations. Gerald Recktenwald Portland State University Mechanical Engineering Department gerry@me.pdx. Solving Linear Systems of Equations Gerald Recktenwald Portland State University Mechanical Engineering Department gerry@me.pdx.edu These slides are a supplement to the book Numerical Methods with Matlab:

More information

How To Solve The Fmfontham Equation In A Two Level Iterative Computer Science Project

How To Solve The Fmfontham Equation In A Two Level Iterative Computer Science Project Computer Science Master's Project Report Rensselaer Polytechnic Institute Troy, NY 12180 Development of A Two-Level Iterative Computational Method for Solution of the Franklin Approximation Algorithm for

More information

Large-Scale Reservoir Simulation and Big Data Visualization

Large-Scale Reservoir Simulation and Big Data Visualization Large-Scale Reservoir Simulation and Big Data Visualization Dr. Zhangxing John Chen NSERC/Alberta Innovates Energy Environment Solutions/Foundation CMG Chair Alberta Innovates Technology Future (icore)

More information

Adaptive Time-Dependent CFD on Distributed Unstructured Meshes

Adaptive Time-Dependent CFD on Distributed Unstructured Meshes Adaptive Time-Dependent CFD on Distributed Unstructured Meshes Chris Walshaw and Martin Berzins School of Computer Studies, University of Leeds, Leeds, LS2 9JT, U K e-mails: chris@scsleedsacuk, martin@scsleedsacuk

More information

Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions

Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions Peter Richtárik School of Mathematics The University of Edinburgh Joint work with Martin Takáč (Edinburgh)

More information

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE 1 P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE JEAN-MARC GRATIEN, JEAN-FRANÇOIS MAGRAS, PHILIPPE QUANDALLE, OLIVIER RICOIS 1&4, av. Bois-Préau. 92852 Rueil Malmaison Cedex. France

More information

arxiv:cs/0101001v1 [cs.ms] 3 Jan 2001

arxiv:cs/0101001v1 [cs.ms] 3 Jan 2001 ARGONNE NATIONAL LABORATORY 9700 South Cass Avenue Argonne, Illinois 60439 arxiv:cs/0101001v1 [cs.ms] 3 Jan 2001 AUTOMATIC DIFFERENTIATION TOOLS IN OPTIMIZATION SOFTWARE Jorge J. Moré Mathematics and Computer

More information

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition K. Osypov* (WesternGeco), D. Nichols (WesternGeco), M. Woodward (WesternGeco) & C.E. Yarman (WesternGeco) SUMMARY Tomographic

More information






















