HSL and its out-of-core solver


Jennifer A. Scott, Prague, November 2006

Sparse systems
Problem: we wish to solve Ax = b, where A is LARGE and sparse.
Informal definition: A is sparse if
- many entries are zero
- it is worthwhile to exploit these zeros.

Sparse matrices
Many application areas in science, engineering, and finance lead to sparse systems:
- computational fluid dynamics
- chemical engineering
- circuit simulation
- economic modelling
- fluid flow
- oceanography
- linear programming
- structural engineering
- ...
But all have different patterns and characteristics.

Example sparsity patterns (plots not reproduced; nz is the number of nonzero entries, given where it survives in the transcript):
- Circuit simulation
- Reservoir modelling (nz = 3474)
- Economic modelling (nz = 7682)
- Structural engineering
- Acoustics
- Chemical engineering
- Linear programming (nz = 4841)

Direct methods
Direct methods involve explicit factorization, e.g. $PAQ = LU$, where
- $L$, $U$ are lower and upper triangular matrices
- $P$, $Q$ are permutation matrices
The solution process is completed by the triangular solves $Ly = Pb$ and $Uz = y$, then $x = Qz$.
If $A$ is sparse, it is crucial to try to ensure $L$ and $U$ are sparse.
Suppose $A$ is $n \times n$ with $nz$ nonzeros. Gaussian elimination for a dense problem requires $O(n^2)$ storage and $O(n^3)$ flops, hence it is infeasible for large $n$.
Target complexity for sparse matrix computations is $O(n) + O(nz)$.
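
The factorize-then-solve sequence above can be illustrated on a small dense matrix. The following is only a minimal sketch in pure Python, assuming row pivoting only (so Q is the identity) and dense storage; the function names `lu_factor` and `lu_solve` are made up for this illustration and have nothing to do with any HSL interface.

```python
def lu_factor(A):
    """Return (perm, L, U) such that row perm[i] of A equals row i of L*U.

    Dense Gaussian elimination with partial (row) pivoting, so Q = I.
    """
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Choose the largest entry in column k as pivot.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0.0:
            raise ValueError("matrix is singular")
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]
        perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return perm, L, U


def lu_solve(perm, L, U, b):
    """Solve Ax = b using Ly = Pb (forward) and Ux = y (backward)."""
    n = len(b)
    pb = [b[i] for i in perm]
    y = [0.0] * n
    for i in range(n):
        y[i] = pb[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


A = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 0.0, 3.0]]
b = [7.0, 3.0, 11.0]          # chosen so that the solution is (1, 2, 3)
perm, L, U = lu_factor(A)
x = lu_solve(perm, L, U, b)
```

Once the factors are available, further right-hand sides only need the two triangular solves, which is why solvers separate the factorize and solve steps.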

Direct solvers
Most sparse direct solvers have a number of phases, typically:
- ORDER: preorder the matrix to exploit structure
- ANALYSE: analyse matrix structure to produce data structures for factorization
- FACTORIZE: perform numerical factorization
- SOLVE: use factors to solve one or more systems
Writing an efficient direct solver is non-trivial, so let someone else do it!

Mathematical software libraries
Benefits and advantages of using high quality mathematical software libraries include:
- Shorten application development cycle, cutting time-to-market and gaining competitive advantage
- Reduce overall development costs
- More time to focus on specialist aspects of applications
- Improve application accuracy and robustness
- Fully supported and maintained software

HSL
- HSL began as Harwell Subroutine Library in [year missing in transcript].
- Collection of portable, fully documented and tested Fortran packages.
- Primarily written and developed by the Numerical Analysis Group at RAL.
- Each package performs a basic numerical task (e.g. solve linear system, find eigenvalues) and has been designed to be incorporated into programs.
- Particular strengths in:
  - sparse matrix computations
  - optimization
  - large-scale system solution
HSL has an international reputation for reliability and efficiency. It is used by academics and commercial organisations and has been incorporated into a large number of commercial products.

Development of HSL
HSL is both revolutionary and evolutionary.
Revolutionary: some codes are radically different in technique and algorithm design, including
- MA18: First sparse direct code (1971)
- MA27: First multifrontal code (1982)
Evolutionary: some codes evolve (major algorithm developments, language changes, added functionality, ...), e.g.
- MA18 → MA28 → MA48 (unsymmetric sparse systems)
- MA17 → MA27 → MA57 (symmetric sparse systems)

Organisation of HSL
- Since 2000, HSL has been divided into the main HSL library and the HSL Archive
- HSL Archive consists of older packages that have been superseded either by improved HSL packages (e.g. MA28 superseded by MA48 and MA27 by MA57) or by public domain libraries such as LAPACK
- HSL Archive is free to all for non-commercial use but its use is not supported
- New release of HSL every 2-3 years; currently HSL 2004
- HSL is marketed by HyproTech UK (part of AspenTech)

The latest HSL sparse solver
- Problem sizes constantly grow larger
  - 40 years ago "large" might have meant order $10^2$
  - Today order $> 10^7$ is not unusual
- For direct methods, storage requirements grow more rapidly than problem size
Possible options:
- Iterative method ... but preconditioner?
- Combine iterative and direct methods?
- Buy a bigger machine ... but expensive and inflexible
- Use an out-of-core solver
An out-of-core solver holds the matrix factors in files and may also hold the matrix data and some work arrays in files.

Out-of-core solvers
The idea of out-of-core solvers is not new: band and frontal solvers developed in the 1970s and 1980s held matrix data and factors out-of-core. For example, MA32 in HSL (superseded in the 1990s by MA42).
30 years ago John Reid developed a Cholesky out-of-core multifrontal code TREESOLV for element applications.
More recent codes include:
- BCSEXTLIB (Boeing)
- Oblio (Dobrian and Pothen)
- TAUCS (Toledo and students)
Our new out-of-core solver is HSL_MA77.

Key features of HSL_MA77
- HSL_MA77 is designed to solve LARGE sparse symmetric systems
- Matrix data, matrix factor, and the main work space (optionally) held in files
- First release for positive definite problems (Cholesky $A = LL^T$); next release also for indefinite problems
- Matrix $A$ may be either in assembled form or a sum of element matrices
  $A = \sum_{k=1}^{m} A^{(k)}$
  where $A^{(k)}$ has nonzeros in a small number of rows and columns and corresponds to the matrix from element $k$
- Reverse communication interface with input by rows or by elements
- HSL_MA77 implements a multifrontal algorithm
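
To make the "sum of element matrices" form concrete, here is a small sketch that assembles two dense element matrices into a sparse matrix held as a dictionary keyed by (row, column). The data layout (a list of `(variables, dense matrix)` pairs) and the name `assemble` are assumptions made for this example, not the HSL_MA77 input format.

```python
def assemble(elements):
    """Sum element matrices A^(k) into a sparse matrix {(i, j): value}.

    Each element is (variables, Ak): Ak[a][b] contributes to the global
    entry (variables[a], variables[b]).
    """
    A = {}
    for variables, Ak in elements:
        for a, i in enumerate(variables):
            for b, j in enumerate(variables):
                A[(i, j)] = A.get((i, j), 0.0) + Ak[a][b]
    return A


# Two overlapping 2x2 elements on three variables; they share variable 1.
elements = [
    ([0, 1], [[4.0, 2.0], [2.0, 2.0]]),
    ([1, 2], [[3.0, 1.0], [1.0, 5.0]]),
]
A = assemble(elements)
# Assembled matrix: [[4, 2, 0], [2, 5, 1], [0, 1, 5]]
```

Only the shared variable receives contributions from both elements; entries such as (0, 2) are never created, which is where the sparsity comes from.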

Basic multifrontal algorithm
Assume that A is a sum of element matrices. The basic multifrontal algorithm may be described as follows:
    Given a pivot sequence:
    do for each pivot
        assemble all elements that contain the pivot into a dense matrix;
        eliminate the pivot and any other variables that are found only here;
        treat the reduced matrix as a new generated element
    end do
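
The loop above can be tried out on a toy positive definite problem. The sketch below is a simplified illustration only: it eliminates exactly one variable per step (it does not look for other variables that are fully summed at the same time), uses dense Python lists, and the names `multifrontal_cholesky`, `active` and `front_vars` are invented for this example.

```python
import math


def multifrontal_cholesky(elements, pivots):
    """Return the Cholesky factor as {pivot: {row: value}} (columns of L)."""
    active = [(list(v), [row[:] for row in m]) for v, m in elements]
    L = {}
    for p in pivots:
        # Elements (original or generated) that contain the pivot.
        touching = [e for e in active if p in e[0]]
        active = [e for e in active if p not in e[0]]
        # Frontal matrix: pivot first, then the other variables involved.
        others = sorted({v for vs, _ in touching for v in vs if v != p})
        front_vars = [p] + others
        pos = {v: i for i, v in enumerate(front_vars)}
        k = len(front_vars)
        F = [[0.0] * k for _ in range(k)]
        for vs, m in touching:
            for a, va in enumerate(vs):
                for b, vb in enumerate(vs):
                    F[pos[va]][pos[vb]] += m[a][b]
        # Eliminate the pivot (one Cholesky step on the dense front).
        d = math.sqrt(F[0][0])
        col = [F[i][0] / d for i in range(k)]
        L[p] = {front_vars[i]: col[i] for i in range(k)}
        # The reduced matrix becomes a new generated element.
        if k > 1:
            S = [[F[i][j] - col[i] * col[j] for j in range(1, k)]
                 for i in range(1, k)]
            active.append((front_vars[1:], S))
    return L


# Elements summing to A = [[4, 2, 0], [2, 5, 1], [0, 1, 5]].
elements = [
    ([0, 1], [[4.0, 2.0], [2.0, 2.0]]),
    ([1, 2], [[3.0, 1.0], [1.0, 5.0]]),
]
L = multifrontal_cholesky(elements, pivots=[0, 1, 2])
```

Note that the assembled matrix A is never formed: each step works only on a small dense front built from the elements that contain the current pivot.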

Multifrontal method: ASSEMBLY TREE (tree diagram not reproduced)
- Each leaf node represents an original element.
- Each non-leaf node represents a set of eliminations and the corresponding generated element.
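
Multifrontal method
At each non-leaf node the frontal matrix has the form
$$F = \begin{pmatrix} F_{11} & F_{12} \\ F_{12}^T & F_{22} \end{pmatrix}$$
Pivots can only be chosen from the $F_{11}$ block since $F_{22}$ is NOT fully summed. The eliminations update
$$F_{22} \leftarrow F_{22} - F_{12}^T F_{11}^{-1} F_{12}$$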
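
For the positive definite case it may help to see why the update of $F_{22}$ is exactly what a partial Cholesky factorization of the frontal matrix leaves behind. The following derivation is a sketch; the names $L_{11}$, $L_{21}$ and $S$ are introduced here and do not appear in the slides.

```latex
\begin{align*}
F &= \begin{pmatrix} F_{11} & F_{12} \\ F_{12}^T & F_{22} \end{pmatrix}
   = \begin{pmatrix} L_{11} & 0 \\ L_{21} & I \end{pmatrix}
     \begin{pmatrix} I & 0 \\ 0 & S \end{pmatrix}
     \begin{pmatrix} L_{11}^T & L_{21}^T \\ 0 & I \end{pmatrix}, \\
F_{11} &= L_{11} L_{11}^T
   && \text{(dense Cholesky of the fully summed block)}, \\
L_{21} &= F_{12}^T L_{11}^{-T}
   && \text{(triangular solve)}, \\
S &= F_{22} - L_{21} L_{21}^T
   = F_{22} - F_{12}^T \left(L_{11} L_{11}^T\right)^{-1} F_{12}
   = F_{22} - F_{12}^T F_{11}^{-1} F_{12}
   && \text{(matrix-matrix update)}.
\end{align*}
```

The columns $L_{11}$ and $L_{21}$ are part of the final factor, while $S$ is the generated element passed to the parent node.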

Summary: multifrontal method
- Pass element from children to parent
- At parent perform ASSEMBLY into dense matrix
- Then perform ELIMINATIONS using dense Gaussian elimination (allows Level 3 BLAS: TRSM and GEMM)

Language
- HSL is a Fortran library
- HSL_MA77 is written in Fortran 95, PLUS we use allocatable structure components and dummy arguments (part of Fortran 2003, implemented by current compilers).
Advantages of using allocatables:
- more efficient than using pointers
  - pointers must allow for the array being associated with an array section (e.g. a(i,:)) that is not a contiguous part of its parent
  - optimization of a loop involving a pointer may be inhibited by the possibility that its target is also accessed in another way in the loop
- avoids the memory-leakage dangers of pointers

Language (continued)
Other features of F95 that are important in the design of HSL_MA77:
- Automatic and allocatable arrays significantly reduce complexity of code and user interface (especially in the indefinite case)
- We selectively use long (64-bit) integers (selected_int_kind(18))
- Multifrontal algorithm can be naturally formulated using recursive procedures:
    ...
    call factor (root)
    ...
    recursive subroutine factor (node)
       ! Loop over children of node
       do i = 1, number_children
          call factor (child(i))
       end do
       ! Assemble frontal matrix and partially factorize
       ...
    end subroutine factor
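
The recursive formulation amounts to a postorder traversal of the assembly tree: every child is processed before its parent. The sketch below mirrors the Fortran outline in Python; the tree, the node names and the `order` list (which stands in for "assemble frontal matrix and partially factorize") are purely illustrative.

```python
def factor(node, children, order):
    """Visit the assembly tree in postorder, children before parent."""
    for child in children.get(node, []):
        factor(child, children, order)
    # Stand-in for: assemble the frontal matrix and partially factorize.
    order.append(node)


# Hypothetical assembly tree: leaves e1..e3 are original elements.
children = {
    "root": ["a", "b"],
    "a": ["e1", "e2"],
    "b": ["e3"],
}
order = []
factor("root", children, order)
print(order)  # ['e1', 'e2', 'a', 'e3', 'b', 'root']
```

Because a parent is handled only after all of its children, the generated elements it needs are always available when it is assembled.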

Virtual memory management
- Essential to our code design is our virtual memory management system
- This was part of the original TREESOLV package
- Separate package HSL_OF01 handles all i/o
- Provides read/write facilities for one or more direct access files through a single in-core buffer (work array)
- Aim is to avoid actual input-output operations whenever possible
- Each set of data is accessed as a virtual array, i.e. as if it were a very long array
- Any contiguous section of the virtual array may be read or written
- Each virtual array is associated with a primary file
- If too large for a single file, one or more secondary files are used
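
One way to picture a virtual array spread over a primary file and secondary files is as simple index arithmetic. The sketch below assumes every file holds the same fixed number of entries (`file_capacity`) and that file 0 is the primary; both are assumptions for illustration and not a description of how HSL_OF01 actually lays out its files.

```python
def split_section(start, length, file_capacity):
    """Split the virtual-array section [start, start + length) into pieces.

    Returns a list of (file_number, offset_in_file, count) triples, where
    file 0 is the primary file and 1, 2, ... are secondary files.
    """
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    pieces = []
    while length > 0:
        file_no, offset = divmod(start, file_capacity)
        count = min(length, file_capacity - offset)
        pieces.append((file_no, offset, count))
        start += count
        length -= count
    return pieces


# A 15-entry section starting at virtual index 8, with 10 entries per file,
# touches the primary file and two secondary files.
pieces = split_section(8, 15, file_capacity=10)
print(pieces)  # [(0, 8, 2), (1, 0, 10), (2, 0, 3)]
```

The caller sees one long array; only this mapping layer needs to know that the data crosses file boundaries.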

Virtual memory management (diagram not reproduced: a buffer, its virtual arrays, and the superfiles main_file, main_file1, main_file2 and temp_file)
- In this example, two superfiles are associated with the buffer
- The first superfile has two secondaries, the second has none

Use of the buffer
- Buffer divided into fixed length pages
- Most recently accessed pages of the virtual array held in buffer
- For each page in buffer, we store:
  - unit number of its primary file
  - page number within corresponding virtual array
- Required page(s) found using a simple hash function
Aim to minimise the number of i/o operations by:
- using wanted pages that are already in the buffer first
- if the buffer is full, freeing the least recently accessed page
- only writing a page to file if it has changed since entry into the buffer
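
The paging policy described above (fixed-length pages, lookup by unit and page number, least-recently-accessed replacement, write-back only for changed pages) can be mimicked in a few lines. This is a toy model only: a Python dictionary stands in for the files, `OrderedDict` supplies both the hashing and the recency order, and the class name `PageBuffer` is invented here. The parameter names `npage` and `lpage` are borrowed from the experiments slide, on the assumption that they denote the number of pages and the page length.

```python
from collections import OrderedDict


class PageBuffer:
    """Toy in-core buffer of npage pages, each holding lpage entries."""

    def __init__(self, npage, lpage, store):
        self.npage, self.lpage = npage, lpage
        self.store = store            # {(unit, page_no): list}, stands in for files
        self.pages = OrderedDict()    # (unit, page_no) -> [data, dirty flag]
        self.loads = self.writes = 0

    def _page(self, unit, page_no):
        key = (unit, page_no)
        if key in self.pages:
            self.pages.move_to_end(key)          # mark as most recently used
        else:
            if len(self.pages) >= self.npage:
                # Free the least recently accessed page.
                old_key, (data, dirty) = self.pages.popitem(last=False)
                if dirty:                         # write back only if changed
                    self.store[old_key] = data
                    self.writes += 1
            data = list(self.store.get(key, [0.0] * self.lpage))
            self.loads += 1
            self.pages[key] = [data, False]
        return self.pages[key]

    def read(self, unit, index):
        page_no, offset = divmod(index, self.lpage)
        return self._page(unit, page_no)[0][offset]

    def write(self, unit, index, value):
        page_no, offset = divmod(index, self.lpage)
        entry = self._page(unit, page_no)
        entry[0][offset] = value
        entry[1] = True                           # page now differs from file


buf = PageBuffer(npage=2, lpage=4, store={})
buf.write(1, 0, 1.5)    # page 0 of unit 1 becomes dirty
buf.read(1, 5)          # page 1 loaded, clean
buf.read(1, 9)          # buffer full: dirty page 0 is written back
buf.read(1, 13)         # clean page 1 is dropped without any write
value = buf.read(1, 0)  # page 0 is reloaded from the store
```

In this run only one write reaches the store, because only one of the evicted pages had been modified.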

Advantages
Advantages of this approach for developing sparse solvers:
- All i/o is isolated ... assists with code design, development, debugging, and maintenance
- User is shielded from i/o but can control where files are written and can save data for future solves
- i/o is not needed if the user has supplied a long buffer
- HSL_OF01 can be used in the development of other solvers

Use of HSL_OF01 within HSL_MA77
HSL_MA77 has an integer buffer and a real buffer
- The integer buffer is associated with a file that holds the integer data for the input matrix and the matrix factor
- The real buffer is associated with two files:
  - one holds the real data for the input matrix and the matrix factor
  - the other is used for the multifrontal stack
- The indefinite case will use two further files (to hold the integer and real data associated with delayed pivots)
- The user must supply pathnames and filenames for all the files
NOTE: We include an option for the files to be replaced by in-core arrays (faster for problems for which the user has enough memory)

Numerical experiments
- Test set of 26 problems of order up to $10^6$ from a range of applications
- All available in the University of Florida Sparse Matrix Collection
- Tests used double precision (64-bit) reals on a single 3.6 GHz Intel Xeon processor of a Dell Precision 670 with 4 Gbytes of RAM
- g95 compiler with the -O option and ATLAS BLAS and LAPACK
- Comparisons with flagship HSL solver MA57 (Duff)
- All times are wall clock times in seconds

Effect of varying npage and lpage (table; the numerical entries were not preserved in the transcript)
Columns: npage, lpage, af_shell3, crankseg_2, m_t1, shipsec; the final row is the in-core run.

Times for the different phases of HSL_MA77 (table; the timings were not preserved in the transcript)
Problems: af_shell3 (n = 504,855), cfd2 (n = 123,440), fullb (n = 199,187), thread (n = 29,736)
Rows: Input, Ordering, MA77 analyse, MA77 factor(0), MA77 factor(1), MA77 solve(1), MA77 solve(10), MA77 solve(100), AFS, AF S

Factorization time compared with MA57 (plot not reproduced: time / (MA77 out-of-core time) against problem index, for MA57 and MA77 in-core)

Solve time compared with MA57 (plot not reproduced: time / (MA77 out-of-core time) against problem index, for MA57 and MA77 in-core)

Complete solution time compared with MA57 (plot not reproduced: time / (MA77 out-of-core time) against problem index, for MA57 and MA77 in-core)

Concluding remarks
- Writing the solver has been (and still is) a major project
- Positive definite code performing well
- Out-of-core working adds an overhead but not prohibitive
- Indefinite kernel currently under development (need for pivoting adds to complexity)
- Version for complex arithmetic will be developed
- Also plan version for unsymmetric problems that have (almost) symmetric structure
References:
- An out-of-core sparse Cholesky solver, J. K. Reid and J. A. Scott, RAL-TR
- HSL_OF01, a virtual memory system in Fortran, J. K. Reid and J. A. Scott, RAL-TR
More informationPARDISO. User Guide Version 5.0.0
P a r a l l e l S p a r s e D i r e c t A n d M u l t i  R e c u r s i v e I t e r a t i v e L i n e a r S o l v e r s PARDISO User Guide Version 5.0.0 (Updated February 07, 2014) O l a f S c h e n k
More information64Bit versus 32Bit CPUs in Scientific Computing
64Bit versus 32Bit CPUs in Scientific Computing Axel Kohlmeyer Lehrstuhl für Theoretische Chemie RuhrUniversität Bochum March 2004 1/25 Outline 64Bit and 32Bit CPU Examples
More informationLinear Dependence Tests
Linear Dependence Tests The book omits a few key tests for checking the linear dependence of vectors. These short notes discuss these tests, as well as the reasoning behind them. Our first test checks
More informationCS3220 Lecture Notes: QR factorization and orthogonal transformations
CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss
More informationSolving linear systems. Solving linear systems p. 1
Solving linear systems Solving linear systems p. 1 Overview Chapter 12 from Michael J. Quinn, Parallel Programming in C with MPI and OpenMP We want to find vector x = (x 0,x 1,...,x n 1 ) as solution of
More informationFortran Program Development with Visual Studio* 2005 ~ Use Intel Visual Fortran with Visual Studio* ~
Fortran Program Development with Visual Studio* 2005 ~ Use Intel Visual Fortran with Visual Studio* ~ 31/Oct/2006 Software &Solutions group * Agenda Features of Intel Fortran Compiler Integrate with Visual
More informationGMP implementation on CUDA  A Backward Compatible Design With Performance Tuning
1 GMP implementation on CUDA  A Backward Compatible Design With Performance Tuning Hao Jun Liu, Chu Tong Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto haojun.liu@utoronto.ca,
More informationAbstract: We describe the beautiful LU factorization of a square matrix (or how to write Gaussian elimination in terms of matrix multiplication).
MAT 2 (Badger, Spring 202) LU Factorization Selected Notes September 2, 202 Abstract: We describe the beautiful LU factorization of a square matrix (or how to write Gaussian elimination in terms of matrix
More informationChapter 7: Additional Topics
Chapter 7: Additional Topics In this chapter we ll briefly cover selected advanced topics in fortran programming. All the topics come in handy to add extra functionality to programs, but the feature you
More informationPARALLEL ALGORITHMS FOR PREDICTIVE MODELLING
PARALLEL ALGORITHMS FOR PREDICTIVE MODELLING MARKUS HEGLAND Abstract. Parallel computing enables the analysis of very large data sets using large collections of flexible models with many variables. The
More informationGood FORTRAN Programs
Good FORTRAN Programs Nick West Postgraduate Computing Lectures Good Fortran 1 What is a Good FORTRAN Program? It Works May be ~ impossible to prove e.g. Operating system. Robust Can handle bad data e.g.
More informationANSYS Solvers: Usage and Performance. Ansys equation solvers: usage and guidelines. Gene Poole Ansys Solvers Team, April, 2002
ANSYS Solvers: Usage and Performance Ansys equation solvers: usage and guidelines Gene Poole Ansys Solvers Team, April, 2002 Outline Basic solver descriptions Direct and iterative methods Why so many choices?
More informationNumerical Solution of Linear Systems
Numerical Solution of Linear Systems Chen Greif Department of Computer Science The University of British Columbia Vancouver B.C. Tel Aviv University December 17, 2008 Outline 1 Direct Solution Methods
More informationMathematical Libraries on JUQUEEN. JSC Training Course
Mitglied der HelmholtzGemeinschaft Mathematical Libraries on JUQUEEN JSC Training Course May 10, 2012 Outline General Informations Sequential Libraries, planned Parallel Libraries and Application Systems:
More informationMemoization/Dynamic Programming. The String reconstruction problem. CS125 Lecture 5 Fall 2016
CS125 Lecture 5 Fall 2016 Memoization/Dynamic Programming Today s lecture discusses memoization, which is a method for speeding up algorithms based on recursion, by using additional memory to remember
More informationChapter 4 Index Structures
Chapter 4 Index Structures Having seen the options available for representing records, we must now consider how whole relations, or the extents of classes, are represented. It is not sufficient 4.1. INDEXES
More informationLecture 3: Finding integer solutions to systems of linear equations
Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture
More informationAdding large data support to R
Adding large data support to R Luke Tierney Department of Statistics & Actuarial Science University of Iowa January 4, 2013 Luke Tierney (U. of Iowa) Large data in R January 4, 2013 1 / 15 Introduction
More informationScalable Distributed Schur Complement Solvers for Internal and External Flow Computations on ManyCore Architectures
Scalable Distributed Schur Complement Solvers for Internal and External Flow Computations on ManyCore Architectures Dr.Ing. Achim Basermann, Dr. HansPeter Kersken, Melven Zöllner** German Aerospace
More informationA Randomized LUbased Solver Using GPU and Intel Xeon Phi Accelerators
A Randomized LUbased Solver Using GPU and Intel Xeon Phi Accelerators Marc Baboulin, Amal Khabou, and Adrien Rémy Université ParisSud, Orsay, France baboulin@lri.fr amal.khabou@lri.fr aremy@lri.fr Abstract.
More informationRecommended hardware system configurations for ANSYS users
Recommended hardware system configurations for ANSYS users The purpose of this document is to recommend system configurations that will deliver high performance for ANSYS users across the entire range
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 4. Gaussian Elimination In this part, our focus will be on the most basic method for solving linear algebraic systems, known as Gaussian Elimination in honor
More informationArithmetic and Algebra of Matrices
Arithmetic and Algebra of Matrices Math 572: Algebra for Middle School Teachers The University of Montana 1 The Real Numbers 2 Classroom Connection: Systems of Linear Equations 3 Rational Numbers 4 Irrational
More information1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++
Answer the following 1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ 2) Which data structure is needed to convert infix notations to postfix notations? Stack 3) The
More informationby the matrix A results in a vector which is a reflection of the given
Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the yaxis We observe that
More information4/1/2017. PS. Sequences and Series FROM 9.2 AND 9.3 IN THE BOOK AS WELL AS FROM OTHER SOURCES. TODAY IS NATIONAL MANATEE APPRECIATION DAY
PS. Sequences and Series FROM 9.2 AND 9.3 IN THE BOOK AS WELL AS FROM OTHER SOURCES. TODAY IS NATIONAL MANATEE APPRECIATION DAY 1 Oh the things you should learn How to recognize and write arithmetic sequences
More informationCPUspecific optimization. Example of a target CPU core: ARM CortexM4F core inside LM4F120H5QR microcontroller in Stellaris LM4F120 Launchpad.
CPUspecific optimization 1 Example of a target CPU core: ARM CortexM4F core inside LM4F120H5QR microcontroller in Stellaris LM4F120 Launchpad. Example of a function that we want to optimize: adding 1000
More informationNAG Fortran Library Routine Document E02GAF.1
NAG Fortran Library Routine Document Note: before using this routine, please read the Users Note for your implementation to check the interpretation of bold italicised terms and other implementationdependent
More information(a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular.
Theorem.7.: (Properties of Triangular Matrices) (a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular. (b) The product
More informationHY345 Operating Systems
HY345 Operating Systems Recitation 2  Memory Management Solutions Panagiotis Papadopoulos panpap@csd.uoc.gr Problem 7 Consider the following C program: int X[N]; int step = M; //M is some predefined constant
More informationSimple Fortran Multitasking Library for the Apple Macintosh Computer
Simple Fortran Multitasking Library for the Apple Macintosh Computer Viktor K. Decyk Department of Physics and Astronomy UCLA Los Angeles, California 900951547 decyk@physics.ucla.edu The Apple Macintosh
More informationAlgorithmic Research and Software Development for an Industrial Strength Sparse Matrix Library for Parallel Computers
The Boeing Company P.O.Box3707,MC7L21 Seattle, WA 981242207 Final Technical Report February 1999 Document D682405 Copyright 1999 The Boeing Company All Rights Reserved Algorithmic Research and Software
More informationAn Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.
An Overview Of Software For Convex Optimization Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu In fact, the great watershed in optimization isn t between linearity
More information7.1 Modelling the transportation problem
Chapter Transportation Problems.1 Modelling the transportation problem The transportation problem is concerned with finding the minimum cost of transporting a single commodity from a given number of sources
More informationCS231: Computer Architecture I
CS231: Computer Architecture I Spring 2003 January 22, 2003 20002003 Howard Huang 1 What is computer architecture about? Computer architecture is the study of building entire computer systems. Processor
More informationOUTCOMES BASED LEARNING MATRIX
Course: CTIM371 Programming in C++ OUTCOMES BASED LEARNING MATRIX Department: Computer Technology and Information Management Course Description: This is the first course in the C++ programming language.
More informationPhysical Data Organization
Physical Data Organization Database design using logical model of the database  appropriate level for users to focus on  user independence from implementation details Performance  other major factor
More informationOperation Count; Numerical Linear Algebra
10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floatingpoint
More informationA numerically adaptive implementation of the simplex method
A numerically adaptive implementation of the simplex method József Smidla, Péter Tar, István Maros Department of Computer Science and Systems Technology University of Pannonia 17th of December 2014. 1
More informationMathematical Induction
Chapter 2 Mathematical Induction 2.1 First Examples Suppose we want to find a simple formula for the sum of the first n odd numbers: 1 + 3 + 5 +... + (2n 1) = n (2k 1). How might we proceed? The most natural
More informationMixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms
Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State
More informationSCHOOL OF MATHEMATICS MATHEMATICS FOR PART I ENGINEERING. Self Study Course
SCHOOL OF MATHEMATICS MATHEMATICS FOR PART I ENGINEERING Self Study Course MODULE 17 MATRICES II Module Topics 1. Inverse of matrix using cofactors 2. Sets of linear equations 3. Solution of sets of linear
More informationDepartment of Electrical and Computer Engineering Faculty of Engineering and Architecture American University of Beirut Course Information
Department of Electrical and Computer Engineering Faculty of Engineering and Architecture American University of Beirut Course Information Course title: Computer Organization Course number: EECE 321 Catalog
More information